
Search Resources

4 Results

Data Wrangling and Processing for Genomics
Unrestricted Use
CC BY

Data Carpentry lesson to learn how to use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation. Much genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks; 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface; and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if a GUI tool even exists). In a previous lesson, you learned how to use the bash shell to interact with your computer through a command-line interface. In this lesson, you will apply this knowledge to carry out a common genomics workflow: identifying variants among sequencing samples taken from multiple individuals within a population. We will start with a set of sequenced reads (.fastq files), perform some quality control steps, align those reads to a reference genome, and end by identifying and visualizing variants among these samples. As you progress through this lesson, keep in mind that, even if you aren’t going to use this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatics tools. What you learn here will enable you to use a variety of bioinformatics tools with confidence and greatly enhance your research efficiency and productivity.

Subject:
Applied Science
Computer Science
Genetics
Information Science
Life Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Thomas
Ahmed R. Hasan
Aniello Infante
Anita Schürch
Dev Paudel
Erin Alison Becker
Fotis Psomopoulos
François Michonneau
Gaius Augustus
Gregg TeHennepe
Jason Williams
Jessica Elizabeth Mizzi
Karen Cranston
Kari L Jordan
Kate Crosby
Kevin Weitemier
Lex Nederbragt
Luis Avila
Peter R. Hoyt
Rayna Michelle Harris
Ryan Peek
Sheldon John McKay
Sheldon McKay
Taylor Reiter
Tessa Pierce
Toby Hodges
Tracy Teal
Vasilis Lenis
Winni Kretzschmar
dbmarchant
Date Added:
08/07/2020

Introduction to R for Geospatial Data
Unrestricted Use
CC BY

The goal of this lesson is to provide an introduction to R for learners working with geospatial data. It is intended as a prerequisite to the R for Raster and Vector Data lesson for learners who have no prior experience using R. This lesson can be taught in approximately 4 hours and covers the following topics:

- Working with R in the RStudio GUI
- Project management and file organization
- Importing data into R
- Introduction to R’s core data types and data structures
- Manipulation of data frames (tabular data) in R
- Introduction to visualization
- Writing data to a file

The R for Raster and Vector Data lesson provides a more in-depth introduction to visualization (focusing on geospatial data) and to working with data structures unique to geospatial data.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Anne Fouilloux
Chris Prener
Claudia Engel
David Mawdsley
Erin Becker
François Michonneau
Ido Bar
Jeffrey Oliver
Juan Fung
Katrin Leinweber
Kevin Weitemier
Kok Ben Toh
Lachlan Deer
Marieke Frassl
Matt Clark
Miles McBain
Naupaka Zimmerman
Paula Andrea Martinez
Preethy Nair
Raniere Silva
Rayna Harris
Richard McCosh
Vicken Hillis
butterflyskip
Date Added:
08/07/2020

Project Organization and Management for Genomics
Unrestricted Use
CC BY

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database. Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but also makes it easier to come back to the project later and share it with collaborators, including your most important collaborator: future you. Organizing a project that includes sequencing involves many components: metadata on the experimental setup and conditions, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information in a sequencing project is digital, and we need to keep track of our digital records in the same way we keep a lab notebook and a sample freezer. In this lesson, we’ll go through the project organization and documentation that make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information. In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files:

- Spreadsheet data from the experiment that tracks the strains and their phenotype over time
- Spreadsheet data with information on the samples that were sequenced: the names of the samples, how they were prepared, and the sequencing conditions
- The sequence data

Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.
In this lesson you will learn:

- How to structure your metadata, tabular data, and information about the experiment. The metadata is the information about the experiment and the samples you’re sequencing.
- How to prepare for, understand, organize, and store the sequencing data that comes back from the sequencing center
- How to access and download publicly available data that may need to be used in your bioinformatics analysis
- The concepts of organizing the files and documenting the workflow of your bioinformatics analysis

Subject:
Business and Communication
Genetics
Life Science
Management
Material Type:
Module
Provider:
The Carpentries
Author:
Amanda Charbonneau
Bérénice Batut
Daniel O. S. Ouso
Deborah Paul
Erin Alison Becker
François Michonneau
Jason Williams
Juan A. Ugalde
Kevin Weitemier
Laura Williams
Paula Andrea Martinez
Peter R. Hoyt
Rayna Michelle Harris
Taylor Reiter
Toby Hodges
Tracy Teal
Date Added:
08/07/2020

R for Reproducible Scientific Analysis
Unrestricted Use
CC BY

This lesson is part of the Software Carpentry workshop: an introduction to R for non-programmers using Gapminder data. The goal of this lesson is to teach novice programmers to write modular code and to apply best practices for using R for data analysis. R is commonly used in many scientific disciplines for its statistical analysis capabilities and its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation. Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis. The lesson contains more material than can be taught in a day. The instructor notes page has some suggested lesson plans suitable for a one-day or half-day workshop. A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and they have been chosen primarily for their usability.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam H. Sparks
Ahsan Ali Khoja
Amy Lee
Ana Costa Conrado
Andrew Boughton
Andrew Lonsdale
Andrew MacDonald
Andris Jankevics
Andy Teucher
Antonio Berlanga-Taylor
Ashwin Srinath
Ben Bolker
Bill Mills
Bret Beheim
Clare Sloggett
Daniel
Dave Bridges
David J. Harris
David Mawdsley
Dean Attali
Diego Rabatone Oliveira
Drew Tyre
Elise Morrison
Erin Alison Becker
Fernando Mayer
François Michonneau
Giulio Valentino Dalla Riva
Gordon McDonald
Greg Wilson
Harriet Dashnow
Ido Bar
Jaime Ashander
James Balamuta
James Mickley
Jamie McDevitt-Irwin
Jeffrey Arnold
Jeffrey Oliver
John Blischak
Jonah Duckles
Josh Quan
Julia Piaskowski
Kara Woo
Kate Hertweck
Katherine Koziar
Katrin Leinweber
Kellie Ottoboni
Kevin Weitemier
Kiana Ashley West
Kieran Samuk
Kunal Marwaha
Kyriakos Chatzidimitriou
Lachlan Deer
Lex Nederbragt
Liz Ing-Simmons
Lucy Chang
Luke W Johnston
Luke Zappia
Marc Sze
Marie-Helene Burle
Marieke Frassl
Mark Dunning
Martin John Hadley
Mary Donovan
Matt Clark
Melissa Kardish
Mike Jackson
Murray Cadzow
Narayanan Raghupathy
Naupaka Zimmerman
Nelly Sélem
Nicholas Lesniak
Nicholas Potter
Nima Hejazi
Nora Mitchell
Olivia Rata Burge
Paula Andrea Martinez
Pete Bachant
Phil Bouchet
Philipp Boersch-Supan
Piotr Banaszkiewicz
Raniere Silva
Rayna Michelle Harris
Remi Daigle
Research Bazaar
Richard Barnes
Robert Bagchi
Rémi Emonet
Sam Penrose
Sandra Brosda
Sarah Munro
Sasha Lavrentovich
Scott Allen Funkhouser
Scott Ritchie
Sebastien Renaut
Thea Van Rossum
Timothy Eoin Moore
Timothy Rice
Tobin Magle
Trevor Bekolay
Tyler Crawford Kelly
Vicken Hillis
Yuka Takemon
bippuspm
butterflyskip
waiteb5
Date Added:
03/20/2017