OER Commons

Data Analysis and Visualization in R for Ecologists

Unrestricted Use

CC BY

A Data Carpentry lesson from the Ecology curriculum on how to analyse and visualise ecological data in R. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecology data in R. This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (about 6 hours). They start with some basic information about R syntax and the RStudio interface, then move through how to import CSV files, the structure of data frames, how to deal with factors, how to add and remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from R.

Subject: Applied Science; Computer Science; Ecology; Information Science; Life Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Ankenbrand, Markus; Arindam Basu; Ashander, Jaime; Bahlai, Christie; Bailey, Alistair; Becker, Erin Alison; Bledsoe, Ellen; Boehm, Fred; Bolker, Ben; Bouquin, Daina; Burge, Olivia Rata; Burle, Marie-Helene; Carchedi, Nick; Chatzidimitriou, Kyriakos; Chiapello, Marco; Conrado, Ana Costa; Cortijo, Sandra; Cranston, Karen; Cuesta, Sergio Martínez; Culshaw-Maurer, Michael; Czapanskiy, Max; Daijiang Li; Dashnow, Harriet; Daskalova, Gergana; Deer, Lachlan; Direk, Kenan; Dunic, Jillian; Elahi, Robin; Fishman, Dmytro; Fouilloux, Anne; Fournier, Auriel; Gan, Emilia; Goswami, Shubhang; Guillou, Stéphane; Hancock, Stacey; Hardenberg, Achaz Von; Harrison, Paul; Hart, Ted; Herr, Joshua R.; Hertweck, Kate; Hodges, Toby; Hulshof, Catherine; Humburg, Peter; Jean, Martin; Johnson, Carolina; Johnson, Kayla; Johnston, Myfanwy; Jordan, Kari L; K. A. S. Mislan; Kaupp, Jake; Keane, Jonathan; Kerchner, Dan; Klinges, David; Koontz, Michael; Leinweber, Katrin; Lepore, Mauro Luciano; Li, Ye; Lijnzaad, Philip; Lotterhos, Katie; Mannheimer, Sara; Marwick, Ben; Michonneau, François; Millar, Justin; Moreno, Melissa; Najko Jahn; Obeng, Adam; Odom, Gabriel J.; Pauloo, Richard; Pawlik, Aleksandra Natalia; Pearse, Will; Peck, Kayla; Pederson, Steve; Peek, Ryan; Pletzer, Alex; Quinn, Danielle; Rajeg, Gede Primahadi Wijaya; Reiter, Taylor; Rodriguez-Sanchez, Francisco; Sandmann, Thomas; Seok, Brian; Sfn_brt; Shiklomanov, Alexey; Shivshankar Umashankar; Stachelek, Joseph; Strauss, Eli; Sumedh; Switzer, Callin; Tarkowski, Leszek; Tavares, Hugo; Teal, Tracy; Theobold, Allison; Tirok, Katrin; Tylén, Kristian; Vanichkina, Darya; Voter, Carolyn; Webster, Tara; Weisner, Michael; White, Ethan P; Wilson, Earle; Woo, Kara; Wright, April; Yanco, Scott; Ye, Hao
Date Added: 03/20/2017

Data Wrangling and Processing for Genomics

Unrestricted Use

CC BY

Data Carpentry lesson to learn how to use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation. A lot of genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks, 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface, and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if in fact a GUI tool even exists). In a previous lesson, you learned how to use the bash shell to interact with your computer through a command-line interface. In this lesson, you will be applying this new knowledge to carry out a common genomics workflow: identifying variants among sequencing samples taken from multiple individuals within a population. We will be starting with a set of sequenced reads (.fastq files), performing some quality control steps, aligning those reads to a reference genome, and ending by identifying and visualizing variations among these samples. As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatic tools. What you learn here will enable you to use a variety of bioinformatic tools with confidence and greatly enhance your research efficiency and productivity.

Subject: Applied Science; Computer Science; Genetics; Information Science; Life Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Adam Thomas; Ahmed R. Hasan; Aniello Infante; Anita Schürch; Dev Paudel; Erin Alison Becker; Fotis Psomopoulos; François Michonneau; Gaius Augustus; Gregg TeHennepe; Jason Williams; Jessica Elizabeth Mizzi; Karen Cranston; Kari L Jordan; Kate Crosby; Kevin Weitemier; Lex Nederbragt; Luis Avila; Peter R. Hoyt; Rayna Michelle Harris; Ryan Peek; Sheldon John McKay; Sheldon McKay; Taylor Reiter; Tessa Pierce; Toby Hodges; Tracy Teal; Vasilis Lenis; Winni Kretzschmar; dbmarchant
Date Added: 08/07/2020

Unrestricted Use

CC BY

Genomics Workshop Overview

Workshop overview for the Data Carpentry genomics curriculum. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for genomics research, including best practices for organization of bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality and perform variant calling, and connecting to and using cloud computing. This workshop is designed to be taught over two full days of instruction. Please note that workshop materials for working with genomics data in R are in “alpha” development. These lessons are available for review and for informal teaching experiences, but are not yet part of The Carpentries’ official lesson offerings. Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Genomics workshops.

Subject: Applied Science; Computer Science; Genetics; Information Science; Life Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Amanda Charbonneau; Erin Alison Becker; François Michonneau; Jason Williams; Maneesha Sane; Matthew Kweskin; Muhammad Zohaib Anwar; Murray Cadzow; Paula Andrea Martinez; Taylor Reiter; Tracy Teal
Date Added: 08/07/2020

Introduction to Cloud Computing for Genomics

Unrestricted Use

CC BY

Data Carpentry lesson to learn how to work with Amazon Web Services (AWS) cloud computing and how to transfer data between your local computer and cloud resources. The cloud is a fancy name for the huge network of computers that host your favorite websites, stream movies, and let you shop online, but you can also harness all of that computing power for running analyses that would take days, weeks, or even years on your local computer. In this lesson, you’ll learn about renting cloud services that fit your analytic needs, and how to interact with one of those services (AWS) via the command line.

Subject: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Abigail Cabunoc Mayes; Adina Howe; Amanda Charbonneau; Bob Freeman; Brittany N. Lasseigne, PhD; Bérénice Batut; Caryn Johansen; Chris Fields; Darya Vanichkina; David Mawdsley; Erin Becker; François Michonneau; Greg Wilson; Jason Williams; Joseph Stachelek; Kari L. Jordan, PhD; Katrin Leinweber; Maxim Belkin; Michael R. Crusoe; Piotr Banaszkiewicz; Raniere Silva; Renato Alves; Rémi Emonet; Stephen Turner; Taylor Reiter; Thomas Morrell; Tracy Teal; William L. Close; ammatsun; vuw-ecs-kevin
Date Added: 03/28/2017

Introduction to the Command Line for Genomics

Unrestricted Use

CC BY

Data Carpentry lesson to learn to navigate your file system, create, copy, move, and remove files and directories, and automate repetitive tasks using scripts and wildcards with genomics data. The command-line interface (OS shell) and the graphical user interface (GUI) are different ways of interacting with a computer’s operating system. The shell is a program that presents a command-line interface, which allows you to control your computer using commands entered with a keyboard instead of controlling graphical user interfaces (GUIs) with a mouse/keyboard combination. There are quite a few reasons to start learning about the shell. First, for most bioinformatics tools, you have to use the shell; there is no graphical interface. If you want to work in metagenomics or genomics, you’re going to need to use the shell. Second, the shell gives you power: the command line lets you do your work more efficiently and more quickly. When you need to do things tens to hundreds of times, knowing how to use the shell is transformative. Third, to use remote computers or cloud computing, you need to use the shell.

Subject: Applied Science; Computer Science; Genetics; Information Science; Life Science; Mathematics; Measurement and Data
Material Type: Module
Provider: The Carpentries
Author: Amanda Charbonneau; Amy E. Hodge; Anita Schürch; Bastian Greshake Tzovaras; Bérénice Batut; Colin Davenport; Diya Das; Erin Alison Becker; François Michonneau; Giulio Valentino Dalla Riva; Jessica Elizabeth Mizzi; Karen Cranston; Kari L Jordan; Mattias de Hollander; Mike Lee; Niclas Jareborg; Omar Julio Sosa; Rayna Michelle Harris; Ross Cunning; Russell Neches; Sarah Stevens; Shannon EK Joslin; Sheldon John McKay; Siva Chudalayandi; Taylor Reiter; Tobi; Tracy Teal; Tristan De Buysscher
Date Added: 08/07/2020

Project Organization and Management for Genomics

Unrestricted Use

CC BY

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database. Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but it also makes it easier to come back to the project later and share with collaborators, including your most important collaborator: future you. Organizing a project that includes sequencing involves many components. There’s the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information of a sequencing project is digital, and we need to keep track of our digital records in the same way we have a lab notebook and sample freezer. In this lesson, we’ll go through the project organization and documentation that will make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information. In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files: spreadsheet data from the experiment that tracks the strains and their phenotype over time; spreadsheet data with information on the samples that were sequenced (the names of the samples, how they were prepared, and the sequencing conditions); and the sequence data. Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.
In this lesson you will learn: how to structure your metadata, tabular data, and information about the experiment (the metadata is the information about the experiment and the samples you’re sequencing); how to prepare for, understand, organize, and store the sequencing data that comes back from the sequencing center; how to access and download publicly available data that may need to be used in your bioinformatics analysis; and the concepts of organizing the files and documenting the workflow of your bioinformatics analysis.

Subject: Business and Communication; Genetics; Life Science; Management
Material Type: Module
Provider: The Carpentries
Author: Amanda Charbonneau; Bérénice Batut; Daniel O. S. Ouso; Deborah Paul; Erin Alison Becker; François Michonneau; Jason Williams; Juan A. Ugalde; Kevin Weitemier; Laura Williams; Paula Andrea Martinez; Peter R. Hoyt; Rayna Michelle Harris; Taylor Reiter; Toby Hodges; Tracy Teal
Date Added: 08/07/2020

6 Results
