Search Resources

92 Results

Selected filters:
  • data analysis
Data Organization in Spreadsheets for Ecologists
Unrestricted Use
CC BY
Rating
0.0 stars

Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn: good data entry practices (formatting data tables in spreadsheets); how to avoid common formatting mistakes; approaches for handling dates in spreadsheets; basic quality control and data manipulation in spreadsheets; and exporting data from spreadsheets. In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Christie Bahlai
Peter R. Hoyt
Tracy Teal
Date Added:
03/20/2017
Data Organization in Spreadsheets for Social Scientists
Unrestricted Use
CC BY
Rating
0.0 stars

Lesson on spreadsheets for social scientists. Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. Typically we organize data in spreadsheets in ways that we as humans want to work with the data. However, computers require data to be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn: good data entry practices (formatting data tables in spreadsheets); how to avoid common formatting mistakes; approaches for handling dates in spreadsheets; basic quality control and data manipulation in spreadsheets; and exporting data from spreadsheets. In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:
Applied Science
Information Science
Mathematics
Measurement and Data
Social Science
Material Type:
Module
Provider:
The Carpentries
Author:
David Mawdsley
Erin Becker
François Michonneau
Karen Word
Lachlan Deer
Peter Smyth
Date Added:
08/07/2020
Data Wrangling and Processing for Genomics
Unrestricted Use
CC BY
Rating
0.0 stars

Data Carpentry lesson to learn how to use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation. A lot of genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command-line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks, 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface, and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if in fact a GUI tool even exists). In a previous lesson, you learned how to use the bash shell to interact with your computer through a command line interface. In this lesson, you will be applying this new knowledge to carry out a common genomics workflow - identifying variants among sequencing samples taken from multiple individuals within a population. We will be starting with a set of sequenced reads (.fastq files), performing some quality control steps, aligning those reads to a reference genome, and ending by identifying and visualizing variations among these samples. As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatic tools. What you learn here will enable you to use a variety of bioinformatic tools with confidence and greatly enhance your research efficiency and productivity.

Subject:
Applied Science
Computer Science
Genetics
Information Science
Life Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Thomas
Ahmed R. Hasan
Aniello Infante
Anita Schürch
Dev Paudel
Erin Alison Becker
Fotis Psomopoulos
François Michonneau
Gaius Augustus
Gregg TeHennepe
Jason Williams
Jessica Elizabeth Mizzi
Karen Cranston
Kari L Jordan
Kate Crosby
Kevin Weitemier
Lex Nederbragt
Luis Avila
Peter R. Hoyt
Rayna Michelle Harris
Ryan Peek
Sheldon John McKay
Sheldon McKay
Taylor Reiter
Tessa Pierce
Toby Hodges
Tracy Teal
Vasilis Lenis
Winni Kretzschmar
dbmarchant
Date Added:
08/07/2020
Databases and SQL
Unrestricted Use
CC BY
Rating
0.0 stars

Software Carpentry lesson that teaches how to use databases and SQL. In the late 1920s and early 1930s, William Dyer, Frank Pabodie, and Valentina Roerich led expeditions to the Pole of Inaccessibility in the South Pacific, and then onward to Antarctica. Two years ago, records of their expeditions were found in a storage locker at Miskatonic University. We have scanned and OCR’d the data they contain, and we now want to store that information in a way that will make search and analysis easy. Three common options for storage are text files, spreadsheets, and databases. Text files are easiest to create, and work well with version control, but then we would have to build search and analysis tools ourselves. Spreadsheets are good for doing simple analyses, but they don’t handle large or complex data sets well. Databases, however, include powerful tools for search and analysis, and can handle large, complex data sets. These lessons will show how to use a database to explore the expeditions’ data.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Amy Brown
Andrew Boughton
Andrew Kubiak
Avishek Kumar
Ben Waugh
Bill Mills
Brian Ballsun-Stanton
Chris Tomlinson
Colleen Fallaw
Dan Michael Heggø
Daniel Suess
Dave Welch
David W Wright
Deborah Gertrude Digges
Donny Winston
Doug Latornell
Erin Alison Becker
Ethan Nelson
Ethan P White
François Michonneau
George Graham
Gerard Capes
Gideon Juve
Greg Wilson
Ioan Vancea
Jake Lever
James Mickley
John Blischak
JohnRMoreau@gmail.com
Jonah Duckles
Jonathan Guyer
Joshua Nahum
Kate Hertweck
Kevin Dyke
Louis Vernon
Luc Small
Luke William Johnston
Maneesha Sane
Mark Stacy
Matthew Collins
Matty Jones
Mike Jackson
Morgan Taschuk
Patrick McCann
Paula Andrea Martinez
Pauline Barmby
Piotr Banaszkiewicz
Raniere Silva
Ray Bell
Rayna Michelle Harris
Rémi Emonet
Rémi Rampin
Seda Arat
Sheldon John McKay
Sheldon McKay
Stephen Davison
Thomas Guignard
Trevor Bekolay
lorra
slimlime
Date Added:
03/20/2017
Did awarding badges increase data sharing in BMJ Open? A randomized controlled trial
Unrestricted Use
CC BY
Rating
0.0 stars

Sharing data and code are important components of reproducible research. Data sharing in research is widely discussed in the literature; however, there are no well-established evidence-based incentives that reward data sharing, nor randomized studies that demonstrate the effectiveness of data sharing policies at increasing data sharing. A simple incentive, such as an Open Data Badge, might provide the change needed to increase data sharing in health and medical research. This study was a parallel group randomized controlled trial (protocol registration: doi:10.17605/OSF.IO/PXWZQ) with two groups, control and intervention, with 80 research articles published in BMJ Open per group, with a total of 160 research articles. The intervention group received an email offer for an Open Data Badge if they shared their data along with their final publication and the control group received an email with no offer of a badge if they shared their data with their final publication. The primary outcome was the data sharing rate. Badges did not noticeably motivate researchers who published in BMJ Open to share their data; the odds of awarding badges were nearly equal in the intervention and control groups (odds ratio = 0.9, 95% CI [0.1, 9.0]). Data sharing rates were low in both groups, with just two datasets shared in each of the intervention and control groups. The global movement towards open science has made significant gains with the development of numerous data sharing policies and tools. What remains to be established is an effective incentive that motivates researchers to take up such tools to share their data.

Subject:
Applied Science
Information Science
Material Type:
Reading
Provider:
Royal Society Open Science
Author:
Adrian Aldcroft
Adrian G. Barnett
Anisa Rowhani-Farid
Date Added:
08/07/2020
Economics Lesson with Stata
Unrestricted Use
CC BY
Rating
0.0 stars

A Data Carpentry curriculum for Economics is being developed by Dr. Miklos Koren at Central European University. These materials are being piloted locally. Development for these lessons has been supported by a grant from the Sloan Foundation.

Subject:
Applied Science
Computer Science
Economics
Information Science
Mathematics
Measurement and Data
Social Science
Material Type:
Module
Provider:
The Carpentries
Author:
Andras Vereckei
Arieda Muço
Miklós Koren
Date Added:
08/07/2020
El Control de Versiones con Git
Unrestricted Use
CC BY
Rating
0.0 stars

Software Carpentry lesson on version control with Git. To illustrate the power of Git and GitHub, we will use the following story as a motivating example throughout this lesson. Wolfman and Dracula have been hired by Universal Missions to investigate whether it is possible to send their next planetary lander to Mars. They want to be able to work on the plans at the same time, but they have run into problems doing something similar in the past. If they take turns, each one will spend a lot of time waiting for the other to finish, but if they work on their own copies and exchange changes by email, things will be lost, overwritten, or duplicated. A colleague suggests using version control to manage their work. Version control is better than exchanging files by email for several reasons. Nothing is lost once it is placed under version control, unless someone makes a substantial effort to lose it. Because all earlier versions of the files are saved, it is always possible to go back in time and see exactly who wrote what on a particular day, or which version of a program was used to generate a particular set of results. Because there is a record of who did what and when, it is possible to know whom to ask if a question comes up later and, if necessary, to revert the content to a previous version, much like the “undo” command in a text editor. When several people collaborate on the same project, it is possible to accidentally overlook or overwrite changes made by someone else. The version control system automatically notifies users whenever there is a conflict between one person’s work and another’s. Teams are not the only ones who benefit from version control: independent researchers can benefit greatly as well. Keeping a record of what changed, when, and why is extremely useful for all researchers if they ever need to return to the project at a later time (e.g., a year later, when the memory of the details has faded).

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Alejandra Gonzalez-Beltran
Amy Olex
Belinda Weaver
Bradford Condon
Casey Youngflesh
Daisie Huang
Dani Ledezma
Francisco Palm
Garrett Bachant
Heather Nunn
Hely Salgado
Ian Lee
Ivan Gonzalez
James E McClure
Javier Forment
Jimmy O'Donnell
Jonah Duckles
K.E. Koziar
Katherine Koziar
Katrin Leinweber
Kevin Alquicira
Kevin MF
Kurt Glaesemann
LauCIFASIS
Leticia Vega
Lex Nederbragt
Mark Woodbridge
Matias Andina
Matt Critchlow
Mingsheng Zhang
Nelly Sélem
Nima Hejazi
Nohemi Huanca Nunez
Olemis Lang
P. L. Lim
Paula Andrea Martinez
Peace Ossom Williamson
Rayna M Harris
Romualdo Zayas-Lagunas
Sarah Stevens
Saskia Hiltemann
Shirley Alquicira
Silvana Pereyra
Tom Morrell
Valentina Bonetti
Veronica Ikeshoji-Orlati
Veronica Jimenez
butterflyskip
dounia
Date Added:
08/07/2020
Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature
Unrestricted Use
CC BY
Rating
0.0 stars

We have empirically assessed the distribution of published effect sizes and estimated power by analyzing 26,841 statistical records from 3,801 cognitive neuroscience and psychology papers published recently. The reported median effect size was D = 0.93 (interquartile range: 0.64–1.46) for nominally statistically significant results and D = 0.24 (0.11–0.42) for nonsignificant results. Median power to detect small, medium, and large effects was 0.12, 0.44, and 0.73, reflecting no improvement through the past half-century. This is so because sample sizes have remained small. Assuming similar true effect sizes in both disciplines, power was lower in cognitive neuroscience than in psychology. Journal impact factors negatively correlated with power. Assuming a realistic range of prior probabilities for null hypotheses, false report probability is likely to exceed 50% for the whole literature. In light of our findings, the recently reported low replication success in psychology is realistic, and worse performance may be expected for cognitive neuroscience.

Subject:
Psychology
Social Science
Material Type:
Reading
Provider:
PLOS Biology
Author:
Denes Szucs
John P. A. Ioannidis
Date Added:
08/07/2020
Foster Open Science
Unrestricted Use
CC BY
Rating
0.0 stars

The FOSTER portal is an e-learning platform that brings together the best training resources addressed to those who need to know more about Open Science, or need to develop strategies and skills for implementing Open Science practices in their daily workflows. Here you will find a growing collection of training materials. Many different users - from early-career researchers, to data managers, librarians, research administrators, and graduate schools - can benefit from the portal. In order to meet their needs, the existing materials will be extended from basic to more advanced-level resources. In addition, discipline-specific resources will be created.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Full Course
Provider:
FOSTER Open Science
Author:
FOSTER Open Science
Date Added:
08/07/2020
Genomics Workshop Overview
Unrestricted Use
CC BY
Rating
0.0 stars

Workshop overview for the Data Carpentry genomics curriculum. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for genomics research including: best practices for organization of bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality and perform variant calling, and connecting to and using cloud computing. This workshop is designed to be taught over two full days of instruction. Please note that workshop materials for working with Genomics data in R are in “alpha” development. These lessons are available for review and for informal teaching experiences, but are not yet part of The Carpentries’ official lesson offerings. Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Genomics workshops.

Subject:
Applied Science
Computer Science
Genetics
Information Science
Life Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Amanda Charbonneau
Erin Alison Becker
François Michonneau
Jason Williams
Maneesha Sane
Matthew Kweskin
Muhammad Zohaib Anwar
Murray Cadzow
Paula Andrea Martinez
Taylor Reiter
Tracy Teal
Date Added:
08/07/2020
Geospatial Workshop Overview
Unrestricted Use
CC BY
Rating
0.0 stars

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. Interested in teaching these materials? We have an onboarding video available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Geospatial workshops.

Subject:
Applied Science
Geology
Information Science
Mathematics
Measurement and Data
Physical Geography
Physical Science
Social Science
Material Type:
Module
Provider:
The Carpentries
Author:
Anne Fouilloux
Arthur Endsley
Chris Prener
Jeff Hollister
Joseph Stachelek
Leah Wasser
Michael Sumner
Michele Tobias
Stace Maples
Date Added:
08/07/2020
Gravity and Bottle Flipping
Conditional Remix & Share Permitted
CC BY-NC-SA
Rating
0.0 stars

This activity is designed to support a variety of STEM concepts: scientific method, making predictions, gathering and analyzing data, and developing conclusions based on experimentation. This activity draws on active student engagement, and is useful in many STEM content areas.

Subject:
Mathematics
Measurement and Data
Physical Science
Physics
Material Type:
Activity/Lab
Date Added:
09/26/2018
Gravity and Bottle Flipping
Conditional Remix & Share Permitted
CC BY-NC-SA
Rating
0.0 stars

This activity is designed to support a variety of STEM concepts: scientific method, making predictions, gathering and analyzing data, and developing conclusions based on experimentation. This activity draws on active student engagement, and is useful in many STEM content areas.

Subject:
Mathematics
Measurement and Data
Physical Science
Physics
Material Type:
Activity/Lab
Date Added:
04/30/2021
Gravity and Bottle Flipping
Conditional Remix & Share Permitted
CC BY-NC-SA
Rating
0.0 stars

This activity is designed to support a variety of STEM concepts: scientific method, making predictions, gathering and analyzing data, and developing conclusions based on experimentation. This activity draws on active student engagement, and is useful in many STEM content areas.

Subject:
Mathematics
Measurement and Data
Physical Science
Physics
Material Type:
Activity/Lab
Date Added:
05/07/2021
How to Use OSF as an Electronic Lab Notebook
Unrestricted Use
CC BY
Rating
0.0 stars

This webinar outlines how to use the free Open Science Framework (OSF) as an Electronic Lab Notebook for personal work or private collaborations. Fundamental features we cover include how to record daily activity, how to store images or arbitrary data files, how to invite collaborators, how to view old versions of files, and how to connect all this usage to more complex structures that support the full work of a lab across multiple projects and experiments.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Image Processing with Python
Unrestricted Use
CC BY
Rating
0.0 stars

This lesson shows how to use Python and skimage to do basic image processing. With support from an NSF iUSE grant, Dr. Tessa Durham Brooks and Dr. Mark Meysenburg at Doane College, Nebraska, USA have developed a curriculum for teaching image processing in Python. This lesson is currently being piloted at different institutions. This pilot phase will be followed by a clean-up phase to incorporate suggestions and feedback from the pilots into the lessons and to make the lessons teachable by the broader community. Development for these lessons has been supported by a grant from the Sloan Foundation.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Mark Meysenburg
Date Added:
08/07/2020
Introduction to Cloud Computing for Genomics
Unrestricted Use
CC BY
Rating
0.0 stars

Data Carpentry lesson to learn how to work with Amazon AWS cloud computing and how to transfer data between your local computer and cloud resources. The cloud is a fancy name for the huge network of computers that host your favorite websites, stream movies, and let you shop online, but you can also harness all of that computing power for running analyses that would take days, weeks, or even years on your local computer. In this lesson, you’ll learn about renting cloud services that fit your analytic needs, and how to interact with one of those services (AWS) via the command line.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Abigail Cabunoc Mayes
Adina Howe
Amanda Charbonneau
Bob Freeman
Brittany N. Lasseigne, PhD
Bérénice Batut
Caryn Johansen
Chris Fields
Darya Vanichkina
David Mawdsley
Erin Becker
François Michonneau
Greg Wilson
Jason Williams
Joseph Stachelek
Kari L. Jordan, PhD
Katrin Leinweber
Maxim Belkin
Michael R. Crusoe
Piotr Banaszkiewicz
Raniere Silva
Renato Alves
Rémi Emonet
Stephen Turner
Taylor Reiter
Thomas Morrell
Tracy Teal
William L. Close
ammatsun
vuw-ecs-kevin
Date Added:
03/28/2017
Introduction to Geospatial Concepts
Unrestricted Use
CC BY
Rating
0.0 stars

Data Carpentry lesson to understand data structures and common storage and transfer formats for spatial data. The goal of this lesson is to provide an introduction to core geospatial data concepts. It is intended for learners who have no prior experience working with geospatial data, and as a prerequisite for the R for Raster and Vector Data lesson. This lesson can be taught in approximately 75 minutes and covers the following topics: introduction to raster and vector data formats and attributes; examples of data types commonly stored in raster vs vector format; introduction to categorical vs continuous raster data and multi-layer rasters; introduction to the file types and R packages used in the remainder of this workshop; introduction to coordinate reference systems and the PROJ4 format; and an overview of commonly used programs and applications for working with geospatial data. The Introduction to R for Geospatial Data lesson provides an introduction to the R programming language, while the R for Raster and Vector Data lesson provides a more in-depth introduction to visualization (focusing on geospatial data) and working with data structures unique to geospatial data. The R for Raster and Vector Data lesson assumes that learners are already familiar with both geospatial data concepts and the core concepts of the R language.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Anne Fouilloux
Chris Prener
Dev Paudel
Ethan P White
Joseph Stachelek
Katrin Leinweber
Lauren O'Brien
Michael Koontz
Paul Miller
Tracy Teal
Whalen
Date Added:
08/07/2020
Introduction to Geospatial Raster and Vector Data with R
Unrestricted Use
CC BY
Rating
0.0 stars

Data Carpentry lesson to open, work with, and plot vector and raster-format spatial data in R. The episodes in this lesson cover how to open, work with, and plot vector and raster-format spatial data in R. Additional topics include working with spatial metadata (extent and coordinate reference systems), reprojecting spatial data, and working with raster time series data.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Ana Costa Conrado
Angela Li
Anne Fouilloux
Brett Lord-Castillo
Ethan P White
Joseph Stachelek
Juan F Fung
Katrin Leinweber
Klaus Schliep
Kristina Riemer
Lachlan Deer
Lauren O'Brien
Marchand
Punam Amratia
Sergio Marconi
Stéphane Guillou
Tracy Teal
zenobieg
Date Added:
08/07/2020
Introduction to Preprints
Unrestricted Use
CC BY
Rating
0.0 stars

This is a recording of a 45-minute introductory webinar on preprints. With our guest speaker Philip Cohen, we’ll cover what preprints/postprints are, the benefits of preprints, and address some common concerns researchers may have. We’ll show how to determine whether you can post preprints/postprints, and also demonstrate how to use OSF preprints (https://osf.io/preprints/) to share preprints. The OSF is the flagship product of the Center for Open Science, a non-profit technology start-up dedicated to improving the alignment between scientific values and scientific practices. Learn more at cos.io and osf.io, or email contact@cos.io.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020