
Reproducibility

The agreement of research results when a study is repeated. Topics include reproducibility, replicability, repeatability, robustness, generalizability, organization, documentation, automation, dissemination, guidance, definitions, and more.

183 affiliated resources

Data Analysis and Visualization with Python for Social Scientists
Unrestricted Use
CC BY

Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~6 hours). They start with basic Python syntax and the Jupyter notebook interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
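
As a taste of the workflow the lesson covers, here is a minimal pandas sketch; the file name surveys.csv and its column names are hypothetical placeholders, not the lesson's own data.

```python
# Minimal sketch of the workflow the lesson covers: import a CSV,
# summarize it, and make a first plot. The file name surveys.csv and
# its column names are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

surveys = pd.read_csv("surveys.csv")       # CSV file into a data frame
print(surveys.head())                      # peek at the first rows
print(surveys.describe())                  # summary statistics
print(surveys.groupby("species")["weight"].mean())  # grouped summary

surveys["weight"].plot(kind="hist")        # a brief introduction to plotting
plt.xlabel("weight")
plt.show()
```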

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Geoffrey Boushey
Stephen Childs
Date Added:
08/07/2020
Data Carpentry
Unrestricted Use
CC BY

Data Carpentry trains researchers in the core data skills for efficient, shareable, and reproducible research practices. We run accessible, inclusive training workshops; teach openly available, high-quality, domain-tailored lessons; and foster an active, inclusive, diverse instructor community that promotes and models reproducible research as a community norm.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Full Course
Provider:
Data Carpentry Community
Author:
Data Carpentry Community
Date Added:
06/18/2020
Data Carpentry for Biologists
Unrestricted Use
CC BY

The Biology Semester-long Course was developed and piloted at the University of Florida in Fall 2015. Course materials include readings, lectures, exercises, and assignments that expand on the material presented at workshops focusing on SQL and R.

Subject:
Applied Science
Biology
Computer Science
Information Science
Life Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Ethan White
Zachary Brym
Date Added:
08/07/2020
Data Cleaning with OpenRefine for Ecologists
Unrestricted Use
CC BY

A part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected or formatting is made consistent. This step must be taken with the same care and attention to reproducibility as the analysis. OpenRefine (formerly Google Refine) is a powerful free and open-source tool for working with messy data: cleaning it and transforming it from one format into another. This lesson will teach you to use OpenRefine to effectively clean and format data and automatically track any changes that you make. Many people comment that this tool saves them months of work they would otherwise spend making these edits by hand.
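
OpenRefine itself is driven through a browser interface, so there is no script to show; as a rough stand-in, the pandas sketch below performs the same flavor of cleanup (trimming whitespace and normalizing case so near-duplicate labels collapse). The data frame and its values are invented for illustration.

```python
# Rough pandas stand-in for common OpenRefine cleanup steps (OpenRefine
# itself does this interactively and records every change you make).
# The column and values are invented for illustration.
import pandas as pd

df = pd.DataFrame({"species": [" Oryx ", "oryx", "ORYX", "Gazelle"]})

df["species"] = (
    df["species"]
    .str.strip()   # drop stray leading/trailing whitespace
    .str.lower()   # normalize case so near-duplicate labels collapse
)
print(df["species"].value_counts())  # the three "oryx" variants become one
```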

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Cam Macdonell
Deborah Paul
Phillip Doehle
Rachel Lombardi
Date Added:
03/20/2017
Data Intro for Archivists
Unrestricted Use
CC BY

This Library Carpentry lesson introduces archivists to working with data. At the conclusion of the lesson you will be able to: explain terms, phrases, and concepts in code or software development; identify and use best practices in data structures; and use regular expressions in searches.
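
As a small illustration of the last of these skills, the sketch below uses Python's built-in re module for regular-expression searches; the sample record strings are hypothetical.

```python
# Regular-expression searches of the kind the lesson introduces, using
# Python's built-in re module; the sample record strings are hypothetical.
import re

records = ["Box 12, Folder 3", "Box 7", "Letter, 1923-04-01"]

for r in records:
    m = re.search(r"Box (\d+)", r)  # capture a box number, if present
    if m:
        print(r, "-> box", m.group(1))

with_dates = [r for r in records if re.search(r"\d{4}-\d{2}-\d{2}", r)]
print("records containing ISO dates:", with_dates)
```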

Subject:
Applied Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
James Baker
Jeanine Finn
Jenny Bunn
Katherine Koziar
Noah Geraci
Scott Peterson
Date Added:
08/07/2020
Data Management with SQL for Ecologists
Unrestricted Use
CC BY

Databases are useful for both storing and using data effectively. Using a relational database serves several purposes. It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it. If we get new data we can rerun a query to find all the data that meets certain criteria. It’s fast, even for large amounts of data. It improves quality control of data entry (type constraints and use of forms in Access, Filemaker, etc.) The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python. This lesson will teach you what relational databases are, how you can load data into them and how you can query databases to extract just the information that you need.
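
As a minimal sketch of the load-and-query pattern the lesson describes, the snippet below uses Python's built-in sqlite3 module; the table, columns, and values are hypothetical examples, not the lesson's dataset.

```python
# Minimal load-and-query sketch with Python's built-in sqlite3 module;
# the table, columns, and values are hypothetical examples.
import sqlite3

con = sqlite3.connect(":memory:")  # a throwaway in-memory database
con.execute("CREATE TABLE surveys (species TEXT, weight REAL)")
con.executemany(
    "INSERT INTO surveys VALUES (?, ?)",
    [("oryx", 42.0), ("oryx", 38.5), ("gazelle", 21.2)],
)

# The query, not the analysis code, selects the rows we need; with new
# data we simply rerun it.
for species, mean_weight in con.execute(
    "SELECT species, AVG(weight) FROM surveys GROUP BY species"
):
    print(species, mean_weight)
```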

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Christina Koch
Donal Heidenblad
Katy Felkner
Rémi Rampin
Timothée Poisot
Date Added:
03/20/2017
Data Management with SQL for Social Scientists
Unrestricted Use
CC BY

This is an alpha lesson teaching Data Management with SQL for Social Scientists. We welcome feedback on any criticism or error, and will take it into account to improve both the presentation and the content. Databases are useful for both storing and using data effectively. Using a relational database serves several purposes. It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it. If we get new data we can rerun a query to find all the data that meets certain criteria. It’s fast, even for large amounts of data. It improves quality control of data entry (type constraints and use of forms in Access, Filemaker, etc.) The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python. This lesson will teach you what relational databases are, how you can load data into them and how you can query databases to extract just the information that you need.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Social Science
Material Type:
Module
Provider:
The Carpentries
Author:
Peter Smyth
Date Added:
08/07/2020
Data Organization in Spreadsheets for Ecologists
Unrestricted Use
CC BY

Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn:
• good data entry practices - formatting data tables in spreadsheets,
• how to avoid common formatting mistakes,
• approaches for handling dates in spreadsheets,
• basic quality control and data manipulation in spreadsheets, and
• exporting data from spreadsheets.
In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.
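
To show the payoff of these practices, here is a tiny pandas sketch: a tidy spreadsheet export (one header row, one row per record, one column per variable, ISO 8601 dates) reads straight into an analysis tool with dates parsed as dates. The columns and values are invented for illustration.

```python
# A tidy export reads straight into analysis tools: one header row, one
# row per record, one column per variable, ISO 8601 dates. The columns
# and values here are invented for illustration.
import pandas as pd
from io import StringIO

csv_text = (
    "plot,species,date_collected\n"
    "1,oryx,2020-03-14\n"
    "2,gazelle,2020-03-15\n"
)
df = pd.read_csv(StringIO(csv_text), parse_dates=["date_collected"])
print(df.dtypes)                      # date_collected parsed as a date, not text
print(df["date_collected"].dt.month)  # dates are now computable
```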

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Christie Bahlai
Peter R. Hoyt
Tracy Teal
Date Added:
03/20/2017
Data Organization in Spreadsheets for Social Scientists
Unrestricted Use
CC BY

Lesson on spreadsheets for social scientists. Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. Typically we organize data in spreadsheets in ways that we as humans want to work with the data. However, computers require data to be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn:
• good data entry practices - formatting data tables in spreadsheets,
• how to avoid common formatting mistakes,
• approaches for handling dates in spreadsheets,
• basic quality control and data manipulation in spreadsheets, and
• exporting data from spreadsheets.
In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:
Applied Science
Information Science
Mathematics
Measurement and Data
Social Science
Material Type:
Module
Provider:
The Carpentries
Author:
David Mawdsley
Erin Becker
François Michonneau
Karen Word
Lachlan Deer
Peter Smyth
Date Added:
08/07/2020
Data Wrangling and Processing for Genomics
Unrestricted Use
CC BY

Data Carpentry lesson on using command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation. A lot of genomics analysis is done using command-line tools, for three reasons:
• you will often be working with a large number of files, and working through the command line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks;
• you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface; and
• you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if in fact a GUI tool even exists).
In a previous lesson, you learned how to use the bash shell to interact with your computer through a command-line interface. In this lesson, you will apply that knowledge to carry out a common genomics workflow: identifying variants among sequencing samples taken from multiple individuals within a population. We will start with a set of sequenced reads (.fastq files), perform some quality control steps, align those reads to a reference genome, and end by identifying and visualizing variations among these samples. As you progress through this lesson, keep in mind that even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatic tools. What you learn here will enable you to use a variety of bioinformatic tools with confidence and will greatly enhance your research efficiency and productivity.
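
The sketch below compresses that workflow into a short Python driver that shells out to commonly used tools; it assumes bwa, samtools, and bcftools are installed and on PATH, and the file names and flags are illustrative choices, not necessarily the lesson's exact commands.

```python
# Sketch of a variant-calling workflow like the one this lesson teaches,
# driven from Python. Assumes bwa, samtools, and bcftools are installed
# and on PATH; file names and flags are illustrative, not the lesson's
# exact commands.
import subprocess

def run(cmd):
    """Echo and run a shell command, stopping on the first failure."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

run("bwa index ref.fasta")                                      # index the reference
run("bwa mem ref.fasta reads_1.fastq reads_2.fastq > aln.sam")  # align reads
run("samtools sort -o aln.sorted.bam aln.sam")                  # sort alignments
run("samtools index aln.sorted.bam")                            # index for access
run("bcftools mpileup -f ref.fasta aln.sorted.bam"
    " | bcftools call -mv -o variants.vcf")                     # call variants
```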

Subject:
Applied Science
Computer Science
Genetics
Information Science
Life Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Thomas
Ahmed R. Hasan
Aniello Infante
Anita Schürch
Dev Paudel
Erin Alison Becker
Fotis Psomopoulos
François Michonneau
Gaius Augustus
Gregg TeHennepe
Jason Williams
Jessica Elizabeth Mizzi
Karen Cranston
Kari L Jordan
Kate Crosby
Kevin Weitemier
Lex Nederbragt
Luis Avila
Peter R. Hoyt
Rayna Michelle Harris
Ryan Peek
Sheldon John McKay
Sheldon McKay
Taylor Reiter
Tessa Pierce
Toby Hodges
Tracy Teal
Vasilis Lenis
Winni Kretzschmar
dbmarchant
Date Added:
08/07/2020
Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition
Unrestricted Use
CC BY

Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data (‘analytic reproducibility’). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles, all target values were reproduced, but 11 of these required author assistance. For 13 articles, at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.
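
The 10% criterion is simple arithmetic; as we read the abstract, a reproduced value matches if it differs from the reported value by at most 10% of that value. A minimal sketch, on made-up numbers:

```python
# The abstract's reproducibility criterion as arithmetic: a reproduced value
# counts as a match if it is within 10% of the reported value. This is our
# reading of the criterion, illustrated with made-up numbers.
def within_margin(reported, reproduced, margin=0.10):
    return abs(reproduced - reported) <= margin * abs(reported)

print(within_margin(reported=0.42, reproduced=0.44))  # True: off by ~5%
print(within_margin(reported=0.42, reproduced=0.50))  # False: off by ~19%
```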

Subject:
Applied Science
Information Science
Material Type:
Reading
Provider:
Royal Society Open Science
Author:
Alicia Hofelich Mohr
Bria Long
Elizabeth Clayton
Erica J. Yoon
George C. Banks
Gustav Nilsonne
Kyle MacDonald
Mallory C. Kidwell
Maya B. Mathur
Michael C. Frank
Michael Henry Tessler
Richie L. Lenne
Sara Altman
Tom E. Hardwicke
Date Added:
08/07/2020
Databases and SQL
Unrestricted Use
CC BY

Software Carpentry lesson that teaches how to use databases and SQL. In the late 1920s and early 1930s, William Dyer, Frank Pabodie, and Valentina Roerich led expeditions to the Pole of Inaccessibility in the South Pacific, and then onward to Antarctica. Two years ago, records of their expeditions were found in a storage locker at Miskatonic University. We have scanned and OCRed the data they contain, and we now want to store that information in a way that will make search and analysis easy. Three common options for storage are text files, spreadsheets, and databases. Text files are easiest to create and work well with version control, but then we would have to build search and analysis tools ourselves. Spreadsheets are good for doing simple analyses, but they don’t handle large or complex data sets well. Databases, however, include powerful tools for search and analysis, and can handle large, complex data sets. These lessons will show how to use a database to explore the expeditions’ data.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Amy Brown
Andrew Boughton
Andrew Kubiak
Avishek Kumar
Ben Waugh
Bill Mills
Brian Ballsun-Stanton
Chris Tomlinson
Colleen Fallaw
Dan Michael Heggø
Daniel Suess
Dave Welch
David W Wright
Deborah Gertrude Digges
Donny Winston
Doug Latornell
Erin Alison Becker
Ethan Nelson
Ethan P White
François Michonneau
George Graham
Gerard Capes
Gideon Juve
Greg Wilson
Ioan Vancea
Jake Lever
James Mickley
John Blischak
JohnRMoreau@gmail.com
Jonah Duckles
Jonathan Guyer
Joshua Nahum
Kate Hertweck
Kevin Dyke
Louis Vernon
Luc Small
Luke William Johnston
Maneesha Sane
Mark Stacy
Matthew Collins
Matty Jones
Mike Jackson
Morgan Taschuk
Patrick McCann
Paula Andrea Martinez
Pauline Barmby
Piotr Banaszkiewicz
Raniere Silva
Ray Bell
Rayna Michelle Harris
Rémi Emonet
Rémi Rampin
Seda Arat
Sheldon John McKay
Sheldon McKay
Stephen Davison
Thomas Guignard
Trevor Bekolay
lorra
slimlime
Date Added:
03/20/2017
Data sharing in PLOS ONE: An analysis of Data Availability Statements
Unrestricted Use
CC BY

A number of publishers and funders, including PLOS, have recently adopted policies requiring researchers to share the data underlying their results and publications. Such policies help increase the reproducibility of the published literature, as well as make a larger body of data available for reuse and re-analysis. In this study, we evaluate the extent to which authors have complied with this policy by analyzing Data Availability Statements from 47,593 papers published in PLOS ONE between March 2014 (when the policy went into effect) and May 2016. Our analysis shows that compliance with the policy has increased, with a significant decline over time in papers that did not include a Data Availability Statement. However, only about 20% of statements indicate that data are deposited in a repository, which the PLOS policy states is the preferred method. More commonly, authors state that their data are in the paper itself or in the supplemental information, though it is unclear whether these data meet the level of sharing required in the PLOS policy. These findings suggest that additional review of Data Availability Statements or more stringent policies may be needed to increase data sharing.

Subject:
Applied Science
Computer Science
Health, Medicine and Nursing
Information Science
Social Science
Material Type:
Reading
Provider:
PLOS ONE
Author:
Alicia Livinski
Christopher W. Belter
Douglas J. Joubert
Holly Thompson
Lisa M. Federer
Lissa N. Snyders
Ya-Ling Lu
Date Added:
08/07/2020
Discrepancies in the Registries of Diet vs Drug Trials
Unrestricted Use
CC BY

This cross-sectional study examines discrepancies between registered protocols and subsequent publications for drug and diet trials whose findings were published in prominent clinical journals in the last decade. ClinicalTrials.gov was established in 2000 in response to the Food and Drug Administration Modernization Act of 1997, which called for registration of trials of investigational new drugs for serious diseases. Subsequently, the scope of ClinicalTrials.gov expanded to all interventional studies, including diet trials. Presently, prospective trial registration is required by the National Institutes of Health for grant funding and by many clinical journals for publication [1]. Registration may reduce risk of bias from selective reporting and post hoc changes in design and analysis [1,2]. Although a study [3] of trials with ethics approval in Finland in 2007 identified numerous discrepancies between registered protocols and subsequent publications, the consistency of diet trial registration and reporting has not been well explored.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
JAMA Network Open
Author:
Cara B. Ebbeling
David S. Ludwig
Steven B. Heymsfield
Date Added:
08/07/2020
Economics Lesson with Stata
Unrestricted Use
CC BY

A Data Carpentry curriculum for Economics is being developed by Dr. Miklos Koren at Central European University. These materials are being piloted locally. Development for these lessons has been supported by a grant from the Sloan Foundation.

Subject:
Applied Science
Computer Science
Economics
Information Science
Mathematics
Measurement and Data
Social Science
Material Type:
Module
Provider:
The Carpentries
Author:
Andras Vereckei
Arieda Muço
Miklós Koren
Date Added:
08/07/2020
The Economics of Reproducibility in Preclinical Research
Unrestricted Use
CC BY

Low reproducibility rates within life science research undermine cumulative knowledge production and contribute to both delays and costs of therapeutic drug development. An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone. We outline a framework for solutions and a plan for long-term improvements in reproducibility rates that will help to accelerate the discovery of life-saving therapies and cures.

Subject:
Biology
Life Science
Material Type:
Reading
Provider:
PLOS Biology
Author:
Iain M. Cockburn
Leonard P. Freedman
Timothy S. Simcoe
Date Added:
08/07/2020
Effect of Population Heterogenization on the Reproducibility of Mouse Behavior: A Multi-Laboratory Study
Unrestricted Use
CC BY

In animal experiments, animals, husbandry and test procedures are traditionally standardized to maximize test sensitivity and minimize animal use, assuming that this will also guarantee reproducibility. However, by reducing within-experiment variation, standardization may limit inference to the specific experimental conditions. Indeed, we have recently shown in mice that standardization may generate spurious results in behavioral tests, accounting for poor reproducibility, and that this can be avoided by population heterogenization through systematic variation of experimental conditions. Here, we examined whether a simple form of heterogenization effectively improves reproducibility of test results in a multi-laboratory situation. Each of six laboratories independently ordered 64 female mice of two inbred strains (C57BL/6NCrl, DBA/2NCrl) and examined them for strain differences in five commonly used behavioral tests under two different experimental designs. In the standardized design, experimental conditions were standardized as much as possible in each laboratory, while they were systematically varied with respect to the animals' test age and cage enrichment in the heterogenized design. Although heterogenization tended to improve reproducibility by increasing within-experiment variation relative to between-experiment variation, the effect was too weak to account for the large variation between laboratories. However, our findings confirm the potential of systematic heterogenization for improving reproducibility of animal experiments and highlight the need for effective and practicable heterogenization strategies.
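
To make the statistical intuition concrete, here is a toy simulation (all numbers invented): if the strain difference interacts with local conditions, a standardized lab measures it under a single condition, while a heterogenized lab averages over varied conditions, so estimates scatter less across laboratories.

```python
# Toy simulation (all numbers invented): when the strain difference
# interacts with local conditions, standardizing on one condition per lab
# lets that interaction show up as between-lab variation, while varying
# conditions within each lab averages it away.
import numpy as np

rng = np.random.default_rng(0)
n_labs, n_conditions = 6, 4

def between_lab_spread(heterogenized):
    estimates = []
    for _ in range(n_labs):
        # the "true" strain difference under each lab-specific condition
        effects = 1.0 + rng.normal(0, 0.5, n_conditions)
        if heterogenized:
            estimates.append(effects.mean())  # average over varied conditions
        else:
            estimates.append(effects[0])      # a single standardized condition
    return np.std(estimates)                  # variation between laboratories

print("standardized :", between_lab_spread(False))
print("heterogenized:", between_lab_spread(True))
```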

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
PLOS ONE
Author:
Benjamin Zipser
Berry Spruijt
Britta Schindler
Chadi Touma
Christiane Brandwein
David P. Wolfer
Hanno Würbel
Johanneke van der Harst
Joseph P. Garner
Lars Lewejohann
Niek van Stipdonk
Norbert Sachser
Peter Gass
S. Helene Richter
Sabine Chourbaji
Vootele Võikar
Date Added:
08/07/2020
El Control de Versiones con Git
Unrestricted Use
CC BY

Spanish-language Software Carpentry lesson on version control with Git. To illustrate the power of Git and GitHub, the lesson uses the following story as a motivating example. Wolfman and Dracula have been hired by Universal Missions to investigate whether its next planetary explorer can be sent to Mars. They want to be able to work on the plans at the same time, but they have run into problems doing something similar in the past: if they take turns, each spends a lot of time waiting for the other to finish, but if they work on their own copies and exchange changes by email, things get lost, overwritten, or duplicated. A colleague suggests using version control to manage the work. Version control is better than exchanging files by email: nothing is lost once it is placed under version control, unless substantial effort is made to lose it. Since every previous version of each file is saved, it is always possible to go back in time and see exactly who wrote what on a particular day, or which version of a program was used to generate a particular set of results. Because there is a record of who did what and when, it is possible to know whom to ask if a question arises later and, if necessary, to revert the contents to an earlier version, much as the “undo” command works in a text editor. When several people collaborate on the same project, it is easy to accidentally overlook or overwrite someone else’s changes; a version control system automatically notifies users whenever one person’s work conflicts with another’s. Teams are not the only ones who benefit from version control: independent researchers can benefit greatly as well. Keeping a record of what changed, when, and why is extremely useful for any researcher who needs to return to a project later (e.g., a year later, when memory of the details has faded).

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Alejandra Gonzalez-Beltran
Amy Olex
Belinda Weaver
Bradford Condon
Casey Youngflesh
Daisie Huang
Dani Ledezma
Francisco Palm
Garrett Bachant
Heather Nunn
Hely Salgado
Ian Lee
Ivan Gonzalez
James E McClure
Javier Forment
Jimmy O'Donnell
Jonah Duckles
K.E. Koziar
Katherine Koziar
Katrin Leinweber
Kevin Alquicira
Kevin MF
Kurt Glaesemann
LauCIFASIS
Leticia Vega
Lex Nederbragt
Mark Woodbridge
Matias Andina
Matt Critchlow
Mingsheng Zhang
Nelly Sélem
Nima Hejazi
Nohemi Huanca Nunez
Olemis Lang
P. L. Lim
Paula Andrea Martinez
Peace Ossom Williamson
Rayna M Harris
Romualdo Zayas-Lagunas
Sarah Stevens
Saskia Hiltemann
Shirley Alquicira
Silvana Pereyra
Tom Morrell
Valentina Bonetti
Veronica Ikeshoji-Orlati
Veronica Jimenez
butterflyskip
dounia
Date Added:
08/07/2020
Enhancing Reproducibility through Rigor and Transparency | grants.nih.gov
Read the Fine Print

The information provided on this website is designed to assist the extramural community in addressing rigor and transparency in NIH grant applications and progress reports. Scientific rigor and transparency in conducting biomedical research are key to the successful application of knowledge toward improving health outcomes.

Definition: Scientific rigor is the strict application of the scientific method to ensure unbiased and well-controlled experimental design, methodology, analysis, interpretation, and reporting of results.

Goals: The NIH strives to exemplify and promote the highest level of scientific integrity, public accountability, and social responsibility in the conduct of science. Grant application instructions and the criteria by which reviewers are asked to evaluate the scientific merit of the application are intended to:
• ensure that NIH is funding the best and most rigorous science,
• highlight the need for applicants to describe details that may have been previously overlooked,
• highlight the need for reviewers to consider such details in their reviews through updated review language, and
• minimize additional burden.

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Author:
NIH
Date Added:
08/07/2020