Writing reproducible geoscience papers using R Markdown, Docker, and GitLab

Reproducibility is unquestionably at the heart of science. Scientists face numerous challenges in this context, not least the lack of concepts, tools, and workflows for reproducible research in today's curricula.

This short course introduces established and powerful tools that enable reproducibility of computational geoscientific research, statistical analyses, and visualisation of results using R (http://www.r-project.org/) in two lessons:

1. Reproducible Research with R Markdown
Open Data, Open Source, Open Reviews and Open Science are important aspects of science today. In the first lesson, basic motivations and concepts for reproducible research touching on these topics are briefly introduced. During a hands-on session the course participants write R Markdown (http://rmarkdown.rstudio.com/) documents, which include text and code and can be compiled to static documents (e.g. HTML, PDF). R Markdown is equally well suited for day-to-day digital notebooks as it is for scientific publications when using publisher templates.

2. GitLab and Docker
In the second lesson, the R Markdown files are published and enriched on an online collaboration platform. Participants learn how to save and version documents using GitLab (http://gitlab.com/) and compile them using Docker containers (https://docker.com/). These containers capture the full computational environment and can be transported, executed, examined, shared and archived. Furthermore, GitLab's collaboration features are explored as an environment for Open Science.

Prerequisites: Participants should install required software (R, RStudio, a current browser) and register on GitLab (https://gitlab.com) before the course.

This short course is especially relevant for early career scientists (ECS). Participants are welcome to bring their own data and R scripts to work with during the course. All material by the conveners will be shared publicly via OSF (https://osf.io/qd9nf/).

Material Type: Activity/Lab

Authors: Daniel Nüst, Edzer Pebesma, Markus Konkol, Rémi Rampin, Vicky Steeves

Introduction to Git & GitHub

This workshop introduces the basic concepts of Git version control. Whether you're new to version control or just need an explanation of Git and GitHub, this two-hour tutorial will help you understand the concepts of distributed version control. Get to know basic Git concepts and GitHub workflows through step-by-step lessons. We'll even rewrite a bit of history and touch on how to undo (almost) anything with Git. This class is for users who are comfortable with a command-line interface.

Material Type: Activity/Lab

Author: Vicky Steeves

Writing a Data Management Plan for Grant Applications

A class covering the basics of writing a successful data management plan for federal funding agencies such as the NEH, NSF, NIH, and NASA.

Material Type: Activity/Lab

Authors: Nick Wolf, Vicky Steeves

Research Project Management Using the Open Science Framework

An introduction to managing, annotating, organizing, archiving, and publishing research data using the Open Science Framework.

Material Type: Activity/Lab

Authors: Nick Wolf, Vicky Steeves

Managing a Personal Research Archive

A class on setting up and managing research materials; caring for digital files to enable collaboration, sharing, and re-use; and helpful software/digital tools for organizing personal research files.

Material Type: Activity/Lab

Authors: Nick Wolf, Vicky Steeves

Introduction to Jupyter Notebooks

This class is designed for first-time and longer-term users of Jupyter Notebooks, a workspace for writing code. The class focuses on using Notebooks to facilitate sharing and publishing of script workflows. It aims to provide users with knowledge about shortcuts, plugins, and best practices for maximizing re-usability and shareability of Notebook contents.

Material Type: Activity/Lab

Authors: Nick Wolf, Vicky Steeves

Data Is Present: Open Workshops and Hackathons

Original data has become more accessible thanks to cultural and technological advances. On the internet, we can find innumerable data sets from sources such as scientific journals and repositories, local and national governments, and non-governmental organisations. These data can often be presented in novel ways, by creating new tables or plots, or by integrating additional data. Free, open-source software has become a great companion for open data. This open scholarship project offers free workshops and coding meet-ups (hackathons) across the UK to learn and practise data presentation. It is made possible by a fellowship of the Software Sustainability Institute.

Material Type: Activity/Lab

Author: Pablo Bernabeu

Python for Harvesting Data on the Web

This session is an intermediate-to-advanced class that offers some ideas for how to approach the following common data wrangling needs in research: 1) obtain data, often via a web interface, especially an API, and load it into a suitable data "container" for analysis; 2) parse the data retrieved via an API and turn it into a useful object for manipulation and analysis; and 3) perform some basic summary counts of records in a dataset and work up a quick visualization.
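
As a rough illustration of the parsing and counting steps (this is not taken from the class materials), the Python sketch below parses a JSON payload of the kind an API might return, turns it into a list of records, counts records per category, and prints a crude text bar chart as a stand-in for a plot. The payload, its field names (`results`, `title`, `type`), and the helper functions `parse_records` and `count_by_field` are hypothetical. A real session would first fetch the payload over HTTP, for example with `urllib.request.urlopen`.

```python
import json
from collections import Counter

# Hypothetical API response body. In practice this string would come from an
# HTTP request to a real API endpoint (step 1 in the description).
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Soil moisture survey", "type": "dataset"},
    {"title": "Stream gauge readings", "type": "dataset"},
    {"title": "Field methods handbook", "type": "text"}
  ]
}
"""


def parse_records(payload: str) -> list:
    """Step 2: parse raw JSON text into a list of record dictionaries."""
    data = json.loads(payload)
    # The "results" key is an assumption about this hypothetical API's shape.
    return data.get("results", [])


def count_by_field(records: list, field: str) -> Counter:
    """Step 3: count records by the value of one field ('unknown' if missing)."""
    return Counter(record.get(field, "unknown") for record in records)


if __name__ == "__main__":
    records = parse_records(SAMPLE_RESPONSE)
    counts = count_by_field(records, "type")
    # A quick text "visualization": one '#' per record in each category.
    for value, n in counts.most_common():
        print(f"{value:<10} {'#' * n} ({n})")
```

Swapping the text bar chart for a plotting library, or the hard-coded string for a live request, changes only the first and last steps; the parse-then-count structure stays the same.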

Material Type: Activity/Lab

Authors: Nick Wolf, Vicky Steeves

Introduction to Research Data Management

An introduction to the concepts and best practices of research data management.

Material Type: Activity/Lab

Authors: Nick Wolf, Vicky Steeves

Data Cleaning and Management Using OpenRefine

Course materials on using OpenRefine, a powerful tool for cleaning and transforming tabular data.

Material Type: Activity/Lab

Authors: Nick Wolf, Vicky Steeves

Finding & Evaluating Open Data

Introduction to finding and evaluating Open Data by NYU Data Services.

Material Type: Activity/Lab

Author: Vicky Steeves

Data Management & Reproducibility

A presentation introducing researchers to data management and reproducibility.

Material Type: Lesson

Author: Vicky Steeves

Citing and Being Cited: Data & Code Edition

A presentation introducing citation practices: how to cite data and code, and how to get citations for your own data and code.

Material Type: Lesson

Author: Vicky Steeves

The Role of Libraries in the Age of Computational Reproducibility

A lightning talk at csv,conf,4 about how libraries and librarians are helping researchers with reproducibility.

Material Type: Lesson

Authors: Gabriele Hayden, Vicky Steeves

Qualitative Research Using Open Tools

Qualitative research has long suffered from a lack of free tools for analysis, leaving researchers without substantial funds for software licenses with no options. This presents significant challenges for equity. This panel discussion will explore the first two free/libre open source qualitative analysis tools available: qcoder (an R package) and Taguette (a desktop application). Drawing on the diverse backgrounds of the presenters (social science, library & information science, software engineering), we will discuss what openness and extensibility mean for qualitative research, and how the two tools we've built facilitate equitable, open sharing.

Material Type: Lesson

Authors: Beth M. Duckles, Vicky Steeves

Reproducibility, Preservation, and Access to Research with ReproZip and ReproServer

The adoption of reproducibility remains low, despite incentives becoming increasingly common in different domains, conferences, and journals. The truth is, reproducibility is technically difficult to achieve due to the complexities of computational environments.

To address these technical challenges, we created ReproZip, an open-source tool that packs research along with all the necessary information to reproduce it, including data files, software, OS version, and environment variables. Everything is then bundled into an .rpz file, which others can use to reproduce the work with ReproUnzip and an unpacker (Docker, Vagrant, or Singularity). The .rpz file is general and contains rich metadata, so more unpackers can be added as needed, better guaranteeing long-term preservation.

However, installing the unpackers can still be burdensome for secondary users of ReproZip bundles. In this paper, we will discuss how ReproZip and our new tool ReproServer can be used together to facilitate access to well-preserved, reproducible work. ReproServer is a cloud application that allows users to upload or provide a link to a ReproZip bundle, and then interact with and reproduce the contents from the comfort of their browser. Users are then given a stable link to the unpacked work on ReproServer, which they can share with reviewers or colleagues.

Material Type: Activity/Lab

Authors: Fernando Chirigati, Rémi Rampin, Vicky Steeves

Five selfish reasons to work reproducibly

And so, my fellow scientists: ask not what you can do for reproducibility; ask what reproducibility can do for you! Here, I present five reasons why working reproducibly pays off in the long run and is in the self-interest of every ambitious, career-oriented scientist.

A complex equation on the left half of a blackboard, an even more complex equation on the right half. A short sentence links the two equations: “Here a miracle occurs”. Two mathematicians in deep thought. “I think you should be more explicit in this step”, says one to the other.

This is exactly how it seems when you try to figure out how authors got from a large and complex data set to a dense paper with lots of busy figures. Without access to the data and the analysis code, a miracle occurred. And there should be no miracles in science.

Working transparently and reproducibly has a lot to do with empathy: put yourself into the shoes of one of your collaboration partners and ask yourself, would that person be able to access my data and make sense of my analyses? Learning the tools of the trade (Box 1) will require commitment and a massive investment of your time and energy. A priori it is not clear why the benefits of working reproducibly outweigh its costs.

Here are some reasons: because reproducibility is the right thing to do! Because it is the foundation of science! Because the world would be a better place if everyone worked transparently and reproducibly! You know how that reasoning sounds to me? Just like yaddah, yaddah, yaddah …

It’s not that I think these reasons are wrong. It’s just that I am not much of an idealist; I don’t care how science should be. I am a realist; I try to do my best given how science actually is. And, whether you like it or not, science is all about more publications, more impact factor, more money and more career. More, more, more… so how does working reproducibly help me achieve more as a scientist?

Material Type: Reading

Author: Florian Markowetz

Software Carpentry

Since 1998, Software Carpentry has been teaching researchers the computing skills they need to get more done in less time and with less pain. Our volunteer instructors have run hundreds of events for more than 34,000 researchers since 2012. All of our lesson materials are freely reusable under the Creative Commons Attribution license.

Material Type: Full Course

Author: Software Carpentry Community

Data Carpentry

Data Carpentry trains researchers in the core data skills for efficient, shareable, and reproducible research practices. We run accessible, inclusive training workshops; teach openly available, high-quality, domain-tailored lessons; and foster an active, inclusive, diverse instructor community that promotes and models reproducible research as a community norm.

Material Type: Full Course

Author: Data Carpentry Community

ReproducibiliTea

Everything you need to know about this ECR-led journal club initiative that helps early career researchers create local Open Science groups that discuss issues, papers and ideas to do with improving science.

Material Type: Lesson