Introduction to citations as a presentation. Citing data and code as well as getting citations for data and code.
Course materials on using OpenRefine, a powerful tool for cleaning and transforming tabular data.
This workshop introduces the basic concepts of Git version control. Whether you're new to version control or just need an explanation of Git and GitHub, this two hour tutorial will help you understand the concepts of distributed version control. Get to know basic Git concepts and GitHub workflows through step-by-step lessons. We'll even rewrite a bit of history, and touch on how to undo (almost) anything with Git. This is a class for users who are comfortable with a command-line interface.
This class is designed for first-time and longer-term users of Jupyter Notebooks, a workspace for writing code. The class focuses on using Notebooks to facilitate sharing and publishing of script workflows. It aims to provide users with knowledge about shortcuts, plugins, and best practices for maximizing re-usability and shareability of Notebook contents.
Today we are going to learn the basics of literate programming using Jupyter Notebooks, a popular tool in data science, with the R kernel, so we can run R code in our notebooks. We’ll then take a look at how we use Git and GitHub to keep track of all the versions of our work, collaborate with others, and be open!
A class on setting up and managing research materials; caring for digital files to enable collaboration, sharing, and re-use; and helpful software/digital tools for organizing personal research files.
This session is an intermediate-to-advanced level class that offers some ideas for how to approach the following common data wrangling needs in research: 1) Obtain data and load it into a suitable data "container" for analysis, often via a web interface, especially an API, 2) parse the data retrieved via an API and turn it into a useful object for manipulation and analysis, and 3) perform some basic summary counts of records in a dataset and work up a quick visualization.
Qualitative research has long suffered from a lack of free tools for analysis, leaving no options for researchers without significant funds for software licenses. This presents significant challenges for equity. This panel discussion will explore the first two free/libre open source qualitative analysis tools out there: qcoder (R package) and Taguette (desktop application). Drawing from the diverse backgrounds of the presenters (social science, library & information science, software engineering), we will discuss what openness and extensibility means for qualitative research, and how the two tools we've built facilitate equitable, open sharing.
Various fields in the natural and social sciences face a ‘crisis of confidence’. Broadly, this crisis amounts to a pervasiveness of non-reproducible results in the published literature. For example, in the field of biomedicine, Amgen published findings that out of 53 landmark published results of pre-clinical studies, only 11% could be replicated successfully. This crisis is not confined to biomedicine. Areas that have recently received attention for non-reproducibility include biomedicine, economics, political science, psychology, as well as philosophy. Some scholars anticipate the expansion of this crisis to other disciplines.This course explores the state of reproducibility. After giving a brief historical perspective, case studies from different disciplines (biomedicine, psychology, and philosophy) are examined to understand the issues concretely. Subsequently, problems that lead to non-reproducibility are discussed as well as possible solutions and paths forward.
As research across domains of study has become increasingly reliant on digital tools (librarianship included), the challenges in reproducibility have grown. Alongside this reproducibility challenge are the demands for open scholarship, such as releasing code, data, and articles under an open license.Before, researchers out in the field used to capture their environments through observation, drawings, photographs, and videos; now, researchers and the librarians who work alongside them must capture digital environments and what they contain (e.g. code and data) to achieve reproducibility. Librarians are well-positioned to help patrons open their scholarship, and it’s time to build in reproducibility as a part of our services.Librarians are already engaged with research data management, open access publishing, grant compliance, pre-registration, and it’s time we as a profession add reproducibility to that repertoire. In this webinar, organised by LIBER’s Research Data Management Working Group, speaker Vicky Steeves discusses how she’s built services around reproducibility as a dual appointment between the Libraries and the Center for Data Science at New York University.
The adoption of reproducibility remains low, despite incentives becoming increasingly common in different domains, conferences, and journals. The truth is, reproducibility is technically difficult to achieve due to the complexities of computational environments.To address these technical challenges, we created ReproZip, an open-source tool that packs research along with all the necessary information to reproduce it, including data files, software, OS version, and environment variables. Everything is then bundled into an .rpz file, which users can use to reproduce the work with ReproUnzip and an unpacker (Docker, Vagrant, and Singularity). The .rpz file is general and contains rich metadata: more unpackers can be added as needed, better guaranteeing long-term preservation.However, installing the unpackers can still be burdensome for secondary users of ReproZip bundles. In this paper, we will discuss how ReproZip and our new tool ReproServer can be used together to facilitate access to well-preserved, reproducible work. ReproServer is a cloud application that allows users to upload or provide a link to a ReproZip bundle, and then interact with/reproduce the contents from the comfort of their browser. Users are then provided a stable link to the unpacked work on ReproServer they can share with reviewers or colleagues.
Get some experience with some tools that help us work towards reproducibility with R, especially RMarkdown, knitr, and packrat. Goals for today: You will be able to bundle and unbundle things with packrat; You will be able to create RMarkdown files and knit them into PDF or HTML; You will know how to troubleshoot the inevitable errors you’ll get your first time doing these things.
An introduction to managing, annotating, organizing, archiving, and publishing research data using the Open Science Framework.
From Data Carpentry: Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R.This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
A lighting talk at csv,conf,4 about how libraries and librarians are helping researchers with reproducibility.
A class covering the basics of writing a successful data management plan for federal funding agencies such as the NEH, NSF, NIH, NASA, and others.
Reproducibility is unquestionably at the heart of science. Scientists face numerous challenges in this context, not least the lack of concepts, tools, and workflows for reproducible research in today's curricula.This short course introduces established and powerful tools that enable reproducibility of computational geoscientific research, statistical analyses, and visualisation of results using R (http://www.r-project.org/) in two lessons:1. Reproducible Research with R MarkdownOpen Data, Open Source, Open Reviews and Open Science are important aspects of science today. In the first lesson, basic motivations and concepts for reproducible research touching on these topics are briefly introduced. During a hands-on session the course participants write R Markdown (http://rmarkdown.rstudio.com/) documents, which include text and code and can be compiled to static documents (e.g. HTML, PDF).R Markdown is equally well suited for day-to-day digital notebooks as it is for scientific publications when using publisher templates.2. GitLab and DockerIn the second lesson, the R Markdown files are published and enriched on an online collaboration platform. Participants learn how to save and version documents using GitLab (http://gitlab.com/) and compile them using Docker containers (https://docker.com/). These containers capture the full computational environment and can be transported, executed, examined, shared and archived. Furthermore, GitLab's collaboration features are explored as an environment for Open Science.Prerequisites: Participants should install required software (R, RStudio, a current browser) and register on GitLab (https://gitlab.com) before the course.This short course is especially relevant for early career scientists (ECS).Participants are welcome to bring their own data and R scripts to work with during the course.All material by the conveners will be shared publicly via OSF (https://osf.io/qd9nf/).