Introduction to citations as a presentation. Citing data and code as well as getting citations for data and code.
Copyright Law: Cases and Materials is a free copyright law textbook designed for a four-credit copyright course, which is what we teach at NYU School of Law. Model syllabi for four-credit and three-credit courses are available in the Faculty Resources section of this website. All faculty teaching copyright law are welcome to access the Faculty Resources, including the faculty discussion forum, by becoming a registered user of the site. To register, write us at email@example.com or firstname.lastname@example.org.
The textbook is made available under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Under the terms of this license, you are free to copy and redistribute the textbook in part or whole in any format provided that (1) you do so only for non-commercial purposes, and (2) you comply with the attribution principles of the license (credit the authors, and link to the license). Note please that this license does not permit you to make modifications to the textbook or to create derivative works. That said, there are a wide variety of derivatives that we would gladly permit. If you want to make modifications to the textbook, please contact us.
Course materials on using OpenRefine, a powerful tool for cleaning and transforming tabular data.
This workshop introduces the basic concepts of Git version control. Whether you're new to version control or just need an explanation of Git and GitHub, this two hour tutorial will help you understand the concepts of distributed version control. Get to know basic Git concepts and GitHub workflows through step-by-step lessons. We'll even rewrite a bit of history, and touch on how to undo (almost) anything with Git. This is a class for users who are comfortable with a command-line interface.
This class is designed for first-time and longer-term users of Jupyter Notebooks, a workspace for writing code. The class focuses on using Notebooks to facilitate sharing and publishing of script workflows. It aims to provide users with knowledge about shortcuts, plugins, and best practices for maximizing re-usability and shareability of Notebook contents.
Today we are going to learn the basics of literate programming using Jupyter Notebooks, a popular tool in data science, with the R kernel, so we can run R code in our notebooks. We’ll then take a look at how we use Git and GitHub to keep track of all the versions of our work, collaborate with others, and be open!
A class on setting up and managing research materials; caring for digital files to enable collaboration, sharing, and re-use; and helpful software/digital tools for organizing personal research files.
This session is an intermediate-to-advanced level class that offers some ideas for how to approach the following common data wrangling needs in research: 1) Obtain data and load it into a suitable data "container" for analysis, often via a web interface, especially an API, 2) parse the data retrieved via an API and turn it into a useful object for manipulation and analysis, and 3) perform some basic summary counts of records in a dataset and work up a quick visualization.
Qualitative research has long suffered from a lack of free tools for analysis, leaving no options for researchers without significant funds for software licenses. This presents significant challenges for equity. This panel discussion will explore the first two free/libre open source qualitative analysis tools out there: qcoder (R package) and Taguette (desktop application). Drawing from the diverse backgrounds of the presenters (social science, library & information science, software engineering), we will discuss what openness and extensibility means for qualitative research, and how the two tools we've built facilitate equitable, open sharing.
Various fields in the natural and social sciences face a ‘crisis of confidence’. Broadly, this crisis amounts to a pervasiveness of non-reproducible results in the published literature. For example, in the field of biomedicine, Amgen published findings that out of 53 landmark published results of pre-clinical studies, only 11% could be replicated successfully. This crisis is not confined to biomedicine. Areas that have recently received attention for non-reproducibility include biomedicine, economics, political science, psychology, as well as philosophy. Some scholars anticipate the expansion of this crisis to other disciplines.This course explores the state of reproducibility. After giving a brief historical perspective, case studies from different disciplines (biomedicine, psychology, and philosophy) are examined to understand the issues concretely. Subsequently, problems that lead to non-reproducibility are discussed as well as possible solutions and paths forward.
As research across domains of study has become increasingly reliant on digital tools (librarianship included), the challenges in reproducibility have grown. Alongside this reproducibility challenge are the demands for open scholarship, such as releasing code, data, and articles under an open license.Before, researchers out in the field used to capture their environments through observation, drawings, photographs, and videos; now, researchers and the librarians who work alongside them must capture digital environments and what they contain (e.g. code and data) to achieve reproducibility. Librarians are well-positioned to help patrons open their scholarship, and it’s time to build in reproducibility as a part of our services.Librarians are already engaged with research data management, open access publishing, grant compliance, pre-registration, and it’s time we as a profession add reproducibility to that repertoire. In this webinar, organised by LIBER’s Research Data Management Working Group, speaker Vicky Steeves discusses how she’s built services around reproducibility as a dual appointment between the Libraries and the Center for Data Science at New York University.
The adoption of reproducibility remains low, despite incentives becoming increasingly common in different domains, conferences, and journals. The truth is, reproducibility is technically difficult to achieve due to the complexities of computational environments.To address these technical challenges, we created ReproZip, an open-source tool that packs research along with all the necessary information to reproduce it, including data files, software, OS version, and environment variables. Everything is then bundled into an .rpz file, which users can use to reproduce the work with ReproUnzip and an unpacker (Docker, Vagrant, and Singularity). The .rpz file is general and contains rich metadata: more unpackers can be added as needed, better guaranteeing long-term preservation.However, installing the unpackers can still be burdensome for secondary users of ReproZip bundles. In this paper, we will discuss how ReproZip and our new tool ReproServer can be used together to facilitate access to well-preserved, reproducible work. ReproServer is a cloud application that allows users to upload or provide a link to a ReproZip bundle, and then interact with/reproduce the contents from the comfort of their browser. Users are then provided a stable link to the unpacked work on ReproServer they can share with reviewers or colleagues.
An introduction to managing, annotating, organizing, archiving, and publishing research data using the Open Science Framework.
From Data Carpentry: Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R.This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
A lighting talk at csv,conf,4 about how libraries and librarians are helping researchers with reproducibility.
Trademark Law: An Open-Source Casebook is a free, “open” textbook designed for a four-credit trademark course, which is what I teach at NYU School of Law. Model syllabi for four-credit and three-credit courses are available in the Faculty Resources section of this website.
A class covering the basics of writing a successful data management plan for federal funding agencies such as the NEH, NSF, NIH, NASA, and others.