When first sharing research data, researchers often raise questions about the value and benefits of sharing and the mechanisms for doing so. Many stakeholders and interested parties, such as funding agencies, communities, other researchers, or members of the public, may be interested in research, results, and related data. This lesson addresses data sharing in the context of the data life cycle, the value of sharing data, concerns about sharing data, and methods and best practices for sharing data.
Some research funders have a mandate for data resulting from their funded research to be shared. This presentation provides a general definition of data sharing and how scholars can identify and follow data sharing mandates.
Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
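The reuse-tracking step described in the abstract above (finding mentions of GEO or ArrayExpress accession numbers in the full text of papers) can be sketched roughly as follows. This is a minimal illustration, not the study's own code: the regular expressions are simplified assumptions about accession formats, and the function name `find_accessions` is hypothetical. A match only shows that an accession was mentioned; deciding whether the mention is third-party reuse would need further checks, such as comparing author lists.

```python
import re

# Assumed GEO accession format: a prefix (GSE series, GDS dataset,
# GSM sample, GPL platform) followed by digits, e.g. GSE1234.
GEO_PATTERN = re.compile(r"\bG(?:SE|DS|SM|PL)\d+\b")

# Assumed ArrayExpress accession format: "E-", four capital letters,
# a hyphen, then digits, e.g. E-MEXP-567.
ARRAYEXPRESS_PATTERN = re.compile(r"\bE-[A-Z]{4}-\d+\b")


def find_accessions(full_text: str) -> dict:
    """Return the unique accession-like strings found in a paper's text."""
    return {
        "geo": sorted(set(GEO_PATTERN.findall(full_text))),
        "arrayexpress": sorted(set(ARRAYEXPRESS_PATTERN.findall(full_text))),
    }


if __name__ == "__main__":
    sample = "We reanalysed GSE1234 and E-MEXP-567 (see also GSE1234)."
    print(find_accessions(sample))
```

Deduplicating within a paper keeps repeated mentions of one dataset from being counted as several reuse instances.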
As sharing data openly becomes more and more the norm, and not just because of mandates for federal funding, more researchers may become interested in sharing data. Benefits of data sharing for educational research include increased collaboration, acceleration of knowledge through novel and creative research questions, and an increase in equitable opportunities for early career researchers and faculty at under-resourced institutions. In this session, Sara Hart covers the benefits of data sharing as well as how to prepare data for sharing. Participants are provided with information about data sharing and resources to support their own data sharing.
Sharing data and code are important components of reproducible research. Data sharing in research is widely discussed in the literature; however, there are no well-established evidence-based incentives that reward data sharing, nor randomized studies that demonstrate the effectiveness of data sharing policies at increasing data sharing. A simple incentive, such as an Open Data Badge, might provide the change needed to increase data sharing in health and medical research. This study was a parallel group randomized controlled trial (protocol registration: doi:10.17605/OSF.IO/PXWZQ) with two groups, control and intervention, with 80 research articles published in BMJ Open per group, with a total of 160 research articles. The intervention group received an email offer for an Open Data Badge if they shared their data along with their final publication and the control group received an email with no offer of a badge if they shared their data with their final publication. The primary outcome was the data sharing rate. Badges did not noticeably motivate researchers who published in BMJ Open to share their data; the odds of awarding badges were nearly equal in the intervention and control groups (odds ratio = 0.9, 95% CI [0.1, 9.0]). Data sharing rates were low in both groups, with just two datasets shared in each of the intervention and control groups. The global movement towards open science has made significant gains with the development of numerous data sharing policies and tools. What remains to be established is an effective incentive that motivates researchers to take up such tools to share their data.
Intensified and extensive data production and data storage are characteristics of contemporary western societies. Health data sharing is increasing with the growth of Information and Communication Technology (ICT) platforms devoted to the collection of personal health and genomic data. However, the sensitive and personal nature of health data poses ethical challenges when data is disclosed and shared even if for scientific research purposes.
With this in mind, the Science and Values Working Group of the COST Action CHIP ME ‘Citizen's Health through public-private Initiatives: Public health, Market and Ethical perspectives’ (IS 1303) identified six core values they considered to be essential for the ethical sharing of health data using ICT platforms. We believe that using this ethical framework will promote respectful scientific practices in order to maintain individuals’ trust in research.
We use these values to analyse five ICT platforms and explore how emerging data sharing platforms are reconfiguring the data sharing experience from a range of perspectives. We discuss which types of values, rights and responsibilities they entail and enshrine within their philosophy or outlook on what it means to share personal health information. Through this discussion we address issues of the design and the development process of personal health data and patient-oriented infrastructures, as well as new forms of technologically-mediated empowerment.
The last ten years have witnessed increasing awareness of questionable research practices (QRPs) in the life sciences, including p-hacking, HARKing, lack of replication, publication bias, low statistical power and lack of data sharing (see Figure 1). Concerns about such behaviours have been raised repeatedly for over half a century but the incentive structure of academia has not changed to address them. Despite the complex motivations that drive academia, many QRPs stem from the simple fact that the incentives which offer success to individual scientists conflict with what is best for science. On the one hand are a set of gold standards that centuries of the scientific method have proven to be crucial for discovery: rigour, reproducibility, and transparency. On the other hand are a set of opposing principles born out of the academic career model: the drive to produce novel and striking results, the importance of confirming prior expectations, and the need to protect research interests from competitors. Within a culture that pressures scientists to produce rather than discover, the outcome is a biased and impoverished science in which most published results are either unconfirmed genuine discoveries or unchallenged fallacies. This observation implies no moral judgement of scientists, who are as much victims of this system as they are perpetrators.
Is there a difference in citation rates between articles that were published with links to data and articles that were not? Besides being interesting from a purely academic point of view, this question is also highly relevant for the process of furthering science. Data sharing helps not only the verification of claims, but also the discovery of new findings in archival data. However, linking to data is still a far cry from being a "practice", especially when it comes to authors providing these links during the writing and submission process. Creating such a practice requires both willingness and a publication mechanism. Showing that articles with links to data get higher citation rates might increase the willingness of scientists to take the extra steps of linking data sources to their publications. In this presentation we will show this is indeed the case: articles with links to data receive higher citation rates than articles without such links. The ADS is funded by NASA Grant NNX09AB39G.
The Mozilla Science Lab is developing an Open Data Training Program. This repository will be where we build and share our curriculum and resources for open data.
Brave New World: Privacy, Data Sharing and Evidence Based Policy Making
The trifecta of globalization, urbanization and digitization has created new opportunities and challenges across our nation, cities, boroughs and urban centers. Cities in particular are in a unique position at the center of commerce and technology, becoming hubs for innovation and the practical application of emerging technology. In this rapidly changing 24/7 digitized world, governments are leveraging innovation and technology to become more effective, efficient, and transparent, and to better plan for and anticipate the needs of their citizens, businesses and community organizations. This class will provide the framework for how cities and communities can become smarter, more accessible, and more connected through technology.
MANTRA is a free, online, non-assessed course with guidelines to help you understand and reflect on how to manage the digital data you collect throughout your research. It has been crafted for post-graduate students, early career researchers, and information professionals. It is freely available on the web for anyone to explore on their own.
Through a series of interactive online units you will learn about terminology, key concepts, and best practice in research data management.
There are eight online units in this course and one set of offline (downloadable) data handling tutorials that will help you:
1. Understand the nature of research data in a variety of disciplinary settings
2. Create a data management plan and apply it from the start to the finish of your research project
3. Name, organise, and version your data files effectively
4. Gain familiarity with different kinds of data formats and know how and when to transform your data
5. Document your data well for yourself and others, learn about metadata standards and cite data properly
6. Know how to store and transport your data safely and securely (backup and encryption)
7. Understand legal and ethical requirements for managing data about human subjects; manage intellectual property rights
8. Understand the benefits of sharing, preserving and licensing data for re-use
9. Improve your data handling skills in one of four software environments: R, SPSS, NVivo, or ArcGIS
Each unit takes up to one hour, plus time for further reading and carrying out the data handling exercises. In the units you will find explanations, descriptions, examples, exercises, and video clips in which academics, PhD students and others talk about the challenges of managing research data. The data handling tutorials assume some experience with each software environment and provide exercises in PDF along with open datasets to download and work through using your own installed software.
MANTRA modules and data handling exercises are available for download via Zenodo: https://doi.org/10.5281/zenodo.1035218
This list consists of resources for researchers, editors, and reviewers interested in practicing open science principles, particularly in education research. This list is not exhaustive but is meant as a starting point for individuals wanting to learn more about doing open science work specifically for qualitative research. This list was compiled by the following contributors: Rachel Renbarger, Sondra Stegenga, Thomas, Sebastian Karcher, and Crystal Steltenpohl. This resource list grew out of a hackathon at the Virtual Unconference on Open Scholarship Practices in Education Research.
This GitHub repository contains curriculum and materials for courses and workshops taught through the University of Miami Libraries.
If you are not looking for the repository, but simply the curriculum materials, please see the hosted version: https://umiamilibraries.github.io/courses-and-workshops/.
The repository was started through the Data Curation Initiative (http://library.miami.edu/datacuration) at the University of Miami Libraries (http://library.miami.edu).
The repository was created by Tim Norris with help and inspiration from many others, including Elizabeth Fish, Angela Clark, and all the students, faculty, and staff who have participated in the seminar.
The foundation of health and medical research is data. Data sharing facilitates the progress of research and strengthens science. Data sharing in research is widely discussed in the literature; however, there are seemingly no evidence-based incentives that promote data sharing. Methods. A systematic review (registration: doi.org/10.17605/OSF.IO/6PZ5E) of the health and medical research literature was used to uncover any evidence-based incentives, with pre- and post-empirical data that examined data sharing rates. We were also interested in quantifying and classifying the number of opinion pieces on the importance of incentives, the number of observational studies that analysed data sharing rates and practices, and strategies aimed at increasing data sharing rates. Results. Only one incentive (using open data badges) has been tested in health and medical research that examined data sharing rates. The number of opinion pieces (n = 85) out-weighed the number of article-testing strategies (n = 76), and the number of observational studies exceeded them both (n = 106). Conclusions. Given that data is the foundation of evidence-based health and medical research, it is paradoxical that there is only one evidence-based incentive to promote data sharing. More well-designed studies are needed in order to increase the currently low rates of data sharing.
Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence of this, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these statements actually contain well-formed links to data, for example via a URL or permanent identifier, and if there is an added value in providing them. We consider 531,889 journal articles published by PLOS and BMC which are part of the PubMed Open Access collection, categorize their data availability statements according to their content and analyze the citation advantage of different statement categories via regression. We find that, following mandated publisher policies, data availability statements have become common by now, yet statements containing a link to a repository are still just a fraction of the total. We also find that articles with these statements, in particular, can have up to 25.36% higher citation impact on average: an encouraging result for all publishers and authors who make the effort of sharing their data. All our data and code are made available in order to reproduce and extend our results.
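As a rough illustration of the categorization step described in the abstract above, the sketch below sorts data availability statements by whether they contain a DOI-style permanent identifier, a plain URL, or no link at all. The category names, the regular expressions, and the function names are assumptions made for illustration; they are not the scheme or the code used in the study, which the authors state is available separately.

```python
import re
from collections import Counter

# Assumed pattern for a DOI-style identifier, e.g. 10.5281/zenodo.1035218.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")
# Assumed pattern for a plain http(s) URL.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def classify_statement(statement: str) -> str:
    """Return an illustrative category for one data availability statement."""
    text = statement.strip()
    if not text:
        return "missing"
    # Check for a DOI first, so that DOI links count as permanent identifiers
    # rather than as ordinary URLs.
    if DOI_PATTERN.search(text):
        return "persistent_identifier"
    if URL_PATTERN.search(text):
        return "url"
    return "no_link"


def tally_categories(statements) -> Counter:
    """Count how many statements fall into each illustrative category."""
    return Counter(classify_statement(s) for s in statements)


if __name__ == "__main__":
    examples = [
        "Data are available at https://doi.org/10.5281/zenodo.1035218.",
        "Data can be downloaded from https://example.org/data.",
        "Data are available from the authors on request.",
        "",
    ]
    print(tally_categories(examples))
```

A tally like this gives the share of statements that contain a link, which is the kind of proportion the abstract reports; the citation comparison itself would need a separate regression with suitable covariates.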
Open access to research data has been described as a driver of innovation and a potential cure for the reproducibility crisis in many academic fields. Against this backdrop, policy makers are increasingly advocating for making research data and supporting material openly available online. Despite its potential to further scientific progress, widespread data sharing in small science is still an ideal practised in moderation. In this article, we explore the question of what drives open access to research data using a survey among 1564 mainly German researchers across all disciplines. We show that, regardless of their disciplinary background, researchers recognize the benefits of open access to research data for both their own research and scientific progress as a whole. Nonetheless, most researchers share their data only selectively. We show that individual reward considerations conflict with widespread data sharing. Based on our results, we present policy implications that are in line with both individual reward considerations and scientific progress.