What is the purpose of DataRescue Princeton?
The event aims to help preserve publicly available climate and environmental data by bringing together Princeton faculty, staff, and students for a local data rescue event held in coordination with the national data rescue initiative. We are asking Princeton participants to identify datasets that matter to them so that those datasets can be included in the data rescue effort.
The goals of DataRescue Princeton and its sister events throughout the country are to ensure that important scientific data are responsibly archived so that they remain permanently findable, accessible, and usable, and to raise awareness of how these data improve our understanding of issues that matter in our communities and everyday lives.
Register by Thursday, May 18th. Free T-shirts for the first volunteers!
What should I bring?
Bring your laptop and charger. A limited number of loaner laptops will be available.
Food will be provided.
What can I do?
Volunteers can pick from several roles or tracks based on their skills and expertise. If you are interested in serving as a lead or guide for any role or track, contact data.rescue@princeton.edu.
Read the “Getting Started Guide” for additional information.
Roles
- Researchers – Inspect the “uncrawlable” list and investigate how each dataset can best be harvested
- Harvesters – Capture “uncrawlable” data; requires substantial technical expertise
- Checkers/Baggers – Perform quality assurance on datasets and verify content according to standards
- Describers – Create a descriptive record in the DataRefuge CKAN repository
Archiving More Complex Datasets
Researching: Researchers inspect the “uncrawlable” list to confirm that Seeders’ assessments were correct (that is, that the URL/dataset is indeed uncrawlable), and investigate how the dataset could best be harvested. Researching.md describes this process in more detail.
We recommend that Researchers and Harvesters (see below) work together in pairs, because the two roles require a lot of communication. In some cases, one person will fulfill both roles.
Skills needed: Strong front-end web experience and an enjoyment of research. An understanding of how federal data is organized is helpful.
Harvesting: Harvesters take the “uncrawlable” data and work out how to actually capture it, based on the Researchers’ recommendations. This is a complex task that can require substantial technical expertise and different techniques for different datasets. Harvesters should also consult the Harvesting Toolkit for available tools.
Skills needed: Programming in Python, JavaScript, C, or a similar language; comfort with the command line or experience working with structured data. Front-end web development experience is a plus.
Checking/Bagging: Checkers inspect a harvested dataset and make sure that it is complete. The main question checkers need to answer is “Will the bag make sense to a scientist?” Checkers need an in-depth understanding of harvesting goals and of how content can vary between datasets. Checking is currently performed by Baggers and does not exist as a separate stage in the Archivers app.
Baggers perform some quality assurance on the dataset to make sure the content is correct and corresponds to the original URL. Then they package the data into a BagIt file (or “bag”), which includes basic technical metadata, and upload it to the final DataRefuge destination.
Skills needed: Data or web archiving experience or strong technical skills and attention to detail.
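For volunteers new to bagging, the sketch below illustrates what a “bag” contains, assuming the basic BagIt 1.0 layout from RFC 8493: a `data/` payload folder, a `bagit.txt` declaration, a checksum manifest, and a `bag-info.txt` with simple technical metadata. It is not the Archivers app’s own bagging code. The function names `make_minimal_bag` and `verify_bag`, the choice of SHA-256, and recording the original URL as `External-Identifier` are illustrative assumptions. `verify_bag` mirrors the Checker’s question “is everything here and unchanged?” by recomputing checksums.

```python
import hashlib
import os
import shutil
from datetime import date


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def payload_files(bag_dir):
    """List payload paths relative to the bag root, using '/' separators."""
    paths = []
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), bag_dir)
            paths.append(rel.replace(os.sep, "/"))
    return sorted(paths)


def make_minimal_bag(src_dir, bag_dir, source_url):
    """Hypothetical helper: copy a harvested folder into a minimal BagIt bag."""
    # Payload: the harvested files go under data/ (copytree creates bag_dir).
    shutil.copytree(src_dir, os.path.join(bag_dir, "data"))

    # Required bag declaration.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as f:
        f.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")

    # Payload manifest: one "<checksum>  <path>" line per payload file.
    total_bytes = 0
    files = payload_files(bag_dir)
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8") as f:
        for rel in files:
            full = os.path.join(bag_dir, *rel.split("/"))
            total_bytes += os.path.getsize(full)
            f.write(f"{sha256_of(full)}  {rel}\n")

    # Basic technical metadata; Payload-Oxum is "<total bytes>.<file count>".
    with open(os.path.join(bag_dir, "bag-info.txt"), "w", encoding="utf-8") as f:
        f.write(f"Bagging-Date: {date.today().isoformat()}\n")
        f.write(f"Payload-Oxum: {total_bytes}.{len(files)}\n")
        f.write(f"External-Identifier: {source_url}\n")


def verify_bag(bag_dir):
    """Return True if every payload file is listed in the manifest and unchanged."""
    listed = set()
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), encoding="utf-8") as f:
        for line in f:
            checksum, rel = line.rstrip("\n").split(None, 1)
            listed.add(rel)
            full = os.path.join(bag_dir, *rel.split("/"))
            if not os.path.isfile(full) or sha256_of(full) != checksum:
                return False
    # No extra, unlisted files may appear in the payload either.
    return set(payload_files(bag_dir)) == listed
```

For real bagging work, the Library of Congress `bagit-python` library (`bagit.make_bag`) implements the full specification, including tag manifests and validation; the sketch above only shows the core idea.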
Describing: Describers create a descriptive record in the DataRefuge CKAN repository for each bag. Then they link the record to the bag and make the record public.
Skills needed: Experience working with scientific data or metadata practices.
Important: When signing up, volunteers must read the overview section of the DataRescue workflow to learn more about the different roles.
Video tutorial: Archivers app: https://www.youtube.com/watch?v=tvSSILnHnpA
Citizen Science: Volunteers will explore and contribute to the amazing world that is the Zooniverse, where anyone can be a researcher! Zooniverse’s “goal is to enable research that would not be possible, or practical, otherwise.”
The projects are clearly described and the instructions for participation are simple. “You’ll be able to study authentic objects of interest gathered by researchers, like images of faraway galaxies, historical records and diaries, or videos of animals in their natural habitats. By answering simple questions about them, you’ll help contribute to our understanding of our world, our history, our Universe, and more.” It is a wonderful way to engage in research across many areas of study at your own pace and make a real impact.
Zooniverse.org: https://www.zooniverse.org/about
This event is organized by representatives from the Princeton Institute for Computational Science & Engineering (PICSciE), the Office of Information Technology (OIT), and Princeton University Library, and co-sponsored by the Center for Digital Humanities (CDH), the Princeton Environmental Institute (PEI), and the Andlinger Center for Energy and the Environment (ACEE), in conjunction with the national initiative directed by the Environmental Data and Governance Initiative (EDGI) and PPEH DataRefuge.

