2019 Conducting Reproducible Science with Sciunits
Tanu Malik, DePaul University
Science is conducted collaboratively and often requires sharing computational experiments. An experiment typically includes diverse elements such as software, its past executions, provenance, and associated documentation. The notion of a "research object" captures the aggregation and identification of such diverse elements. Mere aggregation, however, is not sufficient for sharing computational experiments: other users must also be able to easily recompute these shared research objects. We present the "sciunit", a reusable research object in which aggregated content is tracked, secured, and made recomputable in different environments. We describe a Git-like client that efficiently creates, stores, repeats, and reproduces sciunits, and we show that sciunits repeat computational experiments with minimal storage and processing overhead.
We will show how sciunits improve reproducibility in computational hydrology with HydroShare.org, an online collaboration environment for sharing data, models, and code. Researchers create sciunits on their local machines and add hydrology-specific metadata to them using HydroShare's API. Tools integrated with HydroShare, such as CyberGIS and JupyterHub, make sciunits reusable for running models through notebooks, Docker containers, and cloud resources. We will report on lessons learned and on how sciunits can improve the practice of maintaining Findable, Accessible, Interoperable, and Reusable (FAIR) computational experiments across all geoscience domains. Finally, we will report on the complexity of computational experiments in solid earth and space science, and on the unmet validation and documentation challenges in maintaining and using sciunits.
When: Thursday, 14 February 2019, 2:00 pm - 3:00 pm PST