Geodynamics - Publishing

Publishing

Details: Published on Saturday, 02 September 2023 02:01

Congratulations on completing your research project. Now it is time to publish.

Open research statements are now a common requirement when publishing research. They support reuse, validation, and citation, and often take the form of Data Availability, Data Access, Code Availability, Open Research, or Software Availability statements. “Data available upon request” is increasingly considered insufficient by many funding agencies and publishers. Making research data and methods available also addresses the need for reviewers to have access to sufficient information to evaluate the publication. In doing so, it aids the reproducibility and replicability of research, an integral part of the scientific method.

Here we adopt the definitions of reproducibility and replicability from NAS (2019):

Reproducibility is obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis.

Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data. Two studies may be considered to have replicated if they obtain consistent results given the level of uncertainty inherent in the system under study.

Another term often used is repeatable, which is a different concept. A researcher should be able to repeat their own computation. A different research group should be able to reproduce the results with the same model setup. A different research group, using their own model setup, should be able to replicate the results.

A description of methods, experimental data, and parameters has always been required when publishing peer-reviewed research. However, in the modern era of big data and computation, fulfilling this requirement means making code, input files, and output files openly available. In some places, the term “computational reproducibility” may be used.

Below we specify what elements should be deposited into a trusted repository for models to be reproducible but not executable. We adopt the philosophy that upon execution from a binary, container or compilation, a learner (someone new to the code but not yet a skilled user) should be able to reproduce all model results within the range of reasonable numerical and system tolerances.
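Checking a rerun against deposited results "within tolerance" might be sketched as below. This is only an illustration in Python: the function name, the default relative tolerance of 1e-6, and the sample values are assumptions, and suitable tolerances depend on the code, the solver settings, and the platform.

```python
import math


def within_tolerance(reference, reproduced, rel_tol=1e-6, abs_tol=0.0):
    """Return True if every reproduced value matches its reference value.

    rel_tol and abs_tol are illustrative defaults (assumptions), not
    recommended values for any particular modeling code.
    """
    if len(reference) != len(reproduced):
        return False
    return all(
        math.isclose(ref, new, rel_tol=rel_tol, abs_tol=abs_tol)
        for ref, new in zip(reference, reproduced)
    )


# Hypothetical values, e.g. one diagnostic quantity at three time steps.
deposited = [1.000000, 2.500000, 3.750000]
rerun = [1.0000004, 2.5000001, 3.7499990]

print(within_tolerance(deposited, rerun))                # loose tolerance
print(within_tolerance(deposited, rerun, rel_tol=1e-9))  # much stricter
```

A relative tolerance is used here because model diagnostics can span many orders of magnitude; an absolute tolerance may be more suitable for quantities expected to be near zero.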

What to Deposit

When constructing your computational model, you make choices: choices of geometry, meshes, initial conditions, boundary conditions, and material models. Choices made at compilation and in the numerics (e.g. solvers, time stepping) also impact your results. In addition, research sometimes requires you to modify or add to code and/or write new pre- or post-processors. The combination of these choices makes your research unique. Models can be derivatives of other work and/or part of a release package, and those works should be cited. Your modifications are what make your model distinct, and the fusion of these changes around a research hypothesis forms the so-called 3rd pillar of science, computation (theory and experimentation being the 1st and 2nd, respectively).

Along with the code repository itself, modeling software often has a defined directory structure for model inputs and outputs. When depositing, retain this directory structure as far as practicable.
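One way to retain the directory structure when assembling a deposit is to copy each selected file to the same relative path under a staging directory. The Python sketch below illustrates this; the function name and the example paths are hypothetical and do not reflect any particular package's layout.

```python
import shutil
import tempfile
from pathlib import Path


def stage_deposit(model_root, relative_files, deposit_root):
    """Copy selected files into deposit_root, keeping their relative paths."""
    model_root = Path(model_root)
    deposit_root = Path(deposit_root)
    staged = []
    for rel in relative_files:
        source = model_root / rel
        target = deposit_root / rel
        # Recreate the original sub-directories inside the staging area.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        staged.append(target)
    return staged


# Demonstration with a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "model"
    (root / "input").mkdir(parents=True)
    (root / "input" / "parameters.prm").write_text("# example parameters\n")
    for path in stage_deposit(root, ["input/parameters.prm"], Path(tmp) / "deposit"):
        print(path.relative_to(tmp))
```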

Data

Deposit any input files that you modified and used in support of your publication and that are needed to reproduce results. Deposit any output files needed to verify that the reproduction of results was successful (e.g. log files and scripts) and to support the figures in the publication. Preserve the expected (default) directory structure. From these files, a learner should be able to recreate raw model outputs from the given input files and scripts.

Include:

Configuration files

✓ Log files or scripts that include information on the compile and runtime environment

Input files

✓ Run scripts or similar, including parameters set on the command line
✓ Parameter files that specify the physics of the model
✓ Parameter files that specify the numerics of the model

Output files

✓ Log files. In some cases, an output file containing a complete description of the simulation is available and may substitute for some input information requirements. While this meets the definition of reproducibility, it is not a recommended substitute for the original parameter or configuration files.
✓ Data as needed to reproduce figures. Also consider depositing model output in formats suitable for reuse by others.
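If your modeling software does not already log its runtime environment, a small script can record it alongside the run. The Python sketch below is one possible approach, not a CIG requirement. The field names are assumptions, and compile-time details (compiler, flags, library versions) must be supplied by you through the hypothetical "extra" argument because they cannot be detected generically.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def runtime_environment(extra=None):
    """Collect basic runtime information as a JSON-serializable dict.

    extra: optional dict of build details you provide yourself, such as
    compiler name and flags (hypothetical keys, not auto-detected).
    """
    record = {
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python_version": sys.version.split()[0],
        "command_line": list(sys.argv),
    }
    record.update(extra or {})
    return record


# The compiler entries below are placeholders for illustration only.
env = runtime_environment({"compiler": "example-compiler", "compiler_flags": "-O2"})
print(json.dumps(env, indent=2))
```

Writing the result to a file in the run directory keeps it with the other log files in the deposit.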

See links to individual software packages for specific guidance.

Additional guidance is provided by AGU - Guidelines for Research Primarily Based on Numerical Models or Theory.

Code

Some but not all research is based on a released and archived version of the code and/or executable binaries. Use the following guidelines to decide whether you must deposit code used in your research.

If you:

Used a regular release with no modifications, you do not need to archive the code. This has already been done for you. Just remember to cite the proper version using the DOI published on Zenodo. See the User Manual or the How to Cite tab of the software landing page on geodynamics.org.
Used a version from others, cite their work. If necessary, contact the authors and work with them on a proper software citation.
Used a development version (version between releases along the main branch), fork the code at the proper git hash and deposit it in an approved repository.
Made minor changes in a few places or added a few plugins using a previous archived version, deposit just the files you have changed in an approved repository.
Made major changes, create a zip release of your fork and branch and deposit it in an approved repository.
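For the development-version and modified-code cases above, standard git commands can identify the exact commit, list the files that differ from a release, and produce a zip archive. The Python sketch below only wraps those commands. The function names, the release tag, and the archive name are hypothetical, and "git diff" as used here reports committed changes only.

```python
import subprocess


def git_commands(release_tag, archive_name="model-code.zip"):
    """Return the git commands used in this sketch as argument lists."""
    return {
        # Full hash of the commit currently checked out.
        "commit_hash": ["git", "rev-parse", "HEAD"],
        # Files that differ between the release tag and the current commit.
        "changed_files": ["git", "diff", "--name-only", f"{release_tag}..HEAD"],
        # Zip archive of the current commit, suitable for depositing.
        "zip_archive": ["git", "archive", "--format=zip",
                        f"--output={archive_name}", "HEAD"],
    }


def run_git(args, repo_dir="."):
    """Run one git command inside repo_dir and return its stripped output."""
    result = subprocess.run(args, cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


# Print the commands for a hypothetical release tag "v1.0.0".
for name, args in git_commands("v1.0.0").items():
    print(name, "->", " ".join(args))
```

Inside a clone of your fork, run_git(git_commands("v1.0.0")["commit_hash"]) would return the hash to quote in your deposit description.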

Pre- and Post-processing

Custom software for the analysis and visualization of the data should be properly deposited and cited.

Scripts

Scripts necessary to reproduce results should be properly deposited and cited. Standard graphics, spreadsheet, or word processing programs do not need to be cited or archived. This includes ParaView, VisIt, and VAPoR.

IMPORTANT
Always remember to properly cite the released version of any software package you used, even if you made code modifications and deposited your own version, so that developers receive proper credit. Citation information for all CIG software packages can be found in their manuals or on their software landing pages on geodynamics.org.

Repositories

Your data and software should be deposited in a trusted community repository that assigns a persistent identifier (PID), such as a DOI, and has a stated preservation policy. PIDs establish the authenticity of a resource and continue to provide access to it if its location changes. A URL is NOT a PID.
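When assembling a reference list or availability statement, it can help to check that each identifier is a DOI and not a plain URL. The Python sketch below uses a heuristic pattern for the common "10.prefix/suffix" DOI form. The function names are hypothetical, the example DOI is a placeholder, and a match does not prove that the DOI is registered or resolves.

```python
import re

# Heuristic for the common DOI form "10.<registrant>/<suffix>" (assumption:
# this covers typical modern DOIs but is not a full validator).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(identifier):
    """Return the bare DOI if identifier looks like one, otherwise None."""
    text = identifier.strip()
    for prefix in RESOLVER_PREFIXES:
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    return text if DOI_PATTERN.match(text) else None


def doi_url(doi):
    """Build the resolver link for a bare DOI."""
    return f"https://doi.org/{doi}"


# Placeholder identifiers for illustration only.
print(normalize_doi("https://doi.org/10.5281/zenodo.0000000"))
print(normalize_doi("https://github.com/example/repository"))
```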

Repositories commonly used include:

Zenodo. Any format up to 50GB per dataset. Higher quotas granted on a case-by-case basis.
Figshare. Any format up to 20GB for free accounts. For Nature Portfolio journals, the free limit is increased to 100GB per manuscript.
OSF. Free for public projects up to 50GB.
Institutional Repositories. Consult with your university or institutional library on repositories available to you in support of compliance with funding and publishing requirements.

AGU provides a listing and description of domain-discipline repositories. See also the Generalist Repository Comparison Chart and https://www.re3data.org/.

Check with your journal for recommendations. Most require FAIR-aligned repositories.

CAUTION
GitHub is not a FAIR-aligned repository that supports persistence, and it cannot be used for preservation. Under GitHub's terms of use, GitHub can delete a repository, and so can its owner. A GitHub URL is not a persistent identifier. Instead, use the GitHub/Zenodo bridge to preserve the version used in your research, cite the preserved version, and also include the URL of your GitHub repository for convenience and reuse.

CIG Community on Zenodo

CIG has a community on Zenodo to promote discoverability of research products in geodynamics. All CIG software is part of this community. When depositing your model and software, associate your research with our community:

Computational Infrastructure for Geodynamics

Aid additional discoverability in Zenodo by using the recommended keywords. The Keyword metadata field is found in the Basic Information section.

Remember to fill in the Related/alternate identifiers metadata field for related publications and datasets. You can edit your metadata after you publish, but you cannot alter the deposited files.
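The community, keyword, and related-identifier metadata described above could be prepared as a structured record before you enter it in the web form or submit it through an API. The Python sketch below builds such a record. The key names follow Zenodo's legacy deposition API as best understood here and should be checked against the current Zenodo documentation. The community identifier, keywords, names, and DOI are placeholders, not CIG's actual values.

```python
import json


def build_deposit_metadata(title, creators, description, keywords,
                           community_id, related_dois):
    """Assemble a metadata record for a dataset deposit.

    Key names are an assumption based on Zenodo's legacy deposition API;
    verify them before use.
    """
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": name} for name in creators],
            "keywords": list(keywords),
            "communities": [{"identifier": community_id}],
            # Link the deposit to the publication it supports.
            "related_identifiers": [
                {"identifier": doi, "relation": "isSupplementTo"}
                for doi in related_dois
            ],
        }
    }


# All values below are placeholders for illustration only.
record = build_deposit_metadata(
    title="Model inputs and outputs for an example study",
    creators=["Doe, Jane"],
    description="Input files, logs, and figure data.",
    keywords=["example-keyword"],
    community_id="example-community",
    related_dois=["10.0000/placeholder-article-doi"],
)
print(json.dumps(record, indent=2))
```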

Examples

Many journals are signatories to or endorse the Enabling FAIR Data Commitment Statement in the Earth, Space, and Environmental Sciences. As such, they share the joint belief that data should, to the greatest extent possible, be shared, open, and stored in community-approved FAIR-aligned repositories, and strive to adopt a shared set of author guidelines that support these principles, providing a common set of expectations for authors in the Earth, space, and environmental sciences. These include:

Citation within the article with the corresponding reference in the reference list.
Use of persistent identifiers.
Software used in the research should also be cited and deposited in an archival repository that assigns persistent identifiers.
A Data Availability Statement describing how the data underlying the findings of the paper can be accessed and reused.

AGU

AGU has the most comprehensive data and software policies as well as guidance for authors. Meeting AGU's requirements should satisfy most journal policies. AGU provides the following templates for data and software:

Data Availability Template for Data
The [type of data] data used for [brief context, description] in the study are available at [repository, source name] via [DOI, persistent identifier link] with [license, access conditions] [optional in-text citation in References]

Data Availability Template for Software
[Version number] of the [software name] used for [brief context, description of what the software was used for] is preserved at [DOI, persistent identifier link], available via [license type, access conditions] and developed openly at [software development]
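Filling the two templates above can be scripted so that statements stay consistent across manuscripts. The Python sketch below substitutes values into the template wording. The function and argument names are hypothetical, a closing period is added, and every example value (software name, DOI, license, URL) is a placeholder.

```python
def data_availability_statement(data_type, context, repository, pid,
                                access, citation=""):
    """Fill the data template; citation is the optional in-text citation."""
    text = (
        f"The {data_type} data used for {context} in the study are available "
        f"at {repository} via {pid} with {access}"
    )
    if citation:
        text += f" {citation}"
    return text + "."


def software_availability_statement(version, name, purpose, pid,
                                    license_name, development_url):
    """Fill the software template."""
    return (
        f"{version} of the {name} used for {purpose} is preserved at {pid}, "
        f"available via {license_name} and developed openly at "
        f"{development_url}."
    )


# Placeholder values for illustration only.
print(software_availability_statement(
    version="Version 1.2.3",
    name="ExampleCode",
    purpose="the mantle convection models",
    pid="https://doi.org/10.0000/placeholder",
    license_name="an open-source license",
    development_url="https://example.org/examplecode",
))
```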

Depending on the software used, you may consider archiving both your data and your code modifications in the same repository. Include a README file that describes the contents of your deposit and the associated publication. For some data (e.g. observational or experimental), a domain repository is the best place to deposit. Note that not all repositories support depositing both data and software. In addition, creating two citable objects clarifies what is being cited.

Journal Policies

Follow the links below to find more information on relevant policies for specific journals.

Publisher	Journal	Resources
American Astronomical Society Publishing	The Astrophysical Journal	Policy Statement on Software; Data Guide
American Geophysical Union	Journal of Geophysical Research Solid Earth; Geophysical Research Letters (GRL); Geochemistry, Geophysics, Geosystems (G3); Reviews of Geophysics; Tectonics	Data and Software for Authors
Copernicus	Solid Earth	Data Policy
Elsevier	EPSL; Icarus	Guide for Authors; Data Statement
Elsevier	Tectonophysics	Guide for Authors
IOP	The Astrophysical Journal	Data Availability Policy
Nature Portfolio	Nature Geoscience; Nature Communications	Reporting standards; Authorship
Oxford Academic	Geophysical Journal International	Data, Results, and Software Policy

Additional Guidance

Additional software-specific guidance on data deposits and on software and data availability statements (links pending):

ASPECT [website]
PyLith
Rayleigh [manual]
SPECFEM