[CIG-MC] AGU's new data policy

Louise Kellogg lhkellogg at ucdavis.edu
Tue Mar 6 18:37:38 PST 2018


Hi Scott,

I have similar concerns about the cost-benefit. Among other things, the cost of *reproducing* a calculation presumably generally drops over time, making me question the value of storing the model output vs. regenerating it. The geodynamo calculations are an exception because they were so expensive to generate.

This is different for the seismic and geodetic observations stored by IRIS and UNAVCO, which are not reproducible since they are observations of the natural system.

I also have a concern about discoverability. If you store your model outputs on a system provided by VT and I store mine on a UC system, would anyone really be able to find them? We as a community have not agreed on metadata or other relevant information for finding data. So I fear we may end up with a pricy “write once, read never” system.

Best,

Louise


> On Mar 6, 2018, at 5:48 PM, Scott King <sking07 at vt.edu> wrote:
> 
> True.  Others might want our data.  That is worth thinking about. 
> 
> Maybe I’m a frustrated economist at heart.  Mr. Google tells me that the cost of enterprise storage is between $2,500 and $10,000 per TB per year. That money is coming out of the total funding available for science somehow.  My 3 TB for this one paper would cost the scientific community something like 0.25 to 1.0 GRA per year.  Food for thought.  
> 
> Sent from my iPhone
> 
> On Mar 6, 2018, at 4:30 PM, Lorraine Hwang <ljhwang at ucdavis.edu> wrote:
> 
>> There may also be communities who are interested in model output for reuse as well as reproducibility.
>> 
>> Best,
>> -Lorraine
>> 
>> *****************************
>> Lorraine Hwang, Ph.D.
>> Associate Director, CIG
>> 530.752.3656
>> 
>> 
>> 
>>> On Mar 6, 2018, at 9:28 AM, Cooper, Catherine M <cmcooper at wsu.edu> wrote:
>>> 
>>> 
>>> I wonder if it wouldn’t be helpful to have a community statement as to what we consider “data” and what we agree needs to be shared for reproducibility (which we all agree is important)?  But it seems like we might need to do some outreach on this if there is some misunderstanding about model output as data amongst AGU and NASA (this has come up in proposal reviews).  
>>> 
>>> 
>>>> On Mar 6, 2018, at 9:01 AM, Juliane Dannberg <judannberg at gmail.com> wrote:
>>>> 
>>>> My experience with this is similar to what Thorsten describes. I also regularly have TB-sized model output, and usually include the DOI of the version of the code I used in the paper, upload all input files/scripts etc. I used as supplementary material, and include a sentence that "all input files necessary to reproduce the model results are included in the supplementary material". So far, that has seemed to be an acceptable solution, also for AGU journals. 
>>>> But I agree that there doesn't seem to be a good way to archive TB-sized model output over long periods of time...
>>>> 
>>>> Best, 
>>>> Juliane
>>>> 
>>>> 
>>>> ----------------------------------------------------------------------
>>>> Juliane Dannberg
>>>> Postdoctoral Fellow, Colorado State University
>>>> http://www.math.colostate.edu/~dannberg/
>>>> 
>>>> 
>>>> On 3/6/2018 at 9:39 AM, Thorsten Becker wrote:
>>>>> The way I have interpreted AGU's guidelines for geodynamic studies as an AGU editor is not to ask for archiving of model output, but to ask for general access to all material that would be needed to recreate that output, or some simpler version of it that is a proof of concept, i.e. input data, input files, and a DOI for the version of the code, for example, if a community code is used. 
>>>>> 
>>>>> The general idea is, of course, to make things reproducible, and AGU and Wiley are among those who realize that this can cause problems, and are working on solutions with the community. 
>>>>> 
>>>>> One particular issue is that I have not asked for verification that results are actually reproducible, and have taken authors' assurances that codes will be shared at face value (except when the publications were of a technical nature, where we ask reviewers to actually try to download and run the software, which almost never works). I think that part might change, in that publishers may ask for a code access link and somehow archive this. 
>>>>> 
>>>>> I can also see some solutions akin to asking for a Docker setup, archived somewhere, that will allow anyone to rerun the models. There are interesting challenges involved, but in the end, I think moving to more openness and reproducibility is a good thing, and the success of CIG shows how some issues that were raised before we moved into this model resolved themselves. Things are not perfect, but we're making progress. 
>>>>> 
>>>>> My personal experience with publishing numerical stuff in highly visible journals is that, within a week, there are people actually asking to get all the code and all the input files to rerun our models, and we've always shared all of our stuff, of course. I realize that this is a significant workload (particularly for my grad students who actually put this stuff together...) and somehow AGU and publishers need to do more to support people with large data volumes, seismological inversions being another example. 
>>>>> 
>>>>> 
>>>>> Thorsten Becker - UTIG & DGS, JSG, UT Austin <http://www-udc.ig.utexas.edu/external/becker/>
>>>>> On Tue, Mar 6, 2018 at 7:17 AM, Scott King <sdk at vt.edu> wrote:
>>>>> 
>>>>> AGU journals have a new data policy requiring that all the data from the work must be in a publicly accessible repository.  In general I think this is a good thing.   They provide several possible solutions.   From the editor letter…
>>>>> 
>>>>> "AGU requires that data needed to understand and build upon the published research be available in public repositories following best practices <http://publications.agu.org/author-resource-center/publication-policies/data-policy/data-policy-faq/>. This includes an explicit statement in the Acknowledgments section on where users can access or find the data for this paper. Citations to archived data should be included in your reference list and all references, including those cited in the supplement, should be included in the main reference list. All listed references must be available to the general reader by the time of acceptance."
>>>>> 
>>>>> They list several possible repositories, none of which seems appropriate for 2.9 TB of CitcomS results. Set aside the philosophical issue that model results are not “data” (they don’t accept that).   I have the output used in the published figures down to a reasonable size, but I’m curious what others are doing.  Has anyone else run into this yet?  (If not, you will.)  I’m curious whether there is a community consensus regarding a repository where all geodynamics results would/could end up, as opposed to having them scattered across 3-4 (or more) potential repositories.  Maybe that’s not something to worry about, but since this is new and I, at least, have had no time to think it through, I’m curious what others are doing.
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Scott
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> CIG-MC mailing list
>>>>> CIG-MC at geodynamics.org
>>>>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/cig-mc