Make data accessible by Cedric David

From Geoscience Paper of the Future

Revision as of 22:57, 2 April 2015


Details on how to do this task: Make data accessible

Selected input and output data files corresponding to the first RAPID peer-reviewed article are already available online on the RAPID website. However, not all the files needed to reproduce the experiments of the first publication are currently available, and I had not yet verified that all experiments can indeed be reproduced.
Several tasks therefore need to be performed here:
- Narrow down a list of all necessary files to reproduce the results published. Completed.
- Make sure all experiments can be reproduced. One unforeseen issue is that one of the experiments was performed only in the first RAPID publication and never used since. The experiment is still hard-coded in the source and is not an option in the most recent version of RAPID, which has used a "namelist" since April 2011. Some code modifications would therefore be necessary to check that the experiment can indeed be reproduced. Basic tests were run; these files appear to be reproducible, so they will be included in the data publication. Completed.
- Select a repository. This was a challenge. Nine of the files used in the article are larger than 250 MB, which ruled FigShare out. I also looked into Dryad, which is an option because the data correspond to a peer-reviewed paper. However, Dryad charges data publication fees that become large for datasets over 10 GB, which will be the case for the other RAPID datasets I plan to share. Finally, I looked at Zenodo, which allows free publication and accepts large files (up to 2 GB). This is sufficient for the dataset used here. I also contacted Zenodo to ask whether the 2 GB limit could be waived for some of my future uploads and received an encouraging answer. Zenodo was therefore selected. Completed.
- Select a license. Going with CC BY to share as widely as possible while retaining authorship and benefiting from potential citations. Completed.
- Check that co-authors agree on data sharing. An email was sent. All authors agreed. Completed.
- Write descriptions for files. This turned out to be a much lengthier process than I had expected. The description was divided into several sections: corresponding publication, time format, data sources, software used, study domain, description of files (file type, units, contents, sort order, time range, how the values were computed, how the file was prepared), known bugs, and funding (because Zenodo doesn't include a box for non-EU funding). Completed.
- Share files. The Zenodo repository was created and all files were uploaded. It would have been good to know that the metadata (i.e., the description) can be modified after submission, although the files cannot. I had submission fear... http://dx.doi.org/10.5281/zenodo.16565. Completed.
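The repository choice above came down mostly to per-file size limits. Below is a minimal Python sketch of that check, assuming the limits quoted on this page (250 MB for FigShare, 2 GB for Zenodo, as of 2015; both may have changed since) and decimal units (1 MB = 10^6 bytes). The directory name `rapid_data` and the functions `classify` and `survey` are hypothetical, for illustration only.

```python
from pathlib import Path

# Per-file limits as described on this page (2015); current limits may differ.
# Decimal units (1 MB = 10**6 bytes) are an assumption.
FIGSHARE_LIMIT = 250 * 10**6  # 250 MB
ZENODO_LIMIT = 2 * 10**9      # 2 GB


def classify(size_bytes):
    """Return which of the two repositories would accept a file of this size."""
    if size_bytes <= FIGSHARE_LIMIT:
        return "FigShare or Zenodo"
    if size_bytes <= ZENODO_LIMIT:
        return "Zenodo only"
    return "neither (too large)"


def survey(data_dir):
    """Map each file under data_dir (recursively) to its size and classification."""
    report = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            report[str(path.relative_to(data_dir))] = (size, classify(size))
    return report


if __name__ == "__main__":
    import sys

    # Hypothetical default directory holding the files selected for publication.
    target = sys.argv[1] if len(sys.argv) > 1 else "rapid_data"
    for name, (size, verdict) in survey(target).items():
        print(f"{size / 10**6:10.1f} MB  {verdict:20s}  {name}")
```

Running it over the folder of files selected for publication lists any file that would rule out a given repository before an upload is attempted.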