Difference between revisions of "Make data accessible by Cedric David"
<!-- Do NOT Edit below this Line -->
{{#set:
Expertise=Open_science|
Expertise=Geosciences|
Owner=Cedric_David|
Progress=100|
StartDate=2015-02-21|
TargetDate=2015-03-06|
Type=Low}}
Latest revision as of 16:35, 4 April 2015
Details on how to do this task: Make data accessible
Selected input and output data files corresponding to the first RAPID peer-reviewed article are already available online on the RAPID website. However, not all the files needed to reproduce the experiments of the first publication are currently available, and I have not yet made sure that all experiments can indeed be reproduced.
Several tasks therefore need to be performed here:
- Narrow down a list of all the files necessary to reproduce the published results. Completed.
- Make sure all experiments can be reproduced. One unforeseen issue is that one of the experiments was performed only in the first RAPID publication and has never been used since. The experiment is still hard-coded in the source and is not currently an option in the most recent version of RAPID, which has included a "namelist" since April 2011. Some code modifications would be necessary to check that the experiment can indeed be reproduced. Basic tests were run, and it looks like these files can be reproduced, so they will be included in the data publication. Completed.
- Select a repository. This was a challenge. Nine of the files in the article are larger than 250 MB, which ruled FigShare out. I also looked into Dryad, which is an option because the data correspond to a peer-reviewed paper. However, Dryad charges data publication fees that become large once a repository exceeds 10 GB, which will happen for the other RAPID datasets that I plan to share. Finally, I looked at Zenodo, which allows free publication and accepts large files (up to 2 GB). This is sufficient for the dataset used here. I also contacted Zenodo to ask for a waiver of the 2 GB limit for some of my future uploads and received an encouraging answer. Zenodo was therefore selected. Completed.
- Select a license. Going with CC BY to share as widely as possible while retaining authorship and benefiting from potential citations. Completed.
- Check that co-authors agree on data sharing. An email was sent. All authors agreed. Completed.
- Write a description for the files. This turned out to be a much lengthier process than I had expected. The description was divided into several sections: corresponding publication, time format, data sources, software used, study domain, description of files (file type, units, what's inside, how it is sorted, time range, how the values were computed, how this file was prepared), known bugs, funding (because Zenodo doesn't include a box for non-EU funding). Completed.
- Share files. The Zenodo repository was started and all files were uploaded. It would have been good to know beforehand that the metadata (i.e. the description) could be modified after submission, but not the files. I had submission fear... The record is at http://dx.doi.org/10.5281/zenodo.16565. Publication took 24-48 hours after submission. Completed.
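The repository selection step above came down to comparing file sizes against each service's per-file limit. A minimal Python sketch of that check follows. The limits are the ones reported in these notes as of early 2015 and may have changed since; the function names, the example file names, and the choice of decimal units (1 MB = 10^6 bytes) are assumptions made for illustration, not part of RAPID or of any repository's API.

```python
from pathlib import Path

MB = 10**6  # assumption: decimal megabytes
GB = 10**9  # assumption: decimal gigabytes

# Per-file limits as reported in the notes above (early 2015).
# The services may have changed these limits since then.
PER_FILE_LIMITS = {
    "FigShare": 250 * MB,
    "Zenodo": 2 * GB,
}


def files_over_limit(sizes, limit):
    """Return the sorted names of files whose size in bytes exceeds the limit."""
    return sorted(name for name, size in sizes.items() if size > limit)


def repositories_that_fit(sizes, limits=PER_FILE_LIMITS):
    """Return the sorted repositories whose per-file limit every file satisfies."""
    return sorted(
        repo for repo, limit in limits.items()
        if not files_over_limit(sizes, limit)
    )


def total_size(sizes):
    """Return the total size in bytes, useful for size-based fee thresholds."""
    return sum(sizes.values())


def sizes_in_directory(directory):
    """Map each regular file under a directory to its size in bytes."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }
```

To use it, one would call `repositories_that_fit(sizes_in_directory("path/to/dataset"))`, where the directory path is a placeholder, and compare `total_size(...)` against any total-size threshold of interest, such as the 10 GB mentioned for Dryad.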