Document the provenance of the results

From Geoscience Paper of the Future
Revision as of 23:35, 12 March 2015 by Yolanda



What This Task Involves

The training session and training materials explain how to:

  1. Capture the provenance of the results in a paper
  2. Develop a workflow sketch, a formal workflow, or a provenance record, each of which represents the provenance of the results with a different degree of accuracy
  3. Publish the provenance and make it part of a publication

Training Materials

This training session was held on March 6, 2015.

Suggested Readings

  • "A Primer for the PROV Provenance Model." Yolanda Gil, Simon Miles, Khalid Belhajjame, Helena Deus, Daniel Garijo, Graham Klyne, Paolo Missier, Stian Soiland-Reyes, and Stephan Zednik. Published as a W3C Working Group Note on 30 April 2013.
    • A brief and practical introduction to the PROV standard for provenance, showing examples of how to represent the provenance record in RDF through a simple notation called Turtle
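To make the Turtle notation concrete, here is a minimal sketch in Python (standard library only) that prints a small PROV record in Turtle. The prov: prefix and the terms prov:Entity, prov:Activity, prov:Agent, prov:used, prov:wasGeneratedBy and prov:wasAssociatedWith come from the PROV-O vocabulary; the ex: namespace, the prov_turtle helper, and all resource names are hypothetical placeholders, not part of any published record.

```python
# Minimal sketch: emit a small PROV record in Turtle using only the standard library.
# The ex: namespace and every resource name below are hypothetical placeholders.
PREFIXES = """@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .
"""


def prov_turtle(result, activity, inputs, agent):
    """Return Turtle stating that `result` was generated by `activity`,
    which used `inputs` and was associated with `agent`."""
    lines = [PREFIXES]
    # The result is an entity produced by one activity.
    lines.append(f"ex:{result} a prov:Entity ;")
    lines.append(f"    prov:wasGeneratedBy ex:{activity} .")
    # The activity lists the entities it used and the agent responsible for it.
    used = ", ".join(f"ex:{name}" for name in inputs)
    lines.append(f"ex:{activity} a prov:Activity ;")
    lines.append(f"    prov:used {used} ;")
    lines.append(f"    prov:wasAssociatedWith ex:{agent} .")
    # Type the remaining resources so the record is self-describing.
    for name in inputs:
        lines.append(f"ex:{name} a prov:Entity .")
    lines.append(f"ex:{agent} a prov:Agent .")
    return "\n".join(lines) + "\n"


print(prov_turtle("figure1", "runModel", ["rainfallData", "modelConfig"], "analyst"))
```

Writing the record by hand like this is only practical for small examples; the Primer shows the full range of PROV relations that a real record may need.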

What To Do

We described many options in the training. Here is a sketch of the most common approach:

  1. At the very minimum, describe the workflow in the text (e.g., in a "Methods" section) or in an appendix
    • Mention the datasets used, the software, and the data flow across the software components
    • Specify unique identifiers for data and software, mention the versions used, and credit all sources
  2. Develop a workflow sketch and show it in a figure or in an appendix
    • Capture high-level dataflow across components
  3. To really capture the full provenance, specify the formal workflow or provenance record
    • The formal workflow shows all data flow across components, corresponding to the detailed command-line invocations and parameter values used
    • Options:
      1. Describe it as a graph where the nodes are computations and the links show data and parameters
      2. Use the PROV provenance standard (start with a result and trace back how it was generated)
      3. Use a workflow system (e.g., WINGS) to create the data flow graph
    • Publish the formal workflow or provenance record, and assign a unique identifier
      • Cite it in the paper
      • Show the provenance graph
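The advice in step 1 to give every dataset and software package an identifier, a version, and credit can be checked before writing the Methods text. The following is a minimal Python sketch under that assumption; the Resource record, its field names, the helper functions, and the example.org identifiers are all hypothetical illustrations, not a prescribed format.

```python
from dataclasses import dataclass


# Hypothetical record for one dataset or software package cited in the paper.
@dataclass
class Resource:
    kind: str        # "dataset" or "software"
    name: str
    identifier: str  # persistent identifier such as a DOI ("" if not yet known)
    version: str     # version used ("" if not yet known)
    credit: str      # source to credit ("" if not yet known)


def missing_fields(resources):
    """Return (name, field) pairs for resources lacking an identifier,
    a version, or a credit line, so gaps can be fixed before submission."""
    gaps = []
    for r in resources:
        for field in ("identifier", "version", "credit"):
            if not getattr(r, field).strip():
                gaps.append((r.name, field))
    return gaps


def methods_sentence(r):
    """Render one resource as a sentence for a Methods section."""
    return (f"We used the {r.kind} {r.name} (version {r.version}, "
            f"{r.identifier}), provided by {r.credit}.")


# Placeholder entries for illustration only.
resources = [
    Resource("dataset", "StationRainfall", "https://example.org/data/rain", "2.1", "Example Data Center"),
    Resource("software", "hydro-model", "https://example.org/sw/hydro", "", "Example Lab"),
]
print(missing_fields(resources))  # [('hydro-model', 'version')]
print(methods_sentence(resources[0]))
```

Keeping this list alongside the paper also makes it easy to reuse the same identifiers later in the formal workflow or provenance record.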
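Option 1 for the formal workflow, a graph whose nodes are computations and whose links carry data and parameters, can be sketched in a few lines of Python. In this sketch every step name, command, file name, and parameter is hypothetical; the dependencies between steps are derived from the data links, and graphlib (Python 3.9+) orders the concrete command-line invocations so each one runs after the steps that produce its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each computation node records its command, its
# parameter values, and the data it reads and writes. All names are illustrative.
steps = {
    "clean": {"cmd": "clean_data", "params": {"min_quality": 0.8},
              "inputs": ["raw_rainfall.csv"], "outputs": ["rainfall_clean.csv"]},
    "model": {"cmd": "run_model", "params": {"timestep_h": 1},
              "inputs": ["rainfall_clean.csv", "basin.geojson"], "outputs": ["runoff.nc"]},
    "plot": {"cmd": "make_figure", "params": {"dpi": 300},
             "inputs": ["runoff.nc"], "outputs": ["figure1.png"]},
}


def dependencies(steps):
    """Map each step to the set of upstream steps whose outputs it reads.
    Inputs produced by no step (original data) add no dependency."""
    producer = {out: name for name, s in steps.items() for out in s["outputs"]}
    return {name: {producer[f] for f in s["inputs"] if f in producer}
            for name, s in steps.items()}


def command_lines(steps):
    """List the concrete invocations in an order that respects the data flow."""
    order = TopologicalSorter(dependencies(steps)).static_order()
    lines = []
    for name in order:
        s = steps[name]
        args = " ".join(f"--{k}={v}" for k, v in s["params"].items())
        lines.append(f"{s['cmd']} {args} {' '.join(s['inputs'])} -o {' '.join(s['outputs'])}")
    return lines


for line in command_lines(steps):
    print(line)
```

Deriving the dependencies from the declared inputs and outputs, rather than listing them by hand, keeps the graph consistent with the data that actually flows between components.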
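Option 2, using PROV and starting from a result to trace back how it was generated, amounts to walking wasGeneratedBy and used links backwards. The Python sketch below assumes a provenance record stored as simple (subject, relation, object) triples; the relation names follow PROV-O, while the resource names and the trace_back and primary_inputs helpers are hypothetical.

```python
# Hypothetical provenance record as (subject, PROV relation, object) triples.
# Relation names follow PROV-O (prov:wasGeneratedBy, prov:used); resource names are made up.
RECORD = [
    ("figure1.png", "wasGeneratedBy", "plotting"),
    ("plotting", "used", "runoff.nc"),
    ("runoff.nc", "wasGeneratedBy", "modelRun"),
    ("modelRun", "used", "rainfall_clean.csv"),
    ("modelRun", "used", "basin.geojson"),
    ("rainfall_clean.csv", "wasGeneratedBy", "cleaning"),
    ("cleaning", "used", "raw_rainfall.csv"),
]


def trace_back(result, record):
    """Start from a result and follow wasGeneratedBy/used links backwards,
    returning every statement in its lineage (each node is visited once)."""
    lineage, frontier, seen = [], [result], set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, rel, o in record:
            if s == node and rel in ("wasGeneratedBy", "used"):
                lineage.append((s, rel, o))
                frontier.append(o)
    return lineage


def primary_inputs(result, record):
    """Entities used in the lineage that no activity generated,
    i.e. the original data the result ultimately depends on."""
    lineage = trace_back(result, record)
    generated = {s for s, rel, _ in record if rel == "wasGeneratedBy"}
    used = {o for _, rel, o in lineage if rel == "used"}
    return sorted(used - generated)


print(primary_inputs("figure1.png", RECORD))  # ['basin.geojson', 'raw_rainfall.csv']
```

The primary inputs found this way are exactly the datasets that should carry identifiers and credit in the paper.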
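One way to show the provenance graph in the paper is to export it as Graphviz DOT and render it as a figure. The Python sketch below assumes statements stored as (subject, relation, object) triples with hypothetical resource names and a hypothetical to_dot helper; the node shapes (ovals for entities, boxes for activities, house shapes for agents) loosely follow the style of PROV diagrams.

```python
# Hypothetical provenance statements (subject, PROV relation, object); names are illustrative.
STATEMENTS = [
    ("figure1.png", "wasGeneratedBy", "modelRun"),
    ("modelRun", "used", "rainfall.csv"),
    ("modelRun", "used", "basin.geojson"),
    ("modelRun", "wasAssociatedWith", "analyst"),
]


def to_dot(statements):
    """Render PROV statements as Graphviz DOT text. Activities are the
    subjects of used/wasAssociatedWith and the objects of wasGeneratedBy;
    agents are the objects of wasAssociatedWith; everything else is an entity.
    Edges point from subject to object, labelled with the relation."""
    activities = {s for s, rel, _ in statements if rel in ("used", "wasAssociatedWith")}
    activities |= {o for _, rel, o in statements if rel == "wasGeneratedBy"}
    agents = {o for _, rel, o in statements if rel == "wasAssociatedWith"}
    nodes = sorted({n for s, _, o in statements for n in (s, o)})
    lines = ["digraph provenance {"]
    for n in nodes:
        shape = "house" if n in agents else "box" if n in activities else "ellipse"
        lines.append(f'  "{n}" [shape={shape}];')
    for s, rel, o in statements:
        lines.append(f'  "{s}" -> "{o}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(STATEMENTS))
```

Saving the output as provenance.dot and running `dot -Tpng provenance.dot -o provenance.png` produces an image that can be included as a figure next to the citation of the published record.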