TY - JOUR
T1 - Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance
AU - Zhang, Qian
AU - Cao, Yang
AU - Wang, Qiwen
AU - Vu, Duc
AU - Thavasimani, Priyaa
AU - McPhillips, Timothy
AU - Missier, Paolo
AU - Slaughter, Peter
AU - Jones, Christopher
AU - Jones, Matthew B.
AU - Ludäscher, Bertram
PY - 2018/8/13
Y1 - 2018/8/13
N2 - We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables could be found hidden in filenames or folder structures, be recorded in log files, or they can be automatically captured using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.
AB - We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables could be found hidden in filenames or folder structures, be recorded in log files, or they can be automatically captured using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.
U2 - 10.2218/ijdc.v12i2.585
DO - 10.2218/ijdc.v12i2.585
M3 - Article
SN - 1746-8256
VL - 12
SP - 390
EP - 408
JO - International Journal of Digital Curation
JF - International Journal of Digital Curation
IS - 2
ER -