Many scientists use workflows to systematically design and run computational experiments. Once a workflow has been executed, the scientist may want to publish the resulting dataset so that, e.g., other scientists can reuse it as input to their own experiments. To do so, the scientist needs to curate the dataset by specifying metadata that describes it, e.g., its derivation history, origins and ownership. To assist the scientist in this task, we explore in this paper the use of provenance traces collected by workflow management systems when enacting workflows. Specifically, we identify the shortcomings of such raw provenance traces in supporting the data publishing task, and propose an approach for deriving distilled, yet more informative, provenance traces that are fit for data publishing.
|Publication status||Published - 2013|
|Event||International Workshop on Managing and Querying Provenance Data at Scale|
|Conference||International Workshop on Managing and Querying Provenance Data at Scale|