CWL+Research Object == Complete Provenance

Farah Zaib Khan, Stian Soiland-Reyes, Andrew Lonie, Richard Sinnott

Research output: Contribution to conference › Poster › peer-review


Abstract

Provenance is defined as ‘The beginning of something’s existence; something’s origin’ or ‘A record of ownership of a work of art or an antique, used as a guide to authenticity or quality’. Provenance tracking is crucial in scientific studies, where workflows have emerged as a key approach to automating data-intensive analyses. Gil et al. analyzed the challenges of scientific workflows and concluded that formally specified workflows help ‘accelerate the rate of scientific process’ and enable others to reproduce a given experiment, provided that the provenance of the end-to-end process is captured at every level.

We implemented an exemplar GATK variant-calling workflow using three approaches to workflow definition, namely Galaxy, CWL and Cpipe, to identify the assumptions implicit in each. Because documentation and comprehensive provenance tracking are lacking, these assumptions lead to limited or no understanding of the reproducibility requirements. The comparison also identified the provenance information that is crucial for genomic workflows.

CWL provides a declarative approach to workflow description that makes minimal assumptions about the precise software environment, base software dependencies, configuration settings, parameter changes and software versions. It aims to provide an open-source, extensible standard for building flexible, customized workflows that capture the intricate details of every process. It facilitates the capture of this information by supporting declarations of requirements, `cwl:tool`, checksums and so on. However, there is currently no mechanism to gather the information produced by a workflow run into a single bundle for future use. We propose to demonstrate the implementation of a module for CWL that addresses this gap.
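The idea of gathering a run's files into one bundle can be illustrated with a minimal Python sketch. It is not the authors' module: the function names `sha1_of` and `bundle_run`, the directory layout (loosely modelled on the BagIt packaging format) and the fields in `run.json` are all hypothetical. The sketch copies a run's files into one directory and records a SHA-1 checksum for each, the same digest CWL uses for `File` checksums (written as `sha1$<hex>`).

```python
from __future__ import annotations

import hashlib
import json
import shutil
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_run(files: list[Path], run_info: dict, bundle_dir: Path) -> Path:
    """Copy run files into one bundle and record their SHA-1 checksums.

    Hypothetical layout, loosely modelled on BagIt:
      data/<name>          copies of the run's input/output files
      manifest-sha1.txt    one "<sha1>  data/<name>" line per file
      metadata/run.json    free-form run information (e.g. tool versions)
    Assumes file names are unique; a later file with the same name overwrites
    an earlier one.
    """
    data_dir = bundle_dir / "data"
    meta_dir = bundle_dir / "metadata"
    data_dir.mkdir(parents=True, exist_ok=True)
    meta_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for src in files:
        dest = data_dir / src.name
        shutil.copy2(src, dest)  # copy contents and timestamps
        manifest_lines.append(f"{sha1_of(dest)}  data/{src.name}")

    (bundle_dir / "manifest-sha1.txt").write_text("\n".join(manifest_lines) + "\n")
    (meta_dir / "run.json").write_text(json.dumps(run_info, indent=2, sort_keys=True))
    return bundle_dir


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        vcf = tmp_path / "calls.vcf"  # hypothetical workflow output
        vcf.write_text("##fileformat=VCFv4.2\n")
        bundle = bundle_run([vcf], {"workflow": "gatk-variant-calling.cwl"}, tmp_path / "ro")
        print((bundle / "manifest-sha1.txt").read_text())
```

Keeping the checksums in a plain manifest next to the copied data means a later reader can re-hash each file and confirm that the bundle still matches the run that produced it.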

Original language: English
Number of pages: 1
Publication status: Accepted/In press, 14 Jun 2017
Event: Bioinformatics Open Source Conference (BOSC) 2017, ISMB/ECCB 2017, Prague, Czech Republic
Duration: 22 Jul 2017 to 23 Jul 2017
http://www.open-bio.org/wiki/BOSC_2017

Conference

Conference: Bioinformatics Open Source Conference (BOSC) 2017
Abbreviated title: BOSC
Country/Territory: Czech Republic
City: Prague
Period: 22/07/17 to 23/07/17

Keywords

  • research object
  • cwl
  • provenance
  • reproducibility
