Portable workflow and tool descriptions with the CWL

Peter Amstutz, Nebojša Tijanić, Stian Soiland-Reyes, John Kern, Luka Stojanovic, Tim Pierce, John Chilton, Maxim Mikheev, Samuel Lampa, Hervé Ménager, Scott Frazer, Venkat Sai Malladi, Michael R. Crusoe

    Research output: Contribution to conference › Abstract › peer-review

    Abstract

    Bioinformatics workflow platforms provide provenance tracking, execution and data management, repeatability, and an environment for data exploration and visualization. Example F/OSS bioinformatics workflow platforms include Arvados, Galaxy, Mobyle, iPlant Discovery Environment, Apache Taverna and Yabi. Each one presently represents workflows using a different vocabulary and format, and adding a new tool requires a different procedure for each system.

    Neither the descriptions of the workflows nor the descriptions of the tools that power them are usable outside of the platforms they were written for. This results in duplicated effort and reduced reusability, and it impedes collaboration.

    Three engineers (Peter Amstutz, John Chilton, and Nebojša Tijanić) from leading bioinformatics platform teams (Curoverse, Galaxy Team, and Seven Bridges Genomics) and a tool author (Michael R. Crusoe / khmer project) started working together at the BOSC 2014 Codefest. Their initial focus was on developing a portable means of representing, sharing and invoking command-line tools, which then became the basis for portable workflow descriptions. The group placed high value on re-using existing formats and ontologies; they governed themselves with a lazy consensus / do-ocracy approach.

    On March 31st, 2015, the group released their second draft of the Common Workflow Language specification. The serialized form is a YAML document that is validated by an Apache Avro schema and can be interpreted as an RDF graph using JSON-LD. The documents are also valid Wf4Ever 'wfdesc' descriptions after a simple transformation. Future drafts will include the use of the EDAM ontology to describe the tools, enabling discovery via the ELIXIR tool registry.
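    The idea of a portable tool description can be illustrated with a minimal Python sketch. It is an illustration only, not the draft-2 syntax: the field names (`class`, `baseCommand`, `inputs`, `position`) are loosely modeled on CWL conventions, the helper functions `load_tool` and `build_command` are hypothetical, and the required-field check merely stands in for real Avro schema validation. The description is written as JSON, which is also valid YAML, so only the standard library is needed.

```python
import json

# Hypothetical, simplified tool description. Field names are loosely modeled
# on CWL conventions and are NOT copied from the draft-2 specification.
TOOL_JSON = """
{
  "class": "CommandLineTool",
  "baseCommand": ["wc", "-l"],
  "inputs": [
    {"id": "input_file", "type": "File", "position": 1}
  ]
}
"""

# Assumed set of mandatory fields; a real implementation would validate
# against the published schema instead.
REQUIRED_FIELDS = ("class", "baseCommand", "inputs")


def load_tool(text):
    """Parse a tool description and check that the assumed fields exist."""
    tool = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in tool]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return tool


def build_command(tool, job):
    """Turn a tool description plus concrete input values into an argv list."""
    argv = list(tool["baseCommand"])
    # Append input values in the order given by their declared position.
    for param in sorted(tool["inputs"], key=lambda p: p["position"]):
        argv.append(str(job[param["id"]]))
    return argv


tool = load_tool(TOOL_JSON)
command = build_command(tool, {"input_file": "reads.fastq"})
print(command)
```

    Because the description is plain data, any platform that can read the document could construct the same command line, which is the sense in which such a description is portable.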

    Seven Bridges Genomics, the Galaxy Project, and the organization behind Arvados (Curoverse) have started to implement support for the Common Workflow Language, with interest from other projects and organizations such as Apache Taverna, BioDatomics and the Broad Institute. Developers on the Galaxy Team are exploring CWL tool description support, with plans to add support for CWL workflow descriptions. Tool authors and other community members will benefit as they will only have to describe their tool and workflow interfaces once. This will enable scientists, researchers and other analysts to share their workflows and pipelines in an interoperable yet human-readable manner.
    Original language: English
    Number of pages: 1
    Publication status: Published - 10 Jul 2015
    Event: Bioinformatics Open Source Conference - ISMB 2015, Dublin, Ireland
    Duration: 10 Jun 2015 to 11 Jun 2015
    Conference number: 2015
    http://www.open-bio.org/wiki/BOSC_2015

    Conference

    Conference: Bioinformatics Open Source Conference
    Abbreviated title: BOSC
    Country/Territory: Ireland
    City: Dublin
    Period: 10/06/15 to 11/06/15

    Keywords

    • workflow
    • Scientific workflows
    • bioinformatics
    • cwl
