Fine-grained and efficient lineage querying of collection-based workflow provenance

Paolo Missier, Norman W. Paton, Khalid Belhajjame

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    The management and querying of workflow provenance data underpins a collection of activities, including the analysis of workflow results, and the debugging of workflows or services. Such activities require efficient evaluation of lineage queries over potentially complex and voluminous provenance logs. Naïve implementations of lineage queries navigate provenance logs by joining tables that represent the flow of data between connected processors invoked from workflows. In this paper we provide an approach to provenance querying that: (i) avoids joins over provenance logs by using information about the workflow definition to inform the construction of queries that directly target relevant lineage results; (ii) provides fine-grained provenance querying, even for workflows that create and consume collections; and (iii) scales effectively to address complex workflows, workflows with large intermediate data sets, and queries over multiple workflows. Copyright 2010 ACM.
    Original language: English
    Title of host publication: Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings (Adv. Database Technol. - EDBT - Int. Conf. Extending Database Technol., Proc.)
    Publisher: Association for Computing Machinery
    Pages: 299-310
    Number of pages: 11
    ISBN (Print): 9781605589459
    Publication status: Published - 2010
    Event: 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010 - Lausanne
    Duration: 1 Jul 2010 → …

    Publication series

    Name: ACM International Conference Proceeding Series

    Conference

    Conference: 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010
    City: Lausanne
    Period: 1/07/10 → …
