Abstract
Scientific workflows describe the series of structured activities and computations that arise in scientific problem-solving, giving scientists from virtually any discipline a means to specify and enact their experiments. From a computational perspective, such experiments (workflows) can be defined as directed acyclic graphs in which the nodes correspond to analysis operations, supplied locally or by third-party web services, and the edges specify the flow of data between those operations.
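The graph view of a workflow described above can be illustrated with a minimal sketch. This is not the Wf4Ever model or any particular workflow system: the operation names (`load_samples`, `normalise`, `largest`), the `workflow` dictionary layout and the `enact` function are all hypothetical, chosen only to show nodes as operations and edges as data flow.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical analysis operations; in practice these could equally be
# calls to third-party web services.
def load_samples():
    return [3.0, 1.0, 2.0]


def normalise(samples):
    total = sum(samples)
    return [value / total for value in samples]


def largest(normalised):
    return max(normalised)


# Each node maps to (operation, upstream nodes). An upstream entry is an
# edge: the output of that node flows into this operation as an argument.
workflow = {
    "load": (load_samples, []),
    "normalise": (normalise, ["load"]),
    "largest": (largest, ["normalise"]),
}


def enact(workflow):
    """Run every operation once, after all of its upstream nodes."""
    graph = {node: set(deps) for node, (_, deps) in workflow.items()}
    results = {}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for node in TopologicalSorter(graph).static_order():
        operation, deps = workflow[node]
        results[node] = operation(*(results[dep] for dep in deps))
    return results


results = enact(workflow)
```

The intermediate values kept in `results` are what a provenance record of the execution would draw on; the sketch keeps them in memory only.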
Besides being useful to describe and execute computations, workflows also allow the encoding of scientific methods and know-how. Hence they are valuable objects from a scholarly point of view, for several reasons: (i) to allow assessment of the reproducibility of results; (ii) to be reused by the same or by a different scientist; (iii) to be repurposed for goals other than those for which they were originally built; (iv) to validate the method that led to a new scientific insight; (v) to serve as live tutorials, exposing how to take advantage of existing data infrastructure, etc. This follows a trend that can be observed in disciplines such as Biology and Astronomy, with other types of objects, such as databases, increasingly becoming part of the research outcomes of an individual or a group, and hence also being shared, cited, reused, versioned, etc.
Challenges for the preservation of scientific workflows in data-intensive science include: (a) the consideration of complex digital objects that comprise both their static and dynamic aspects, including workflow models, the provenance of their executions, and interconnections between workflows and related resources; (b) the provision of access, manipulation, sharing, reuse and evolution functions for these complex digital objects; and (c) integral lifecycle management functions for workflows and their associated materials.
Thus workflow specifications on their own do not guarantee reusability, shareability, reproducibility, or a better understanding of scientific methods. Workflow environment tools evolve over the years, or may even disappear. The services and tools used by the workflow may change or evolve too. Finally, the data used by the workflow may be updated or no longer available. To overcome these issues, additional information may be needed. This includes annotations that describe the operations performed by the workflow; annotations that provide details such as authors, versions and citations; and links to other resources, such as the provenance of the results obtained by executing the workflow or the datasets used as input. Such additional annotations enable a comprehensive view of the experiment and encourage inspection of its different elements, providing the scientist with a picture of the strengths and weaknesses of the digital experiment in relation to decay, adaptability, stability, etc.
These richly annotated objects are what we call workflow-centric Research Objects. The notion of Research Object (as discussed in [BECHHOFER11]) is a general idea that aims to extend traditional publication mechanisms and take us "beyond the pdf" [FORCE11]. A Research Object (RO) is an aggregation of resources along with annotations on those resources. The aggregation itself may also be annotated, where by annotation we mean the association of arbitrary additional information with a resource. The Research Object thus collects together relevant resources along with annotations that enable the understanding, reuse, etc. of its constituent parts. In a workflow-centric Research Object describing an investigation, for example, annotations could describe how data sources have been used or how intermediate results were derived. Executable papers [EXEC] similarly aim to support validation, exploration and interaction in publication. Hunter [HUNTER06] proposes the notion of Scientific Publication Packages (SPP) to describe "the selective encapsulation of raw data, derived products, algorithms, software and textual publications". SPPs are motivated primarily by the need to create archives for the variety of artifacts produced during the course of a scientific investigation; they ideally contain data, methods, software and documents, together with their provenance.
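The aggregation-plus-annotation structure described above can be sketched as a small data structure. This is an illustrative sketch only and does not reproduce the vocabulary of the specification: the class names `ResearchObject` and `Annotation`, their fields, and the example URIs are all hypothetical. The rule that an annotation may only target the Research Object itself or a resource it aggregates is likewise an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    target: str  # URI of the annotated resource, or of the aggregation itself
    body: str    # arbitrary additional information about the target


@dataclass
class ResearchObject:
    uri: str
    resources: set[str] = field(default_factory=set)
    annotations: list[Annotation] = field(default_factory=list)

    def aggregate(self, resource_uri: str) -> None:
        """Add a resource (workflow, dataset, provenance trace) to the aggregation."""
        self.resources.add(resource_uri)

    def annotate(self, target: str, body: str) -> None:
        """Attach information to an aggregated resource or to the RO itself."""
        # Assumption for this sketch: targets outside the aggregation are rejected.
        if target != self.uri and target not in self.resources:
            raise ValueError(f"{target} is not part of {self.uri}")
        self.annotations.append(Annotation(target, body))

    def annotations_on(self, target: str) -> list[str]:
        """Return the bodies of all annotations attached to the given target."""
        return [a.body for a in self.annotations if a.target == target]


# Hypothetical example URIs, for illustration only.
ro = ResearchObject("http://example.org/ro/1/")
ro.aggregate("http://example.org/ro/1/workflow.t2flow")
ro.aggregate("http://example.org/ro/1/input.csv")
ro.annotate("http://example.org/ro/1/input.csv", "Used as input to the workflow")
ro.annotate(ro.uri, "Investigation carried out in 2013")
```

Keeping annotations separate from the resources they describe means a dataset or workflow can be described without being modified, and the aggregation itself can carry annotations in the same way as its members.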
Original language | English |
---|---|
Publisher | researchobject.org |
Publication status | Published - 30 Nov 2013 |
Fingerprint
Dive into the research topics of 'Wf4Ever Research Object Model 1.0 (2013-11-30): Wf4Ever Specification 30 November 2013'. Together they form a unique fingerprint.
Projects
- WF4EVER: Wf4Ever - Advanced Workflow Preservation Technologies for Enhanced Science
  Goble, C. (PI), Bechhofer, S. (CoI), Fernandes, A. (CoI), Soiland-Reyes, S. (Researcher) & Belhajjame, K. (Researcher)
  1/12/10 → 30/11/13
  Project: Research (finished)
Research output
- Research Object Bundle 1.0: researchobject.org Specification 05 November 2014
  Soiland-Reyes, S., Gamble, M. & Haines, R., 1 Nov 2014, researchobject.org. 9 p.
  Research output: Book/Report › Other report
  Open Access
- Structuring research methods and data with the research object model: genomics workflows as a case study
  Hettne, K., Dharuri, H., Zhao, J., Wolstencroft, K., Belhajjame, K., Soiland-Reyes, S., Mina, E., Thompson, M., Cruickshank, D., Verdes-Montenegro, L., Garrido, J., de Roure, D., Corcho, O., Klyne, G., van Schouwen, R., 't Hoen, P. A. C., Bechhofer, S., Goble, C. & Roos, M., 2014, In: Journal of Biomedical Semantics. 5, 1, 41 p.
  Research output: Contribution to journal › Article › peer-review
  Open Access
- The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web
  Belhajjame, K., Zhao, J., Garijo, D., Hettne, K., Palma, R., Corcho, O., Gomez-Perez, J. M., Bechhofer, S., Klyne, G. & Goble, C., 4 Feb 2014, In: Journal of Web Semantics.
  Research output: Contribution to journal › Article › peer-review