Workflow re-use and discovery in bioinformatics

Carole Goble, Antoon Goderis

Research output: Thesis › Doctoral Thesis



Scientists in many disciplines increasingly have to analyse a deluge of scientific data from sources scattered across the globe. Workflow techniques have the potential to become an important part of online experimentation, as they allow scientists to describe and enact their experimental processes in a structured, repeatable and verifiable way.

Given the availability of scientist-friendly workflow editors, scientists are moving away from cutting and pasting data between Web pages in favour of producing automated workflows based on Web services.

An increasingly large pool of workflows is being shared and made available for re-use. The notion that these workflows and the experimental processes they represent are a useful, re-usable artifact in their own right is new. As a new phenomenon, scientific workflow re-use and discovery is not well understood and it is unclear whether and how it could be supported automatically.

The thesis analyses the workflow re-use and discovery process based on surveys, interviews and user experiments with scientists and scientific programmers from different disciplines. We also analyse the impact of using multiple models of computation on workflow re-use. In particular, we show that some models of computation are more re-usable than others.

Further, we capture and model scientists' re-use and discovery behaviour when re-using data flow workflows from the bioinformatics domain. The result is a suite of human benchmarks of value to developers of workflow discovery techniques.

Finally, the benchmarks enable us to evaluate a range of existing techniques based on service discovery, as well as novel techniques based on workflow structure. The techniques vary in the language they work over (natural language or a Semantic Web language) and in the level of workflow detail they process. The evaluation shows that the performance of the workflow discovery techniques varies substantially depending on the task in question. This argues in favour of a multi-varied approach that combines multiple techniques.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution:
  • Goble, Carole, Supervisor, External person
Award date: 31 Dec 2007
Publication status: Published - 3 Oct 2008

