Abstract
Over the past three years, we have been developing the Specimen Data Refinery (SDR) to automate the extraction of data from specimen images as part of the SYNTHESYS project (Walton et al. 2020). The SDR provides an easy-to-deploy, open-source, web-based interface to multiple workflows that enable a user to create new natural history specimen records or enhance existing ones. The SDR uses the Galaxy workflow platform as the basis for managing data analysis and, where possible, uses existing Galaxy community tools and approaches (Jalili et al. 2020, Hardisty et al. 2022). We have developed a library of domain-specific tools, including semantic segmentation, optical character recognition, handwritten text recognition, barcode reading and natural language processing. These tools have been designed to work on standardised images of specimens, specifically herbarium sheets, pinned insects and microscope slides.
In this presentation, we describe our technical approach to developing the SDR, including the Galaxy workflow platform, application deployment, and tool interoperability using FAIR digital objects (e.g., RO-Crates and openDigital Specimen objects; Soiland-Reyes et al. 2022, Addink and Hardisty 2020). We present an evaluation of the tools, including segmentation and text recognition among others, and discuss the new challenges in using the resulting data from both a technical and a social perspective.
Original language | English |
---|---|
Article number | e93500 |
Journal | Biodiversity Information Science and Standards (BISS) |
Volume | 6 |
DOIs | |
Publication status | Published - 23 Aug 2022 |
Event | Biodiversity Information Standards (TDWG 2022): Stronger Together: Standards for linking biodiversity data - Sofia, Bulgaria; Duration: 17 Oct 2022 → 21 Oct 2022; Conference number: 2022; https://www.tdwg.org/conferences/2022/ |
Title | The Specimen Data Refinery: Using a scientific workflow approach for information extraction |

Projects
- H2020 SYNTHESYS+
  Goble, C. (PI), Williams, A. (Researcher), Brack, P. (Researcher) & Soiland-Reyes, S. (Researcher)
  1/02/19 → 31/01/23
  Project: Research
- Incrementally building FAIR Digital Objects with Specimen Data Refinery workflows
  Woolland, O., Brack, P., Soiland-Reyes, S., Scott, B. & Livermore, L., 12 Oct 2022, In: Research Ideas and Outcomes. 8, e94349.
  Research output: Contribution to journal › Conference article › peer-review
  Open Access
- The Specimen Data Refinery: A Canonical Workflow Framework and FAIR Digital Object Approach to Speeding up Digital Mobilisation of Natural History Collections
  Hardisty, A., Brack, P., Goble, C., Livermore, L., Scott, B., Groom, Q., Owen, S. & Soiland-Reyes, S., 1 Apr 2022, In: Data Intelligence. 4, 2, p. 320-341 (22 p.).
  Research output: Contribution to journal › Article › peer-review
  Open Access
- Landscape Analysis for the Specimen Data Refinery
  Walton, S., Livermore, L., Bánki, O., Cubey, R., Drinkwater, R., Englund, M., Goble, C., Groom, Q., Kermorvant, C., Rey, I., Santos, C., Scott, B., Williams, A. & Wu, Z., 14 Aug 2020, In: Research Ideas and Outcomes. 6, 25 p., e57602.
  Research output: Contribution to journal › Article › peer-review
  Open Access