Conserving Philippine Biodiversity by Understanding Big Data: Integration and Analysis of Heterogeneous Information

  • Sophia Ananiadou (Participant)
  • Nhung Nguyen (Participant)
  • Axel Soto (Participant)
  • Paul Thompson (Participant)

Impact: Awareness and understanding, Environmental, Policy, Technological


Biodiversity plays a central role in our daily lives, given its implications for ecological resilience, food security, species and subspecies endangerment and natural sustainability. Research in this domain has recently seen accelerated growth, leading to a “big data” scenario in the biodiversity literature. However, information on Philippine biodiversity is largely fragmented due to the siloed formats in which different local institutions store their data. Equally, although scientific literature offers invaluable information that can fill in the knowledge gaps, its overwhelming volume and lack of structure can hinder thorough manual examination. Consequently, a comprehensive body of knowledge on Philippine biodiversity remains unavailable, hampering both the timely formulation of environmental policies and the discovery of new natural products that could provide medicinal benefits.

The British Council-funded project Conserving Philippine Biodiversity by Understanding Big Data (COPIOUS) was delivered collaboratively by the University of Manchester’s National Centre for Text Mining (NaCTeM), the University of the Philippines and the Filipino Biodiversity Management Bureau to tackle this problem. It has advanced how Philippine biodiversity information is collected and published by constructing an online knowledge repository, built by applying text mining-based big data analytics to biodiversity literature. The repository brings together different types of information, e.g., taxonomic, occurrence, ecological, biomolecular, and biochemical, giving users a comprehensive view of species of interest. This allows them to carry out predictive analysis of species distributions and to investigate potential medicinal applications of natural products derived from Philippine species.

In constructing the repository, several advanced text-mining technologies were applied to biodiversity documents. Most of these documents, e.g., legacy literature in the Biodiversity Heritage Library (BHL), have undergone optical character recognition (OCR) and thus contain a significant amount of noise. The research team therefore applied a rule-based approach to clean up the text. Active learning methods were then incorporated into Argo, a Web-based text mining workbench, to extract targeted named entities and relations from the documents. All extracted information was combined with structured information sourced from various Philippine biodiversity research groups and stored in a database, over which a search engine was built to facilitate knowledge discovery.
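
To make the rule-based clean-up step more concrete, the following is a minimal Python sketch of how OCR noise might be reduced with a small set of regular-expression rules. The rules, the `RULES` list and the `clean_ocr` function are hypothetical illustrations only; they are not the rules or code used by the COPIOUS team, whose actual rule set is not described here.

```python
import re

# Hypothetical, illustrative rules; NOT the rule set used in COPIOUS.
# Each rule is a (compiled pattern, replacement) pair, applied in order.
RULES = [
    # Rejoin words that were hyphenated across a line break ("spe-\ncies").
    # Caveat: this also joins genuine hyphenated compounds split at a line end.
    (re.compile(r"(\w)-\s*\n\s*(\w)"), r"\1\2"),
    # Replace the long s found in legacy typefaces with a modern "s".
    (re.compile("ſ"), "s"),
    # Collapse runs of spaces and tabs into a single space.
    (re.compile(r"[ \t]+"), " "),
    # Drop a stray space on either side of a line break.
    (re.compile(r" ?\n ?"), "\n"),
]


def clean_ocr(text: str) -> str:
    """Apply every rule in order and trim leading/trailing whitespace."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    noisy = "The spe-\ncies  Rafflesia\t is  found"
    print(clean_ocr(noisy))
```

Keeping the rules in an ordered list means that new patterns can be added or reordered without changing the function itself, which suits a clean-up step that is typically refined as new kinds of OCR noise are observed.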

Text mining, which can extract non-trivial patterns or knowledge from unstructured textual data in document collections, has been applied successfully to the biomedical literature and, through this repository, to the biodiversity domain to unlock knowledge hidden in the literature. The repository has supported the Philippine government’s efforts to conserve the country’s natural resources, which in turn can translate to benefits for the Philippine population in terms of ecosystem resilience and access to alternative medicines. Training on the use of the text mining infrastructure has been taken up by the Leibniz Information Centre for Science and Technology University Library, and Peter the Great St. Petersburg Polytechnic University showed strong interest in indexing their document collections with semantic metadata using custom information extraction workflows developed on the Argo platform. This demonstrates the applicability of the work in COPIOUS to many other use cases, and its contribution to facilitating the collaborative study and discussion of legacy biodiversity documents by a worldwide community.

The COPIOUS project was featured in the University of Manchester’s Better World Showcase 2018, which celebrates the important contribution that the Faculty of Science and Engineering makes through research. The project was also shortlisted for the People’s Vote Award.
Category of impact: Awareness and understanding, Environmental, Policy, Technological
Impact level: Adoption

Research Beacons, Institutes and Platforms

  • Biotechnology
  • Digital Futures
  • Manchester Institute of Biotechnology
  • Institute for Data Science and AI