Abstract
Most bio text-mining efforts so far have focused on identifying biological, molecular and chemical entities in the literature to support knowledge acquisition and discovery in the life sciences. There is also a growing number of bioinformatics services and tools available. This raises the challenging problem of semi-automated annotation, documentation and discovery of services suitable for a specific data analysis and/or integration into workflows. The first step in this process would be to build a controlled vocabulary to describe bioinformatics services, which can then be used for service retrieval and discovery. In this paper we present a methodology that combines lexical and contextual profiles of candidate terms to suggest terms for the bioinformatics vocabulary. The method achieved an estimated precision in the range 70-90%, with recall between 20% and 90%. After processing the whole of BMC Bioinformatics, almost 80% of the top 300 terms were deemed conceptual terms relevant for describing the major concepts in bioinformatics. In addition, the method extracted a number of service and tool names. The controlled vocabulary is freely available at: http://gnode1.mib.man.ac.uk/bioinf/CV.
Original language | English |
---|---|
Title of host publication | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings (Int. Symp. Semantic Min. Biomed., SMBM - Proc.) |
Pages | 5-12 |
Number of pages | 7 |
Publication status | Published - 2008 |
Event | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Turku Duration: 1 Jul 2008 → … http://mars.cs.utu.fi/smbm2008/files/smbm2008proceedings/smbmpaper_26.pdf |
Conference
Conference | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 |
---|---|
City | Turku |
Period | 1/07/08 → … |