Abstract
Numerous options are now available for carrying out various tasks in bioinformatics, but, as yet, little progress has been made in capturing common practice by analysing usage and mentions of databases and tools within the literature. In this paper we analyse the variability and ambiguity of database and software name mentions and provide a set of 30 full-text documents manually annotated at the mention level. Our analyses show that identifying mentions of databases and tools cannot be achieved through dictionary matching alone: our baseline dictionary look-up achieved an F-score of just over 50%. This is primarily due to the high variability and ambiguity of database and software mentions in the literature, and to the large number of newly introduced resources. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature.
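As an illustration of the kind of baseline the abstract refers to, the following is a minimal sketch of dictionary look-up scored with an F-score. It is not the authors' implementation: the function names, the three-entry dictionary, the example sentence, and the exact-span matching criterion are all assumptions made for illustration. The example is built so that one ambiguous name ("Network") produces a false positive and one resource missing from the dictionary ("Cytoscape") produces a false negative.

```python
import re


def dictionary_lookup(text, names):
    """Return sorted (start, end) spans of case-sensitive whole-word matches."""
    spans = []
    for name in names:
        # Lookarounds prevent matching inside longer alphanumeric tokens.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end()))
    return sorted(spans)


def f_score(predicted, gold):
    """Balanced F-score (harmonic mean of precision and recall) over exact spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence and dictionary, invented for this sketch.
text = "Reads were aligned with BLAST and mapped to UniProt. Network layout used Cytoscape."
dictionary = ["BLAST", "UniProt", "Network"]

# Hypothetical gold annotation: the three actual resource mentions.
gold = [(text.index(n), text.index(n) + len(n)) for n in ["BLAST", "UniProt", "Cytoscape"]]

predicted = dictionary_lookup(text, dictionary)
score = f_score(predicted, gold)
print(f"F-score: {score:.2f}")
```

Here two of the three predicted spans are correct and two of the three gold mentions are found, so precision and recall are both 2/3. Even in this toy case, ambiguity and dictionary coverage each cost the look-up a third of its score.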
Original language | English |
---|---|
Title of host publication | SMBM 2012 - Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine (abbreviated: SMBM - Proc. Int. Symp. Semantic Min. Biomed.) |
Place of Publication | http://www.zora.uzh.ch/64476/ |
Pages | 2-9 |
Number of pages | 7 |
Publication status | Published - 2012 |
Event | 5th International Symposium on Semantic Mining in Biomedicine, SMBM 2012, Zurich. Duration: 1 Jul 2012 → … https://www.escholar.manchester.ac.uk/uk-ac-man-scw:175435 |
Conference
Conference | 5th International Symposium on Semantic Mining in Biomedicine, SMBM 2012 |
---|---|
City | Zurich |
Period | 1/07/12 → … |