Abstract
Background
Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author’s intended knowledge gain) and New Knowledge (an author’s findings). The method incorporates various features, including a combination of simple MK dimensions.
Methods
We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated.
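As a rough illustration of how simple MK dimensions might be combined with linguistic features into a single input for a classifier, the sketch below encodes one event as a flat feature dictionary. It is not the authors' feature set: the `Event` class, the value inventories in `MK_VALUES`, the clue-word list and the feature names are all hypothetical, chosen only to mirror the dimensions named in the abstract (negation, speculation, certainty and knowledge type).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical value inventories for two simple MK dimensions (assumption,
# not taken from the corpora described above).
MK_VALUES = {
    "knowledge_type": ["investigation", "observation", "analysis", "other"],
    "certainty": ["low", "medium", "high"],
}

# Illustrative clue words that might signal a research hypothesis (assumption).
HYPOTHESIS_CLUES = {"hypothesize", "investigate", "examined", "whether"}


@dataclass
class Event:
    """A hypothetical event record carrying simple MK dimensions."""
    trigger: str
    negated: bool
    speculated: bool
    certainty: str
    knowledge_type: str
    sentence_tokens: List[str] = field(default_factory=list)


def encode_event(event: Event) -> Dict[str, int]:
    """Combine MK dimensions and simple linguistic cues into one feature dict."""
    features: Dict[str, int] = {}
    # One-hot encode each categorical MK dimension.
    for dimension, values in MK_VALUES.items():
        current = getattr(event, dimension)
        for value in values:
            features[f"{dimension}={value}"] = 1 if current == value else 0
    # Binary MK dimensions.
    features["negated"] = int(event.negated)
    features["speculated"] = int(event.speculated)
    # Simple linguistic features drawn from the sentence containing the event.
    lowered = {token.lower() for token in event.sentence_tokens}
    features["has_hypothesis_clue"] = int(bool(lowered & HYPOTHESIS_CLUES))
    features["n_tokens"] = len(event.sentence_tokens)
    return features


if __name__ == "__main__":
    example = Event(
        trigger="examined",
        negated=False,
        speculated=True,
        certainty="low",
        knowledge_type="investigation",
        sentence_tokens=["We", "examined", "whether", "X", "activates", "Y"],
    )
    print(encode_event(example))
```

Feature dictionaries of this shape could then be vectorised and passed to any standard random forest implementation; the choice of library and hyperparameters is left open here because the abstract does not specify them.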
Results
We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method improves upon the previously reported state-of-the-art performance for an existing dimension, i.e., Knowledge Type. Secondly, we demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836).
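For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The following minimal sketch computes it for a single binary label, such as whether an event is a Research Hypothesis. The function name and the toy labels are illustrative only, and the abstract does not say which averaging scheme underlies the reported figures.

```python
from typing import Sequence


def f1_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    false_pos = sum(1 for g, p in zip(gold, predicted) if not g and p)
    false_neg = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if true_pos == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels (made up for illustration, not drawn from either corpus).
    gold = [True, True, True, False, False]
    predicted = [True, True, False, True, False]
    print(round(f1_score(gold, predicted), 3))
```

In the toy example there are two true positives, one false positive and one false negative, so precision and recall are both 2/3 and the F1-score is also 2/3.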
Conclusion
We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.
Original language | English |
---|---|
Article number | 46 |
Number of pages | 13 |
Journal | BMC Medical Informatics and Decision Making |
Volume | 18 |
Early online date | 25 Jun 2018 |
DOIs | |
Publication status | Published - Dec 2018 |
Keywords
- text mining
- meta-knowledge
- hypotheses
- new knowledge
Projects
- Manchester Molecular Pathology Innovation Centre (MMPathIC): Bridging the Gap Between Biomarker Discovery and Health and Wealth.
  Freemont, A. (PI), Ananiadou, S. (CoI), Barton, A. (CoI), Black, G. (CoI), Bruce, I. (CoI), Buchan, I. (CoI), Byers, R. (CoI), Dive, C. (CoI), Goodacre, R. (CoI), Griffiths, C. (CoI), Hoyland, J. (CoI), Payne, K. (CoI), Radford, J. (CoI) & Whetton, A. (CoI)
  1/10/15 → 31/03/21
  Project: Research
- Enriching Metabolic PATHwaY models with evidence from the literature (EMPATHY).
  Ananiadou, S. (PI) & Kell, D. (CoI)
  1/04/15 → 23/03/19
  Project: Research
Datasets
- Identification of research hypotheses and new knowledge from scientific literature
  Shardlow, M. (Contributor), Batista-Navarro, R. T. (Contributor), Thompson, P. (Contributor), Raheel, N. (Contributor), Mcnaught, J. (Contributor) & Ananiadou, S. (Contributor), figshare, 25 Jun 2018
  DOI: 10.6084/m9.figshare.c.4145369.v1, https://figshare.com/collections/Identification_of_research_hypotheses_and_new_knowledge_from_scientific_literature/4145369/1
  Dataset