Abstract
Background
How to treat a disease remains the commonest type of clinical question. Obtaining evidence-based answers from biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state of the art focuses on pair-based proportional (pairwise) analogies like man:woman::king:queen (“queen = -man +king +woman”).
Objective
To systematically extract disease-treatment statements with a Semantic Deep Learning (SemDeep) approach underpinned by prior knowledge and another type of four-term analogy (other than pairwise).
Methods
As preliminaries, we investigate CBOW embedding analogies in a common-English corpus with 5 lines of text, and observe a type of four-term analogy (not pairwise) applying the 3CosAdd formula and relating the semantic fields person and death: “dagger = -Romeo +die +died” (search query: -Romeo +die +died).
Our SemDeep approach works with pre-existing items of knowledge (what is known) to make inferences sanctioned by a four-term analogy (search query -x +z1 +z2) from CBOW and Skip-gram embeddings created with a PubMed Systematic Reviews subset (PMSB dataset).
Stage 1: Knowledge acquisition (acquisition of domain-specific terms). We obtain a set of terms, the candidates y, from embeddings using vector arithmetic. Some n-gram pairs, retrieved with the cosine and validated with evidence (prior knowledge), are the input for the 3CosAdd, seeking a type of four-term analogy relating the semantic fields disease and treatment.
Stage 2: Knowledge organisation (explicit conceptualisation of the meaning of terms). We identify the candidates sanctioned by the analogy that belong to the semantic field treatment, and then map these candidates to UMLS Metathesaurus concepts with MetaMap. A concept pair is a brief disease-treatment statement (biomedical fact).
Stage 3: Knowledge validation (validating statements). An evidence-based evaluation followed by human validation of the biomedical facts potentially useful for clinicians.
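The search query form -x +z1 +z2 used in Stage 1 can be illustrated with a minimal sketch of the standard 3CosAdd scoring, in which each vocabulary term y is scored as cos(y, z1) + cos(y, z2) - cos(y, x). The three-dimensional vectors and the extra terms `budesonide`, `wheeze` and `placebo` below are made up for illustration and are not taken from the PMSB embeddings; the function name `three_cos_add` is likewise hypothetical.

```python
import numpy as np


def three_cos_add(embeddings, negative, positives, topn=3):
    """Rank vocabulary terms for a query -negative +positives[0] +positives[1]."""
    vocab = list(embeddings)
    matrix = np.array([embeddings[term] for term in vocab], dtype=float)
    # Unit-normalise rows so that dot products are cosine similarities.
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    index = {term: i for i, term in enumerate(vocab)}

    # 3CosAdd: add the cosines to the positive terms, subtract the cosine
    # to the negative term, for every candidate y in the vocabulary.
    scores = sum(matrix @ matrix[index[p]] for p in positives)
    scores = scores - matrix @ matrix[index[negative]]

    # Query terms themselves are excluded from the candidates.
    excluded = set(positives) | {negative}
    ranked = [
        (vocab[i], float(scores[i]))
        for i in np.argsort(-scores)
        if vocab[i] not in excluded
    ]
    return ranked[:topn]


# Toy embeddings (assumption: invented values, not from the PMSB dataset).
toy = {
    "asthma": [1.0, 0.0, 0.2],
    "inhaled_corticosteroids": [0.3, 1.0, 0.1],
    "inhaled_corticosteroid": [0.35, 0.95, 0.1],
    "budesonide": [0.2, 0.9, 0.2],
    "wheeze": [0.9, 0.1, 0.1],
    "placebo": [0.1, 0.1, 1.0],
}

candidates = three_cos_add(
    toy,
    negative="asthma",
    positives=["inhaled_corticosteroids", "inhaled_corticosteroid"],
)
for term, score in candidates:
    print(term, round(score, 3))
```

With these invented vectors the top-ranked candidate is `budesonide`, by construction: it lies close to both positive terms and far from the negative one. Real embeddings would require the evidence-based validation described in Stage 3 before any candidate is treated as a treatment.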
Results
We obtain 5352 n-gram pairs from 446 search queries applying the 3CosAdd. The micro-averaged performance of MetaMap for those candidates y belonging to the semantic field treatment is F measure=80.00% (precision=77.00%, recall=83.25%). We develop an empirical heuristic with some predictive power for the clinical winners, i.e. search queries returning candidates y with evidence of a therapeutic intent for the target disease x. The search queries -asthma +inhaled_corticosteroids +inhaled_corticosteroid and -epilepsy +valproate +antiepileptic_drug are clinical winners, finding eight evidence-based beneficial treatments.
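As a consistency check on the reported MetaMap figures, the F measure can be recomputed from the stated precision and recall. The sketch below assumes the balanced F measure (the harmonic mean of precision and recall, beta = 1), which the abstract does not state explicitly; the helper name `f_measure` is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Percentages as reported in the Results.
score = f_measure(77.00, 83.25)
print(f"F measure = {score:.2f}%")
```

Under this assumption the recomputed value agrees with the reported 80.00% to two decimal places.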
Conclusions
Extracting treatments with therapeutic intent by analogical reasoning from embeddings (423K n-grams from the PMSB dataset) is an ambitious goal. Our SemDeep approach is knowledge-based, underpinned by embedding analogies exploiting prior knowledge. The biomedical facts from embedding analogies (a four-term type, not pairwise) are potentially useful for clinicians. The heuristic offers a practical way to find beneficial treatments for well-known diseases. Learning from deep learning models does not need a massive amount of data. Embedding analogies are not limited to pairwise analogies; hence, analogical reasoning with embeddings is underexploited.
Original language | English |
---|---|
Journal | JMIR medical informatics |
Volume | 8 |
Issue number | 8 |
DOIs | |
Publication status | Published - 6 Aug 2020 |
Research Beacons, Institutes and Platforms
- Cathie Marsh Institute