TY - CONF
T1 - Extracting adverse drug reactions and their context using sequence labelling ensembles
AU - Milosevic, Nikola
AU - Nenadic, Goran
AU - Belousov, Maksim
AU - Dixon, William
PY - 2018/4
Y1 - 2018/4
N2 - Adverse drug reactions (ADRs) present a challenge for drug development and drug administration: they harm millions of patients and kill more than a hundred thousand in the United States alone. Although vendors are legally bound to report adverse drug reactions, these reports are not in a structured form. This makes it hard for practitioners to retrieve, manage and appropriately use the information, which could prevent unwanted harm to patients and significantly improve their quality of life. Adverse drug reactions are also an important source of human phenotypic data and can be used to predict drug targets in personalized medicine. We present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal, from drug labels (provided as part of the ADR track at TAC2017). The system uses a mix of rule-based, machine learning and deep learning methods to annotate the data. First, it parses the provided drug label by classifying text chunks into different categories: titles, tables, lists and text paragraphs. Each text chunk is then split into words (tokens), and different types of token-level features are extracted. For conditional random fields (CRF), we utilised part-of-speech tags, grammatical relations (dependencies), vocabulary and semantic features (UMLS semantic types and GENIA named-entity tags). For bidirectional long short-term memory networks (BLSTM), we utilised word2vec word embeddings pre-trained on large text corpora from the generic medical (Wikipedia+PMC+PubMed) and target (drug labels) domains. For the ensemble model, we propose a modification of Wolpert’s stacked generalisation that first trains the CRF classifier on the features described above and then utilises its predicted probabilities for each class to build additional token-level embeddings for the BLSTM classifier.
We evaluated the system by participating in the ADR track of the Text Analysis Conference (TAC2017). Measured on unseen data provided by NIST, the system achieved F1-scores of 76.00%
AB - Adverse drug reactions (ADRs) present a challenge for drug development and drug administration: they harm millions of patients and kill more than a hundred thousand in the United States alone. Although vendors are legally bound to report adverse drug reactions, these reports are not in a structured form. This makes it hard for practitioners to retrieve, manage and appropriately use the information, which could prevent unwanted harm to patients and significantly improve their quality of life. Adverse drug reactions are also an important source of human phenotypic data and can be used to predict drug targets in personalized medicine. We present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal, from drug labels (provided as part of the ADR track at TAC2017). The system uses a mix of rule-based, machine learning and deep learning methods to annotate the data. First, it parses the provided drug label by classifying text chunks into different categories: titles, tables, lists and text paragraphs. Each text chunk is then split into words (tokens), and different types of token-level features are extracted. For conditional random fields (CRF), we utilised part-of-speech tags, grammatical relations (dependencies), vocabulary and semantic features (UMLS semantic types and GENIA named-entity tags). For bidirectional long short-term memory networks (BLSTM), we utilised word2vec word embeddings pre-trained on large text corpora from the generic medical (Wikipedia+PMC+PubMed) and target (drug labels) domains. For the ensemble model, we propose a modification of Wolpert’s stacked generalisation that first trains the CRF classifier on the features described above and then utilises its predicted probabilities for each class to build additional token-level embeddings for the BLSTM classifier.
We evaluated the system by participating in the ADR track of the Text Analysis Conference (TAC2017). Measured on unseen data provided by NIST, the system achieved F1-scores of 76.00%
KW - text mining
KW - Adverse drug reaction
KW - natural language processing
KW - machine learning
UR - http://inspiratron.org/wp-content/uploads/2018/04/HealTAC_poster.pdf
M3 - Poster
T2 - UK Health Text Analytics Conference
Y2 - 18 April 2018 through 19 April 2018
ER -