Abstract
News articles are currently produced, shared and consumed at an extremely rapid rate. Although their quantity is increasing, their quality and trustworthiness are becoming harder to judge. Hence, it is important not only to automate information extraction but also to quantify the certainty of the extracted information. Automated identification of expressions that affect certainty has been studied in both the scientific and newswire domains, but performance is considerably higher in tasks focusing on scientific text. We compare the definition and expression of uncertainty in a scientific domain, i.e., biomedicine, with those in newswire. We delve into the different aspects that affect the certainty of an extracted event in a news article and examine whether they can be easily identified by techniques already validated in the biomedical domain. Finally, we present a comparison of the syntactic and lexical differences between the expression of certainty in the biomedical and newswire domains, using two annotated corpora.
Original language | English |
---|---|
Title of host publication | Proceedings of the NAACL Workshop on Computational Semantics Beyond Events and Roles (SemBEaR) |
Pages | 6-20 |
Number of pages | 15 |
Publication status | Published - 5 Jun 2018 |