Abstract
Despite the increasing availability of ontology-based semantic resources for representing biomedical content, large amounts of clinical data exist only in narrative form. Many clinical information management tasks therefore require unlocking this information using natural language processing (NLP). Clinical corpora annotated by humans are crucial resources for this purpose. On the one hand, they are needed to train and domain-fine-tune language models that transform information from unstructured free text into an interoperable form. On the other hand, manually annotated corpora are indispensable for assessing the results of NLP-based information extraction. Because annotation quality is crucial, detailed annotation guidelines are needed to define the form that extracted information should take, to prevent human annotators from making erratic annotation decisions, and to ensure good inter-annotator agreement. Our hypothesis is that, to this end, human annotations (and, subsequently, machine annotations learned from them) should (i) be based on ontological principles and (ii) be consistent with existing clinical documentation standards. Drawing on experience from several annotation projects, we highlight the need for sophisticated guidelines. We formulate a set of abstract principles on which such guidelines should be based, followed by examples of how to keep them user-friendly and consistent on the one hand, and compatible with the international semantic standards SNOMED CT and FHIR, including their areas of overlap, on the other. We sketch how the resulting representations can be expressed in a knowledge graph, a state-of-the-art semantic representation paradigm that can be enriched with additional content at the A-Box and T-Box levels and to which symbolic and neural reasoning tasks can be applied.
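The abstract ties guideline quality to good inter-annotator agreement but does not name a measure. As an illustration only, the sketch below assumes Cohen's kappa, a common chance-corrected agreement statistic for two annotators: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each annotator's label frequencies. The function name `cohens_kappa` and the label set are hypothetical, not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's marginal label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical span labels assigned by two annotators to the same six text spans.
annotator_1 = ["finding", "finding", "procedure", "substance", "finding", "substance"]
annotator_2 = ["finding", "procedure", "procedure", "substance", "finding", "finding"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Here the annotators agree on 4 of 6 spans (p_o = 24/36) with chance agreement p_e = 13/36, so kappa = 11/23, roughly 0.478: moderate agreement that sharper guidelines would aim to raise.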
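To make the A-Box/T-Box distinction concrete, here is a minimal sketch under stated assumptions. T-Box axioms state class-level knowledge (here only subclass-of edges), and A-Box assertions state facts about individuals extracted from annotated text. A simple symbolic reasoning step then derives the types entailed by the subclass hierarchy. The class names, the `hasSubject` relation and the individuals are hypothetical placeholders, not actual SNOMED CT identifiers or FHIR element names, and the paper's own graph model may differ.

```python
# T-Box: class-level axioms, here only direct subclass-of edges.
# Class names are illustrative placeholders, not real SNOMED CT concepts.
TBOX_SUBCLASS_OF = {
    "MyocardialInfarction": {"HeartDisease"},
    "HeartDisease": {"Disorder"},
    "Disorder": {"ClinicalFinding"},
}

# A-Box: assertions about individuals, as (subject, predicate, object) triples,
# e.g. produced from an annotated clinical note (hypothetical data).
ABOX = [
    ("condition1", "rdf:type", "MyocardialInfarction"),
    ("condition1", "hasSubject", "patient42"),
]


def superclasses(cls, tbox):
    """Return all direct and indirect superclasses of cls (transitive closure)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in tbox.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(abox, tbox):
    """Add every rdf:type triple entailed by the subclass axioms."""
    inferred = set(abox)
    for subject, predicate, obj in abox:
        if predicate == "rdf:type":
            for parent in superclasses(obj, tbox):
                inferred.add((subject, "rdf:type", parent))
    return inferred


graph = infer_types(ABOX, TBOX_SUBCLASS_OF)
# Query: which individuals are clinical findings, including those only typed indirectly?
findings = {s for s, p, o in graph if p == "rdf:type" and o == "ClinicalFinding"}
print(findings)
```

Keeping the hierarchy in the T-Box rather than repeating it in every annotation lets annotators record only the most specific concept. Broader types are derived by reasoning. A production system would more likely use an RDF/OWL store and a standard reasoner than this hand-rolled closure.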
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 14th International Conference on Biomedical Ontologies (ICBO 2023) |
| Editors | Fernanda Farinelli, Amanda Damasceno de Souza, Eduardo Ribeiro Felipe |
| Publisher | CEUR Workshop Proceedings |
| Pages | 36-47 |
| Number of pages | 12 |
| Publication status | Published - 2023 |
| Event | 14th International Conference on Biomedical Ontologies, Brasília, Brazil (28 Aug 2023 → 1 Sept 2023) |
Conference
| Conference | 14th International Conference on Biomedical Ontologies |
|---|---|
| Abbreviated title | ICBO 2023 |
| Country/Territory | Brazil |
| City | Brasília |
| Period | 28/08/23 → 1/09/23 |
Keywords
- formal ontologies
- clinical information models
- natural language processing
- text annotation guidelines
- electronic health records