O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population

Ioannis Basinas, Paul Thompson, Qianqian Xie, Sophia Ananiadou, Calvin Ge, Eelco Kuijpers, Hakan Tinnerberg, Zara Ann Stokholm, Jorunn Kirkeleit, Karen S Galea, Bendik Brinchmann, Christine Cramer, Evana Amir Taher, Vivi Schlünssen, Martie van Tongeren

Research output: Other contribution › peer-review


Workplaces are dynamic environments in which conditions and exposures frequently change over time. Such changes are rarely captured by existing Job Exposure Matrices (JEMs), which are typically developed using information available at a particular point in time. As a result, JEMs cannot take into account potential future changes, which may reduce their reliability when they are applied outside their development period. Moreover, developing JEMs for new or emerging exposure factors is a laborious and time-consuming process. Within the Exposome Project for Health and Occupational Research (EPHOR; https://www.ephor-project.eu/), we have been exploring the use of Natural Language Processing (NLP) as a means of streamlining both the updating of existing JEMs and the development of new ones. Specifically, we will develop named entity recognition (NER) tools to automatically detect mentions of exposure-related concepts in the literature, thus increasing the efficiency of locating relevant information for JEM updating and development.

Accordingly, we have developed a novel annotated corpus of 50 literature articles concerning workplace exposure to diesel exhaust, in which exposure assessment experts followed guidelines to annotate all mentions of six named entity categories (substance, occupation, industry/workplace, job task/activity, measurement device and sample type) occurring in the abstract, methods and results sections. The corpus will be used to train machine learning NER algorithms. Each article was annotated independently by two experts, and Inter-Annotator Agreement (IAA) scores were calculated to assess annotation quality. Exact matching scores (requiring agreement on both the semantic category and the exact annotation span) ranged from 0.38 to 0.79 F1 across individual categories (average: 0.56). Relaxed matching scores (requiring agreement on the category and partially overlapping spans) ranged from 0.63 to 0.87 F1 (average: 0.72). These results suggest that the annotation quality is sufficient for machine learning. We will present the annotation scheme, guidelines and a preliminary analysis of the results.
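To make the two agreement criteria concrete, the following is a minimal illustrative sketch (not the EPHOR project's actual code) of computing pairwise IAA as an F1 score over span annotations. An annotation is assumed to be a `(category, start, end)` tuple; all function names and the data representation are hypothetical.

```python
def _f1(tp, n_a, n_b):
    """F1 treating annotator A's spans as 'gold' and B's as 'predictions'."""
    if n_a == 0 or n_b == 0:
        return 0.0
    precision, recall = tp / n_b, tp / n_a
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def exact_iaa(ann_a, ann_b):
    """Exact matching: same semantic category AND identical span offsets."""
    tp = len(set(ann_a) & set(ann_b))
    return _f1(tp, len(ann_a), len(ann_b))

def _overlaps(s1, e1, s2, e2):
    # Half-open character spans [s, e) partially overlap.
    return s1 < e2 and s2 < e1

def relaxed_iaa(ann_a, ann_b):
    """Relaxed matching: same category and partially overlapping spans."""
    matched_b, tp = set(), 0
    for cat_a, s_a, e_a in ann_a:
        for i, (cat_b, s_b, e_b) in enumerate(ann_b):
            if i not in matched_b and cat_a == cat_b and _overlaps(s_a, e_a, s_b, e_b):
                matched_b.add(i)  # each B span may match at most once
                tp += 1
                break
    return _f1(tp, len(ann_a), len(ann_b))
```

For example, if two annotators agree on a "substance" span but mark slightly different boundaries for an "occupation" mention, the exact score counts only one match while the relaxed score counts both, which is why relaxed F1 values are consistently higher than exact ones.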
Original language: English
Publication status: Published - Mar 2023


