Using explainable machine learning to better understand source and process contributions to atmospheric bio-aerosol

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


The role of atmospheric bioaerosols as determinants of environmental and human health outcomes is receiving increasing attention. However, a lack of fully evaluated end-to-end detection techniques hinders our ability to identify bioaerosol types and to understand their environmental drivers, particularly in complex environments. In this study we address these challenges by developing a novel machine learning framework that combines unsupervised deep learning with explainable machine learning techniques. The first step combines a bidirectional long short-term memory autoencoder (BiLSTM-AE) with a relatively new, fast, hierarchical clustering technique. Our results indicate that this approach outperforms other models, successfully distinguishing between fungal spores, non-biological aerosols, and pollen based solely on fluorescence information, without the need for training data. Subsequently, using automated machine learning and the SHapley Additive exPlanations (SHAP) method, we quantitatively identified the environmental drivers of each bioaerosol type. Variation in the SHAP values indicated that elevated pollen concentrations at night could be attributed to changes in air mass composition and origin. More importantly, we find ambient evidence that pollen may break into smaller fragments when relative humidity (RH) exceeds 90%, leading to significant changes in its fluorescence spectrum and a rapid increase in its concentration. Overall, we find that combining unsupervised deep learning and explainable machine learning can provide new insights into type-specific bioaerosol processes.
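To illustrate the attribution idea behind SHAP, the following is a minimal sketch that computes exact Shapley values by brute force for a toy model. It is not the authors' pipeline: the function `toy_pollen_model`, the feature names (`rh`, `temperature`, `wind_speed`), the coefficients, and the sample values are all hypothetical and chosen only to make the arithmetic easy to follow. The SHAP library estimates the same quantities far more efficiently for real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`.

    Both `x` and `baseline` are dicts mapping feature name -> value.
    Cost grows as 2**n_features, so this is only practical for tiny examples.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        # Features in `subset` take their observed value; the rest stay at baseline.
        inputs = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_pollen_model(f):
    # Hypothetical response (an assumption, not a fitted model): linear in
    # temperature and wind speed, plus a step increase once RH exceeds 90%.
    humid = 1.0 if f["rh"] > 90 else 0.0
    return 2.0 * f["temperature"] + 0.5 * f["wind_speed"] + 30.0 * humid


# Hypothetical reference conditions and one hypothetical humid observation.
baseline = {"rh": 60.0, "temperature": 15.0, "wind_speed": 3.0}
sample = {"rh": 95.0, "temperature": 12.0, "wind_speed": 5.0}

phi = shapley_values(toy_pollen_model, sample, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

In this toy case the RH term receives the largest positive attribution because the sample crosses the 90% threshold while the baseline does not. The attributions also sum to the difference between the model output at the sample and at the baseline, which is the property that makes Shapley-based explanations additive.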
Original language: Undefined
Title of host publication: EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16338
Publication status: Published - 14 Apr 2024
