TY - GEN
T1 - Using explainable machine learning to better understand source and process contributions to atmospheric bio-aerosol
AU - Zhang, Hao
AU - Song, Congbo
AU - Topping, David
AU - Crawford, Ian
AU - Gallagher, Martin
AU - Chan, Man Nin
AU - Lee, Hing Bun Martin
AU - Xing, Sinan
AU - Ng, Tsin Hung
AU - Tai, Amos
PY - 2024/4/14
Y1 - 2024/4/14
AB - Atmospheric bioaerosols are receiving increasing attention as determinants of environmental and human health outcomes. However, the lack of fully evaluated end-to-end detection techniques limits our ability to identify bioaerosol types and their environmental drivers, particularly in complex environments. In this study we address these challenges by developing a novel machine learning framework that combines unsupervised deep learning with explainable machine learning techniques. The first step couples a bidirectional long short-term memory autoencoder (BiLSTM-AE) with a relatively new, fast hierarchical clustering technique. Our results indicate that this approach outperforms other models, distinguishing between fungal spores, non-biological aerosols, and pollen solely on the basis of fluorescence information and without the need for training data. We then used automated machine learning and the SHapley Additive exPlanations (SHAP) method to quantify the environmental drivers of each bioaerosol type. The variation in SHAP values indicated that the elevated pollen concentrations at night could be attributed to changes in air mass composition and origin. More importantly, we find ambient evidence that pollen may break into smaller fragments when relative humidity (RH) exceeds 90%, leading to significant changes in its fluorescence spectrum and a rapid increase in its concentration. Overall, our results suggest that combining unsupervised deep learning with explainable machine learning can provide new insights into type-specific bioaerosol processes.
U2 - 10.5194/egusphere-egu24-16338
DO - 10.5194/egusphere-egu24-16338
M3 - Conference contribution
BT - EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16338
ER -