Revealing Drivers of Haze Pollution by Explainable Machine Learning

Linlu Hou, Qili Dai, Congbo Song, Bowen Liu, Fangzhou Guo, Tianjiao Dai, Linxuan Li, Baoshuang Liu, Xiaohui Bi, Yufen Zhang, Yinchang Feng

Research output: Contribution to journal › Article › peer-review


Many places on Earth still suffer from high levels of atmospheric fine particulate matter (PM2.5) pollution. The formation of a particulate pollution event, or haze episode (HE), involves many factors, including meteorology, emissions, and chemistry. Understanding the direct causes of and key drivers behind an HE is thus essential. Traditionally, this is done with chemical transport models. However, substantial uncertainties enter the model estimates when the emissions inventory changes significantly because of interventions (e.g., the COVID-19 lockdown). Here we applied a Random Forest model coupled with a Shapley additive explanation (SHAP) algorithm, a post hoc explanation technique, to investigate the roles of major meteorological factors, primary emissions, and chemistry in five severe HEs that occurred before or during the COVID-19 lockdown in China. We found that, in addition to high primary emissions, PM2.5 in these haze episodes was largely driven by meteorological effects (average contributions of 30–65 μg m⁻³ across the five HEs), followed by chemistry (∼15–30 μg m⁻³). Photochemistry was likely the major pathway of nitrate formation, while air humidity was the predominant factor in sulfate formation. Our results highlight that data-driven machine learning has the potential to be a complementary tool for predicting and interpreting air pollution.
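The attribution idea behind coupling a Random Forest with Shapley values can be illustrated with a minimal, stdlib-only sketch. It computes exact Shapley values for one prediction by enumerating every feature coalition against a background set (practical only for a handful of features; tree-specific SHAP algorithms avoid this enumeration), then sums them by driver category. Everything here is a hypothetical assumption, not the study's setup: `toy_forest` stands in for a trained Random Forest, and the features `rh`, `blh`, `primary`, `o3`, their thresholds, the background rows, and the grouping into meteorology, primary emissions, and chemistry are illustrative only.

```python
from itertools import combinations
from math import factorial
from statistics import fmean
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]

# Hypothetical inputs and driver categories (illustrative, not the study's).
FEATURES = ["rh", "blh", "primary", "o3"]
GROUPS = {
    "meteorology": ["rh", "blh"],
    "primary emissions": ["primary"],
    "chemistry": ["o3"],
}


def toy_forest(z: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained Random Forest: the mean of four 'trees'."""
    rh, blh, primary, o3 = z
    trees = [
        80.0 if rh > 70 else 40.0,    # humid air -> higher PM2.5
        90.0 if blh < 500 else 35.0,  # shallow boundary layer -> less dilution
        2.0 * primary,                # primary-emission tracer
        60.0 if o3 > 100 else 20.0,   # photochemistry proxy
    ]
    return fmean(trees)


def shapley_values(model: Model, x: Sequence[float],
                   background: Sequence[Sequence[float]]) -> List[float]:
    """Exact interventional Shapley values of model(x) relative to a background set.

    Enumerates all coalitions, so cost grows as 2**n in the number of features.
    """
    n = len(x)

    def value(subset: frozenset) -> float:
        # Expected prediction with features in `subset` fixed to x and the
        # remaining features taken from each background row in turn.
        return fmean(
            model([x[i] if i in subset else b[i] for i in range(n)])
            for b in background
        )

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s_set = frozenset(s)
                phi[i] += weight * (value(s_set | {i}) - value(s_set))
    return phi


def group_contributions(phi: Sequence[float]) -> Dict[str, float]:
    """Sum per-feature Shapley values within each driver category."""
    index = {name: i for i, name in enumerate(FEATURES)}
    return {g: sum(phi[index[f]] for f in feats) for g, feats in GROUPS.items()}


if __name__ == "__main__":
    background = [[50, 800, 20, 60], [60, 1000, 25, 80]]  # hypothetical "clean" hours
    x = [85, 300, 40, 120]                                # hypothetical haze hour
    phi = shapley_values(toy_forest, x, background)
    print({g: round(v, 2) for g, v in group_contributions(phi).items()})
    # -> {'meteorology': 23.75, 'primary emissions': 8.75, 'chemistry': 10.0}
```

By the efficiency property of Shapley values, the per-feature values sum to the gap between `toy_forest(x)` and the mean background prediction (77.5 − 35.0 = 42.5 in this toy case), which is what lets per-group sums be read as additive shares of a single prediction.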

Original language: English
Pages (from–to): 112–119
Number of pages: 8
Journal: Environmental Science and Technology Letters
Issue number: 2
Publication status: Published, 8 Feb 2022

