ARTIFICIAL INTELLIGENCE (AI) MACHINE LEARNING MODELLING OF THE SPATIAL DISTRIBUTION OF GROUNDWATER ARSENIC HAZARD AND ITS ATTRIBUTABLE HEALTH RISKS

  • Ruohan Wu

Student thesis: PhD

Abstract

Groundwater arsenic (As) is an important environmental hazard, so predicting its distribution is important for informing stakeholders. Geogenic arsenic contamination of groundwater poses a severe health risk to hundreds of millions of people globally. Epidemiological studies have demonstrated that long-term exposure to arsenic can lead to various skin diseases, internal (lung, liver, bladder) cancers, and cardiovascular diseases. Cardiovascular diseases (CVD) have been recognized as the most serious non-carcinogenic health outcome arising from chronic exposure to arsenic. Although there is an increasing number of artificial intelligence/machine learning models of arsenic in groundwater, the cut-off criteria selected in such models are generally not designed to optimise, from an economic perspective, the cost-benefits of mitigating groundwater arsenic hazards. The major aim of this study was to utilise artificial intelligence (AI) / machine learning modelling to predict the distribution of groundwater arsenic, in part to estimate groundwater arsenic-attributable health risks. Secondary aims were to showcase the utility of various novel aspects of such modelling, notably (i) the modification of existing methodologies to generate what are termed here "pseudo-contour" maps of hazard distribution; (ii) the relative merits of objective AI approaches vs hybrid approaches involving the use of expert-selected environmental parameters; and (iii) the benefits of using objective cost-optimized cut-off criteria, an approach that hitherto has not been reported in published AI/machine learning models of groundwater arsenic distribution. The study areas in this PhD study are India and Uruguay, where the use of high arsenic hazard groundwater as drinking water has posed, and still poses, a serious threat to public health.
Well-known occurrences of high arsenic groundwater in India have been documented in the alluvial sediments along the Ganges and Brahmaputra plains, and high groundwater arsenic occurrences in Uruguay have been documented in the Raigon and Mercedes aquifers. Aside from some of these well-known high groundwater arsenic areas of India and Uruguay, the complete distribution of arsenic in groundwater is not comprehensively understood across either country, and groundwater arsenic-attributable health risks also need to be estimated. Firstly, logistic regression models were used to generate hazard and risk maps of groundwater arsenic across Gujarat State, India. A pseudo-contour map of groundwater arsenic concentrations shows greater arsenic hazard (10 µg/L) in the Kachchh District and Banas Kantha District. The total number of people consuming groundwater with arsenic exceeding 10 µg/L is estimated to be around 49,000. Using simple previously published dose-response relationships, this is estimated to have given rise to 700 (prevalence) cases of skin cancer and around 10 cases of premature avoidable mortality per annum from internal (lung, liver, bladder) cancers. The latter value is on the order of just 0.001% of internal cancers in Gujarat, reflecting the relatively low groundwater arsenic hazard in Gujarat State. Secondly, the distribution of groundwater arsenic in Uruguay, modelled by a variety of machine learning, basic expert systems, and hybrid approaches, is presented. A pure random forest approach gave rise to a groundwater arsenic distribution model with a very high degree of accuracy, which is consistent with known high groundwater arsenic hazard areas. A hybrid approach separating the country into sedimentary/crystalline and shallow/deep aquifer domains resulted in a slight material improvement in the high arsenic hazard distribution and improved accuracy.
Hybrid machine learning models with expert selection of important environmental parameters may sometimes be a better choice than pure machine learning models, particularly where datasets are incomplete, but, perhaps counterintuitively, this is not always the case. Thirdly, using a novel pseudo-contou
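The idea of a cost-optimized cut-off mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual cost function, so everything below is an assumption made for illustration: the function names, the grid of candidate cut-offs, and the per-well costs of a false positive (unnecessary mitigation) and a false negative (unmitigated exposure) are all hypothetical. The sketch simply picks the probability cut-off that minimises total misclassification cost on labelled wells.

```python
def total_cost(probs, labels, cutoff, cost_fp, cost_fn):
    """Total misclassification cost when wells with predicted
    probability >= cutoff are flagged as high-arsenic.

    labels: 1 = measured arsenic above the guideline, 0 = below.
    cost_fp / cost_fn: hypothetical per-well costs (assumptions).
    """
    cost = 0.0
    for p, y in zip(probs, labels):
        flagged = p >= cutoff
        if flagged and y == 0:
            cost += cost_fp   # safe well flagged as hazardous
        elif not flagged and y == 1:
            cost += cost_fn   # hazardous well missed
    return cost


def cost_optimal_cutoff(probs, labels, cost_fp, cost_fn):
    """Return the cut-off on a 0.00..1.00 grid with the lowest total cost.

    Ties are resolved in favour of the lowest cut-off.
    """
    candidates = [i / 100 for i in range(101)]
    return min(
        candidates,
        key=lambda c: total_cost(probs, labels, c, cost_fp, cost_fn),
    )


# Illustrative (made-up) model outputs and measured labels.
probs = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Missing a hazardous well is assumed to be five times as costly
# as flagging a safe one.
cutoff = cost_optimal_cutoff(probs, labels, cost_fp=1.0, cost_fn=5.0)
```

When false negatives are assumed to be more costly than false positives, the selected cut-off moves lower, so more wells are flagged; reversing the cost ratio moves it higher.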
Date of Award: 1 Aug 2023
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: David Polya
