TY - JOUR
T1 - Predicting Heavy Metal Concentrations in Shallow Aquifer Systems Based on Low-Cost Physiochemical Parameters Using Machine Learning Techniques
AU - Huynh, Thi Minh Trang
AU - Ni, Chuen Fa
AU - Su, Yu Sheng
AU - Nguyen, Vo Chau Ngan
AU - Lee, I. Hsien
AU - Lin, Chi Ping
AU - Nguyen, Hoang Hiep
N1 - Publisher Copyright:
© 2022 by the authors.
PY - 2022/10
Y1 - 2022/10
N2 - Monitoring ex-situ water parameters, namely heavy metals, needs time and laboratory work for water sampling and analytical processes, which can retard the response to ongoing pollution events. Previous studies have successfully applied fast modeling techniques such as artificial intelligence algorithms to predict heavy metals. However, neither low-cost feature predictability nor explainability assessments have been considered in the modeling process. This study proposes a reliable and explainable framework to find an effective model and feature set to predict heavy metals in groundwater. The integrated assessment framework has four steps: model selection uncertainty, feature selection uncertainty, predictive uncertainty, and model interpretability. The results show that Random Forest is the most suitable model, and quick-measure parameters can be used as predictors for arsenic (As), iron (Fe), and manganese (Mn). Although the model performance is auspicious, it likely produces significant uncertainties. The findings also demonstrate that arsenic is related to nutrients and spatial distribution, while Fe and Mn are affected by spatial distribution and salinity. Some limitations and suggestions are also discussed to improve the prediction accuracy and interpretability.
AB - Monitoring ex-situ water parameters, namely heavy metals, needs time and laboratory work for water sampling and analytical processes, which can retard the response to ongoing pollution events. Previous studies have successfully applied fast modeling techniques such as artificial intelligence algorithms to predict heavy metals. However, neither low-cost feature predictability nor explainability assessments have been considered in the modeling process. This study proposes a reliable and explainable framework to find an effective model and feature set to predict heavy metals in groundwater. The integrated assessment framework has four steps: model selection uncertainty, feature selection uncertainty, predictive uncertainty, and model interpretability. The results show that Random Forest is the most suitable model, and quick-measure parameters can be used as predictors for arsenic (As), iron (Fe), and manganese (Mn). Although the model performance is auspicious, it likely produces significant uncertainties. The findings also demonstrate that arsenic is related to nutrients and spatial distribution, while Fe and Mn are affected by spatial distribution and salinity. Some limitations and suggestions are also discussed to improve the prediction accuracy and interpretability.
KW - Random Forest
KW - explainable artificial intelligence (XAI)
KW - groundwater quality
KW - heavy metals
KW - prediction intervals
UR - http://www.scopus.com/inward/record.url?scp=85139974732&partnerID=8YFLogxK
U2 - 10.3390/ijerph191912180
DO - 10.3390/ijerph191912180
M3 - 期刊論文
AN - SCOPUS:85139974732
SN - 1661-7827
VL - 19
JO - International Journal of Environmental Research and Public Health
JF - International Journal of Environmental Research and Public Health
IS - 19
M1 - 12180
ER -