Estimating monthly PM2.5 concentrations from satellite remote sensing data, meteorological variables, and land use data using ensemble statistical modeling and a random forest approach

Chu Chih Chen, Yin Ru Wang, Hung Yi Yeh, Tang Huang Lin, Chun Sheng Huang, Chang Fu Wu

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

Fine particulate matter (PM2.5) is associated with various adverse health outcomes and poses serious concerns for public health. However, ground monitoring stations for PM2.5 measurements are mostly installed in densely populated or urban areas. Thus, satellite-retrieved aerosol optical depth (AOD) data, which provide spatial and temporal surrogates of exposure, have become an important tool for estimating PM2.5 across a study area. In this study, we used AOD estimates of surface PM2.5 together with meteorological and land use variables to estimate monthly PM2.5 concentrations at a spatial resolution of 3 km² over Taiwan Island from 2015 to 2019. An ensemble two-stage estimation procedure was proposed, with a generalized additive model (GAM) for temporal-trend removal in the first stage and a random forest model used to assess residual spatiotemporal variations in the second stage. We obtained a model-fitting R² of 0.98 with a root mean square error (RMSE) of 1.40 μg/m³. The leave-one-out cross-validation (LOOCV) R² with seasonal stratification was 0.82, and the RMSE was 3.85 μg/m³, whereas the pure random forest approach produced R² and RMSE values of 0.74 and 4.60 μg/m³, respectively. The results indicated that the ensemble modeling approach had a higher predictive ability than the pure machine learning method and could provide reliable PM2.5 estimates over the entire island, which has complex terrain in terms of land use and topography.

Original language: English
Article number: 118159
Journal: Environmental Pollution
Volume: 291
DOIs
State: Published - 15 Dec 2021

Keywords

  • Aerosol optical depth
  • Generalized additive model
  • Inverse distance weighting
  • Land use regression
  • Leave-one-out cross-validation
