High Performance Machine Learning Models of Large Scale Air Pollution Data in Urban Area

Preserving air quality in urban areas is crucial for the health of the population and for the environment. The availability of large volumes of measurement data on air pollutant concentrations makes it possible to analyse and model them, establishing trends and dependencies in order to forecast and prevent future pollution. This study proposes a new approach for modelling air pollutant data that combines the machine learning method Random Forest (RF) with the Auto-Regressive Integrated Moving Average (ARIMA) methodology. First, an RF model of the pollutant is built and analysed in relation to the meteorological variables. This model is then corrected by modelling its residuals with a univariate ARIMA model. The approach is demonstrated on hourly data for seven air pollutants (O₃, NOx, NO, NO₂, CO, SO₂, PM₁₀) in the town of Dimitrovgrad, Bulgaria, over 9 years and 3 months. Six meteorological and three time variables are used as predictors. The resulting high-performance models explain the data with R² = 90%-98%.

eISSN: 1314-4081
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology



Published Online: Dec 31, 2020

Page range: 49 - 60

Received: Sep 10, 2020

Accepted: Nov 04, 2020

DOI: https://doi.org/10.2478/cait-2020-0060

Keywords
Machine learning, Random Forest, Autoregressive integrated moving average, error correction, time series, forecasting

© 2020 Snezhana G. Gocheva-Ilieva et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
