The aim of the study was to evaluate whether different data mining methods can be used to model the inflow of sewage into a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbours (k-NN) and kernel regression (K). The data consisted of time series of daily rainfall, water level measurements in the receiving water body for the treated sewage, and the wastewater inflow into the Rzeszów city plant. Among the models with one input delayed by 1 day, the best results were obtained with the k-NN method and the worst with the K method. Among the models with two inputs and one explanatory variable, the smallest errors were obtained when the model inputs were sewage inflow and rainfall data delayed by 1 day; the best fit was provided by the RF method and the worst by the K method. Among the models with three inputs and two explanatory variables, the best results were reported for the SVM method and the worst for the K method. In most modelling runs the smallest prediction errors were obtained with the SVM method and the largest with the K method. The exceptions were the simplest model with one input delayed by 1 day, where k-NN performed best, and two modelling runs of the two-input models, where RF performed best.
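The simplest configuration described above, a model with one input delayed by 1 day, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_lagged` and `knn_predict`, the choice of k = 3, the unweighted Euclidean distance and the inflow values are all assumptions made for illustration, and the numbers are synthetic rather than taken from the Rzeszów data.

```python
from math import sqrt


def make_lagged(series, lag=1):
    """Pair each value with the value observed `lag` steps earlier."""
    inputs = [[series[t - lag]] for t in range(lag, len(series))]
    targets = [series[t] for t in range(lag, len(series))]
    return inputs, targets


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Synthetic daily inflow values (made up for illustration, not study data).
inflow = [40.0, 42.0, 41.0, 55.0, 60.0, 48.0, 43.0, 41.0, 40.0, 52.0]

# Inputs are yesterday's inflow, targets are today's inflow.
x, y = make_lagged(inflow, lag=1)

# Hold out the last day and predict it from the preceding days.
train_x, train_y = x[:-1], y[:-1]
forecast = knn_predict(train_x, train_y, x[-1], k=3)
abs_error = abs(forecast - y[-1])
```

Adding a second input, such as rainfall delayed by 1 day, would only require appending that value to each input vector in `make_lagged`; in practice the inputs would then usually be rescaled so that no single variable dominates the distance.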