Modeling Commuter’s Sociodemographic Characteristics to Predict Public Transport Usage Frequency by Applying Supervised Machine Learning Method

Open access


Predictive modeling is a fundamental method for studying passengers’ behavior in transportation research. One topic that has received limited attention is the modeling of public transport usage frequency, which can be used to estimate present and future demand and users’ trends toward public transport services. Artificial intelligence and machine learning methods are promising substitutes for statistical techniques. Traditionally used econometric models are better suited to studying causal relationships among variables, but they make rigid assumptions and are unable to recognize patterns in data. This paper aims to build a predictive model for passenger classification and public transport usage frequency using sociodemographic survey data. The supervised machine learning algorithm K-Nearest Neighbor (KNN) is applied to build the predictive model; it is well suited to small datasets because it requires little parameter tuning. Survey data are used to train and validate the model, which is able to predict the public transport usage frequency of future users. The model can be used in practice by public transport agencies and relevant government organizations to predict public transport demand from new commuters before introducing new transportation projects.
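As a rough illustration of the KNN idea named in the abstract, the sketch below classifies a new commuter by majority vote among the nearest training records. It is not the authors' implementation: the feature choice (scaled age, scaled income, vehicle ownership), the labels "frequent" and "rare", the value k = 3, and all data values are hypothetical and chosen only to make the example runnable.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training record
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Sort by distance only, so ties keep their original order
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records (not survey data from the paper):
# (age scaled to [0, 1], monthly income scaled to [0, 1], owns a vehicle 0/1)
train_X = [
    (0.22, 0.10, 0), (0.25, 0.15, 0), (0.30, 0.20, 0),
    (0.45, 0.70, 1), (0.50, 0.80, 1), (0.38, 0.65, 1),
]
# Hypothetical usage-frequency classes
train_y = ["frequent", "frequent", "frequent", "rare", "rare", "rare"]

# A new commuter whose usage frequency is unknown
new_commuter = (0.27, 0.18, 0)
predicted = knn_predict(train_X, train_y, new_commuter, k=3)
print(predicted)
```

Features are scaled to a common range here because KNN relies on distances, so an unscaled variable such as raw income would otherwise dominate the vote. In practice k would be selected by validation on held-out survey records.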


