Transport Simulation Model Calibration with Two-Step Cluster Analysis Procedure

The calibration results of a transport simulation model depend on the selected parameters and their values. The aim of the present paper is to calibrate a transport simulation model using a two-step cluster analysis procedure in order to improve the reliability of the simulation model results. Two global parameters have been considered: headway and simulation step. Normal, uniform and exponential headway generation models have been considered for headway. Applying the two-step cluster analysis procedure to the calibration procedure has reduced the time needed to select the simulation step value and the headway generation model.


I. INTRODUCTION
The calibration of a microscopic transport simulation model is a critical step in transport flow analysis. Forecast accuracy and transport situation analysis depend on the quality of model calibration. The aim of transport simulation model calibration is to reduce the difference between the transport situation observed on the road network and the simulated one. The Federal Highway Administration guidelines (a division of the United States Department of Transportation that specialises in highway transportation [21], [22]) describe the calibration process in the following steps: junction capacity calibration, route choice calibration and model calibration for overall performance (calibration of internal model parameters, which depend on the simulation software used).
In [8], calibration techniques are divided into four categories. The first category is the calibration of driving behaviour models. Driver behaviour on the road can differ depending on the time of travel (day, night, peak, off-peak periods), road conditions, cities and countries (for example, driver behaviour in the United States of America is not the same as in Latvia). Different vehicle types, traffic rules, signal timing and traffic signs also affect driver behaviour. Drivers are more aggressive on the road in the morning and evening peak hours than at any other time of the day. On the other hand, drivers mostly drive more efficiently in the morning rush hours than at any other time of the day [3]. Therefore, to obtain data close to the observed data, it is necessary to evaluate the default transport simulation model parameters that influence driver behaviour.
The second category is the calibration of the route choice model. Vehicles are assigned to the transport network in accordance with the selected algorithm, for example, the shortest path algorithm, initial K-shortest path, link penalty method or link elimination method [10].
The third category is the calibration of the origin-destination (OD) matrices; in most cases it is the calibration of the "initial" matrix. The initial matrix is the matrix created from the observed traffic counts. The initial matrix calibration is the most time-consuming and complex process; it involves initial transport data collection, data pre-processing and data adjustment to fit traffic volumes to the designed routes [11].
The fourth category is the calibration of transport model parameters (model fine-tuning). Commercial tools for transport model simulation at the macroscopic, microscopic and mesoscopic levels have been available since 1990. The latest generation of transport simulation software (Aimsun, Vissim, etc.) provides advanced forecasting features, analysis of various situations and events, as well as transport management and control systems. Commercial tools provide various default parameters for simulation model calibration; the parameters are divided into global and local ones. Global parameters refer to the behaviour of the whole system, whereas local parameters refer only to a specific road section or point in the transport network. In most cases, default parameters do not reproduce the observed situation pattern, so for each transport simulation model it is necessary to perform additional calibration and validation steps.
To perform model calibration, the initial data sets are divided into a calibration set and a validation set. The calibration set is used for simulation model calibration, and the validation set is used after the model is calibrated to confirm the transport simulation model results. Simulation model validation is performed by evaluating model effectiveness measures for simulated and observed data. A number of indicators are used in the simulation model calibration and validation phase. In [16], validation methods are divided into qualitative and quantitative ones. Qualitative methods include the comparison of graphical results (for example, animation, histograms and series plots), comparison with other simulation models and study of simulation model behaviour. Quantitative methods include output comparison with statistical tools and comparison with other simulation models using statistical tests and procedures, for example, percent error, mean squared error, mean percent error (MPE), root mean square error, GEH statistic and Theil's coefficient of inequality.
The selection of indicators or parameters depends on the complexity of the analysed task and on the quantity and quality of the initial input data. Several indicators are used to evaluate simulation model effectiveness and to obtain more acceptable and reliable results.
Parameter calibration is considered an optimisation problem. An important task is to decide which simulation model parameters should be used for model calibration and validation. In [6], a methodology for parameter selection is proposed that does not consider the values of these parameters.
The aim of this research is to perform transport simulation model calibration with the headway (normal, uniform and exponential distribution) and simulation step parameters, and to evaluate how these parameters influence simulation outputs. Root mean square error is used to compare simulated and observed traffic flows, based on [4]. It is proposed to use the two-step cluster analysis procedure as part of the calibration process for headway and simulation step value selection. Headway is the time interval between two vehicles arriving at the transport network.
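The quantitative indicators listed above can be sketched in a few lines of Python. This is a minimal illustration, not the formulation used in the cited works: the function names and sample flows are hypothetical, and the percent form of the RMSE (normalised by the mean observed value) is an assumption.

```python
import math


def rmse(simulated, observed):
    """Root mean square error between paired simulated and observed values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n)


def rmse_percent(simulated, observed):
    """RMSE divided by the mean observed value, in percent (assumed normalisation)."""
    return 100.0 * rmse(simulated, observed) / (sum(observed) / len(observed))


def mean_percent_error(simulated, observed):
    """Mean of the signed relative errors, in percent."""
    return 100.0 * sum((s - o) / o for s, o in zip(simulated, observed)) / len(observed)


def geh(simulated, observed):
    """GEH statistic for one pair of hourly volumes (vehicles per hour)."""
    return math.sqrt(2.0 * (simulated - observed) ** 2 / (simulated + observed))


def theil_u(simulated, observed):
    """Theil's inequality coefficient: 0 is a perfect fit, 1 is the worst fit."""
    n = len(observed)
    denominator = (math.sqrt(sum(s * s for s in simulated) / n)
                   + math.sqrt(sum(o * o for o in observed) / n))
    return rmse(simulated, observed) / denominator


# Hypothetical hourly flows (vehicles per hour), for illustration only.
observed_flows = [620.0, 710.0, 680.0, 540.0]
simulated_flows = [600.0, 745.0, 650.0, 560.0]
print(round(rmse_percent(simulated_flows, observed_flows), 2))
print([round(geh(s, o), 2) for s, o in zip(simulated_flows, observed_flows)])
print(round(theil_u(simulated_flows, observed_flows), 4))
```

Using several indicators side by side is useful because the RMSE summarises the whole series, whereas the GEH statistic is evaluated per count location or interval.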

II. TRANSPORT SIMULATION MODEL CALIBRATION PROCEDURE
The calibration procedure of a transport simulation model contains the following main elements:
1) Event (task) identification. It identifies the events that will be checked and analysed (for example, parameter selection for congested road sections will differ from the parameters chosen for an event where many drivers return home from a concert). Calibration parameter selection depends on the analysed event and the available transport simulation tools.
2) Selection of the analysed time and date for an event. The time (rush hours, non-rush hours) and date (weekday, weekend) when the initial data were collected influence the simulation model parameters and their value selection.
3) Selection of the transport network point (street, road, intersection, section and lane) that will be analysed. It is advisable to choose points that describe unique and stable situations in the research network that occur regularly and without an accidental situation (do not select points with a bottleneck), for example, congestion, road narrowing and weaving.
4) Selection of calibration and validation parameters that will describe the current choice of transport routes.
5) Limitation of calibration and validation parameters. There is no need to choose all possible parameters for calibration. For example, when a large number of parameters are used and the values of all these parameters are changed in one iteration, it can lead to unexpected and unstable model results. In this case, it is very difficult to identify exactly which parameters had a negative impact. Sensitivity analysis can help avoid such a problem. Sensitivity analysis determines how much the model results change if the input data of the model are changed.
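The idea of sensitivity analysis in element 5) can be illustrated with a one-factor-at-a-time sketch: each parameter is varied on its own while the others stay at their baseline values, so any change in the output can be attributed to a single parameter. The `toy_model` function and all numeric values below are hypothetical stand-ins for a simulation run; they are not an Aimsun interface.

```python
def one_factor_at_a_time(model, baseline, variations):
    """Vary one parameter at a time around a baseline and record output changes."""
    base_output = model(**baseline)
    effects = {}
    for name, values in variations.items():
        for value in values:
            params = dict(baseline)   # copy, so only one parameter differs
            params[name] = value
            effects[(name, value)] = model(**params) - base_output
    return effects


def toy_model(simulation_step, mean_headway):
    """Hypothetical stand-in for one simulation run; returns a flow in veh/h."""
    return 3600.0 / mean_headway - 200.0 * (simulation_step - 0.75) ** 2


baseline = {"simulation_step": 0.75, "mean_headway": 6.0}
variations = {"simulation_step": [0.65, 0.85], "mean_headway": [5.0, 7.0]}

effects = one_factor_at_a_time(toy_model, baseline, variations)
for (name, value), delta in sorted(effects.items()):
    print(name, value, round(delta, 2))
```

A one-factor-at-a-time scan does not reveal interactions between parameters, which is why it is only a first screening step.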
The calibration process of a transport simulation model consists of the following main steps:
1) Initial input data collection and data pre-processing. The use of unprocessed data may lead to unreliable and unacceptable results.
2) Division of the initial input data into two sets: one data set for transport simulation model calibration and another for transport simulation model validation.

4) Calibration and verification parameter identification, sensitivity analysis.
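The division of input data into a calibration set and a validation set can be sketched as follows. The 50/50 share, the fixed seed and the sample counts are assumptions made for illustration; the text does not state how the split was made.

```python
import random


def split_calibration_validation(records, calibration_share=0.5, seed=42):
    """Shuffle records reproducibly and split them into two disjoint sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * calibration_share))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical pre-processed traffic counts (vehicles per 5-minute interval).
counts = [52, 61, 58, 47, 66, 70, 55, 49, 63, 60]
calibration_set, validation_set = split_calibration_validation(counts)
print(len(calibration_set), len(validation_set))
```

Keeping the two sets disjoint matters because the validation set is meant to check the calibrated parameters on data the calibration has not seen.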
There are many different procedures for transport simulation model calibration. Based on a literature review [1], [2], [17], [18], the most commonly used procedures have been considered in this research. The calibration procedure contains the following steps [1], [2], [17]:
1) Initial input data preparation. Research task identification, analysed transport network point selection, performance measure selection, input data collection and transport network development.
2) Initial transport simulation model evaluation. Transport simulation model evaluation is based on default calibration parameters and the first step of the calibration procedure. At this step, it is necessary to evaluate the quality and reliability of the simulation model and answer the question "How close are the outputs from the simulation model to the observed data?" If the difference between the simulated and observed data is more than 15 %, then go to the following step.
3) Initial model calibration. Identification of transport simulation model calibration parameters, definition of model runs and selection of calibration parameter values. Calibration parameter selection can be carried out by using heuristics [2], a genetic algorithm [1], [17] or expert opinion and knowledge.
4) Feasibility studies. Assess how the selected calibration parameters affect simulation model outputs by conducting statistical tests, and choose the calibration parameters and their values based on the results of the statistical tests.
5) Evaluation of the obtained transport simulation model results. Evaluation can be performed by using animation, effectiveness measures and statistical tests.
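The decision logic of steps 2) and 3) can be sketched as a small loop: the default parameters are evaluated first, and candidate parameter sets are tried only if the difference from the observed data exceeds 15 %. Everything below is a hypothetical illustration: `toy_run` stands in for a simulation run, the candidate list stands in for values proposed by heuristics, a genetic algorithm or an expert, and the error measure is an assumed mean absolute percent difference.

```python
def mean_abs_percent_difference(simulated, observed):
    """Average absolute relative difference between two series, in percent."""
    return 100.0 * sum(abs(s - o) / o for s, o in zip(simulated, observed)) / len(observed)


def calibrate(run_model, observed, default_params, candidates, threshold=15.0):
    """Keep the defaults if they are within the threshold, else pick the best candidate."""
    best_params = default_params
    best_error = mean_abs_percent_difference(run_model(default_params), observed)
    if best_error <= threshold:
        return best_params, best_error
    for params in candidates:
        error = mean_abs_percent_difference(run_model(params), observed)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Hypothetical observed flows and a toy model that scales them.
observed_flows = [100.0, 200.0]


def toy_run(params):
    return [flow * params["scale"] for flow in observed_flows]


best_params, best_error = calibrate(
    toy_run, observed_flows,
    default_params={"scale": 1.3},
    candidates=[{"scale": 1.1}, {"scale": 0.95}],
)
print(best_params, round(best_error, 2))
```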
In [18], the calibration procedure is divided into three steps. The first step includes the model run with default calibration parameter values. This applies to the global parameters (for example, speed, acceleration and deceleration rates; parameters that influence the whole system) and local parameters (for example, section speed; parameters that influence specific system points). The author recommends performing ten model runs for performance measure evaluation. The first step is successful if the root mean square error, R statistics and Theil's statistics results are acceptable. The second step is speed calibration. This step includes the analysis of simulated and observed speeds and verification for known bottleneck points in the transport network. Since the global parameters affect the whole simulation network, bottleneck point calibration requires local parameter calibration for specific road points. The third step is objective-based calibration. This step is optional for transport model calibration and depends on the calibration purpose. For example, if it is necessary to evaluate a transport management system, an accident management system or ramp metering, then additional calibration should be performed to obtain acceptable results.
The following transport model calibration procedure has been used for the case study in Adazi city (Fig. 1). The calibration procedure includes all steps mentioned in previous works:

III. TRANSPORT SIMULATION MODEL PARAMETERS FOR CALIBRATION
Assessing the impact of calibration parameters on the transport simulation model results is one of the most complicated steps in understanding how each transport simulation model parameter influences the model. Some parameters have more influence than others. Parameter analysis gives different results for single and multiple parameters. A parameter may not affect the result by itself, but may have a significant influence through interaction with other parameters.
The first recommended step in parameter selection is to use default simulation tool parameters. This step is also called "One Factor at a Time" in sensitivity analysis [4]. Another way to select parameters is to use genetic algorithms or heuristics [1], [2], [17]. However, in practice the most common alternative for parameter selection is expert opinion, because other methods are time-consuming.
In this research, two global parameters for transport model calibration have been considered: headway and simulation step. Additionally, it is proposed to use the two-step cluster analysis procedure, which can process continuous and categorical variables, to reduce the time for calibration parameter value selection and improve the reliability of transport simulation model results.

A. Headway Models
Headway is the time interval between two vehicles that arrive at the transport network (Fig. 2) [12]. Headway is defined for each vehicle in the model: first, the vehicle arrives at a virtual queue, and when there is space on the link, the vehicle enters the link in the modelled transport network. For the headway analysis, Aimsun transport modelling software was used [7]. Aimsun was selected based on [14]. The headway between two successive vehicles in the network is selected by assigning the global parameter "system arrival" in Aimsun. Exponential, uniform, normal, constant, "as soon as possible" and external arrivals can be selected in Aimsun. In dynamic assignment, traffic is distributed based on the selected system arrival parameter (headway), and the system also checks whether physical space is available at the entrance link. For the present research, three headway model distributions were chosen based on [15]: exponential, uniform and normal. Examples of headway generation model distribution application are shown in Table I.
In the case of the exponential distribution, the headway between two vehicles arriving at the network is sampled from an exponential distribution. The exponential distribution is widely used in transport modelling; for example, in Aimsun software the exponential distribution is set by default. In [13], it is shown that the negative exponential distribution is suitable for headway generation modelling and can be used for event evaluation with different volumes of transport flows (up to 1700 vehicles per hour per lane).
When the normal distribution is selected for headway, the time interval between vehicles arriving at the transport network follows a truncated normal distribution. In a normal distribution, values are centred on the mean, and the mean, mode and median are equal to each other. Here, as for the other distributions, an additional check for space is performed. If a link is full of vehicles and there is no appropriate gap, the vehicle stays in a virtual queue until a gap appears.
In the uniform distribution, all time intervals between two vehicles arriving at the transport network are equally probable within the sampled range and, additionally, the system checks whether there is physical space for the vehicle in the transport network.
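The three headway generation models described above can be sketched with the Python standard library. This is an illustrative approximation, not the sampling code of Aimsun: the function names, the uniform range of zero to twice the mean, the standard deviation and the truncation at zero are all assumptions, and the virtual queue and physical space check are not modelled.

```python
import random


def exponential_headway(rng, mean_headway):
    """Headway in seconds drawn from a negative exponential distribution."""
    return rng.expovariate(1.0 / mean_headway)


def uniform_headway(rng, mean_headway):
    """Headway drawn uniformly from [0, 2 * mean], so the mean is preserved (assumed range)."""
    return rng.uniform(0.0, 2.0 * mean_headway)


def truncated_normal_headway(rng, mean_headway, std_dev, lower=0.0):
    """Normal headway truncated below `lower` by redrawing rejected samples."""
    while True:
        headway = rng.gauss(mean_headway, std_dev)
        if headway >= lower:
            return headway


def arrival_times(draw_headway, vehicle_count):
    """Accumulate sampled headways into arrival times at the network entrance."""
    clock, times = 0.0, []
    for _ in range(vehicle_count):
        clock += draw_headway()
        times.append(clock)
    return times


rng = random.Random(1)
mean_headway = 3600.0 / 600.0   # hypothetical demand of 600 veh/h gives a 6 s mean headway
samplers = {
    "exponential": lambda: exponential_headway(rng, mean_headway),
    "uniform": lambda: uniform_headway(rng, mean_headway),
    "normal": lambda: truncated_normal_headway(rng, mean_headway, std_dev=2.0),
}
for name, sampler in samplers.items():
    print(name, [round(t, 1) for t in arrival_times(sampler, 5)])
```

With the same mean headway, the exponential model produces the most irregular arrivals (many short gaps and occasional long ones), which is one reason the choice of model can matter for congested sections.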

B. Simulation Step
Simulation step (Fig. 3) defines the time interval after which all system elements (for example, traffic signals, vehicles and emissions) are updated and statistics are recorded. The simulation step can range from 0.0 to 1.0 seconds; by default it is set to 0.75 in Aimsun software. The simulation step influences not only model behaviour but also model outputs. The count of lost vehicles can be reduced by changing the simulation step. A smaller simulation step means a smaller reaction time: drivers will drive closer to each other, will be more aggressive and will change lanes more often. If the simulation step is too large, vehicles will not react adequately to transport situations, will not change lanes when an appropriate gap is available, and will create or increase congestion that does not exist in the real situation. As a result of such driver behaviour, system blocking situations can occur.

C. Two-Step Cluster Analysis Procedure for Value Selection of Transport Simulation Model Parameters
A two-step cluster analysis procedure has been chosen for value selection of transport simulation model calibration parameters because it can process a large amount of data, is fast and can work with both continuous and categorical variables [9] in cases when the appropriate number of clusters is not known. The algorithm is based on a distance measure; the most reliable results are obtained when all variables are independent, continuous variables have a normal distribution and categorical variables have a multinomial distribution.
A two-step cluster analysis procedure includes the following steps:
1) Initially, observations are grouped using log-likelihood distances, creating the cluster feature (CF) "tree". The first case is placed in a leaf node at the root of the tree, which contains variable information about the case. Each subsequent case joins an existing node or forms a new one, based on its similarity to the existing nodes, with the distance measure used as the similarity criterion. In the cluster feature development step, the algorithm identifies atypical cases (noise) and can exclude them from the analysis. Atypical cases are cases that do not fit into any cluster.
2) The resulting subclusters are further grouped into the desired number of clusters by comparing their distance with a special threshold [19]. If the distance exceeds a threshold, then both subclusters are combined.
3) The optimal number of clusters is determined by one of two criteria: the Bayesian Information Criterion (BIC) [5] or the Akaike Information Criterion (AIC) [20]. Both information criteria are widely used to evaluate the information content of various statistical models. A smaller number of parameters leads to higher information content and more accurate prediction. The lower the criterion value, the better the cluster solution.
where mj is the number of parameters used in the information criteria and Pk is the number of categories of the k-th categorical variable.
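For orientation, the form of the two criteria commonly documented for two-step clustering is sketched below. This is an assumption about the intended formulas rather than a quotation from the paper. The symbols J (number of clusters), \xi_j (log-likelihood contribution of cluster j), N (number of records), K^A (number of continuous variables) and K^B (number of categorical variables) are introduced here for illustration; m_J and P_k correspond to mj and Pk in the text.

```latex
\begin{align}
\mathrm{BIC}(J) &= -2 \sum_{j=1}^{J} \xi_j + m_J \ln N, \\
\mathrm{AIC}(J) &= -2 \sum_{j=1}^{J} \xi_j + 2\, m_J, \\
m_J &= J \left( 2 K^{A} + \sum_{k=1}^{K^{B}} \left( P_k - 1 \right) \right).
\end{align}
```

In this form, each cluster contributes two parameters (mean and variance) per continuous variable and P_k - 1 free probabilities per categorical variable, so the penalty term grows with the number of clusters.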

IV. CASE STUDY FOR ADAZI CITY
A freeway section at the entrance to Adazi city has been selected to calibrate a simulation model with two global parameters: the headway and the simulation step. Within the analysed section, there is one lane in both transport directions with an additional half traffic lane (Fig. 4).

Fig. 4. Analysed "weaving" section at the entrance to Adazi city (figure labels: Adazi city center; analysed section).
The section has been chosen because at this point some vehicles go right to the centre of Adazi city and others continue driving to a freeway; it is a weaving point at which drivers are looking for a gap to change lanes (Fig. 5).
To perform simulation model calibration, the following pre-processing steps have been performed:
a) Data collection and initial model preparation. Initial transport data have been collected by a video recorder in the evening hours on working days and at the weekend. The collected data have then been pre-processed, and records with unusual data have been removed. The resulting data have been divided into two groups, one for transport simulation model calibration and the other for validation. Transport simulation software Aimsun 6.0 has been used to develop, calibrate and validate the simulation model. Based on the collected data, transport origin-destination matrices for cars and trucks, the transport network and public transport lanes have been developed and added to the transport simulation model.
[...] (Fig. 6) and calculating a root mean square error (1). The root mean square error between the observed and simulated traffic flows is in the range of 12-28 %, depending on the time interval. At one point, the observed traffic flow significantly exceeds the simulated traffic flow. In the "real/observed" situation, it is considered that there is a blocking situation at this point: one vehicle wants to change lanes but cannot find an appropriate gap, and stays in the lane waiting for a gap, delaying other vehicles. The next step is to select the headway model and simulation step value that give an RMSE of less than 15 %.
where n is the number of records, wij is the predicted value and vij is the observed value.
Information Technology and Management Science, 2015 / 18
d) Parameter influence evaluation on simulation results. To evaluate the influence of headway and simulation step on simulation outputs, 600 runs have been performed. For each of the analysed exponential, uniform and normal headway generation models, 10 runs have been performed for each simulation step in the range from 0.65 to 0.85. The resulting simulation outputs (travel time, simulated transport volumes, assigned transport volumes, transport volume distribution by routes) and parameters have been passed to the two-step cluster analysis procedure. Both the Bayesian and Akaike information criteria have been used to divide data into groups (see Table II). The optimal number of clusters is three; the ratio of distance measures has the largest value for three clusters. Table III and Table IV present the centroids and frequencies for the variables considered in the two-step cluster analysis procedure. It can be seen that in each cluster the centroid value for the simulation step is around 0.74. The first cluster is the medium flow condition with 11-12 vehicles per minute; all these vehicles have only one route from origin to destination (% of vehicle O-D by route: 100 %) and 82.6 % of vehicles have uniform distribution. The second cluster is the low flow condition with approximately 4 vehicles per minute; about 90 % of vehicles have one route from origin to destination and 67.3 % of vehicles have normal distribution. The third cluster is the high flow condition with 37-39 vehicles per minute; only 12.8 % of vehicles have one route from origin to destination (% of vehicle O-D by route: 12.8 %) and 91.4 % of vehicles have exponential distribution. The third cluster, with a high volume of simulated vehicles in comparison with the assigned vehicles, represents the blocking situation described in point b).
f) Simulation model output evaluation. After running the simulation model with exponential arrival and simulation step 0.74, the simulated outputs have been compared with the observed ones (Fig. 7); the root mean square error between observed and simulated traffic flows is 12.9 %. In addition, delay time per hour has been evaluated for the simulation model outputs. The delay time is 18.9 sec/h, which corresponds to level of service D and describes the existing observed situation in the road lane.
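The run design described in step d) can be enumerated as a simple grid of headway model, simulation step and replication. The sketch below is illustrative only: the 0.01 increment between simulation step values is an assumption (the text gives only the range 0.65 to 0.85 and 10 runs per setting), and the dictionary keys are hypothetical names rather than Aimsun parameters.

```python
from itertools import product

HEADWAY_MODELS = ("exponential", "uniform", "normal")
RUNS_PER_SETTING = 10


def simulation_steps(start=0.65, stop=0.85, increment=0.01):
    """Simulation step values from start to stop inclusive (assumed 0.01 increment)."""
    count = int(round((stop - start) / increment)) + 1
    return [round(start + i * increment, 2) for i in range(count)]


def run_plan():
    """One entry per simulation run: headway model, simulation step, replication."""
    return [
        {"headway_model": model, "simulation_step": step, "replication": replication}
        for model, step, replication in product(
            HEADWAY_MODELS, simulation_steps(), range(1, RUNS_PER_SETTING + 1)
        )
    ]


plan = run_plan()
print(len(plan))
print(plan[0])
```

Under the assumed increment the grid has 21 step values and therefore 630 runs; a coarser or slightly shorter grid would give the round figure of 600 reported in the text. The outputs of each run, together with its two parameter values, are the records that the cluster analysis then groups.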
After simulation model output evaluation, the transport simulation model has been run one more time with the validation data set to check whether the selected exponential arrival and simulation step give appropriate results for another data set.
At this point, model calibration is complete; the exponential headway with simulation step 0.74 should be used for forecasts of further events.
Two global parameters, the headway generation model and the simulation step, have been selected for transport simulation model calibration. Three headway generation model distributions have been analysed: normal, uniform and exponential. More than 600 runs have been performed for the simulation model with a simulation step in the range of 0.65-0.85.
To select headway and simulation step values, the model has first been run with default calibration parameter values. The results of this model run are unsuccessful: RMSE is >15 %. To reduce the time for calibration parameter value selection and to improve the reliability of transport simulation model results, a two-step cluster analysis procedure has been proposed. The proposed improvements have been tested on the "weaving" section at the entrance to Adazi city. The application of the two-step cluster analysis procedure has reduced the time needed for simulation step and headway generation model selection. In headway model selection, the analysed event plays an important role, and the two-step cluster analysis procedure has shown that for low flows with no congestion it is better to use the normal distribution for arrival. Where high traffic flows occur on a congested road section, the exponential distribution is a more acceptable alternative.
Application of a two-step cluster analysis procedure to a calibration procedure has allowed reducing time needed for simulation step and headway generation model value selection.

Fig. 2. Time interval between two vehicles that arrive at the transport network.

Fig. 3. Example of too small simulation step (timid driving) and too big simulation step (aggressive driving).

Fig. 5. Distance to different destinations in the research area.

Fig. 6. Simulated and observed traffic flows for one-hour interval at the analysed point.

Fig. 7. Simulated and observed traffic flow for exponential, normal and uniform arrival.
1) initial data division into two sets, one for model calibration and the other for model validation;
2) model run with Aimsun software default parameters;
3) model output evaluation based on RMSE;
4) evaluation of global parameter influence on model outputs by running the model n times, and comparison of observed and simulated results, in case RMSE exceeds 15 %. Additionally, it is proposed to perform the two-step cluster analysis procedure for estimation of global parameter values and simulation of the received groups;
5) model run with selected global parameters and their values;
6) result evaluation with performance measures;
7) model run with selected parameters and values for the validation set if performance measure results are acceptable.