The classification of atmospheric states into separate circulation types is a well-known tool for describing and analysing climate conditions. The main idea is to move from continuous information about an atmospheric state (e.g., the pressure field on a given day) towards discrete information. This involves ordering individual atmospheric states and assigning them to groups of types with certain similarities. This is how a circulation type catalogue is created: each type is described with a value on a nominal scale. The main advantages of such an approach are that circulation type catalogues are easy to use, that classification results can be compared at different locations and times, and that the dependence of different climate elements on atmospheric circulation can be analysed. However, moving from continuous to discrete information entails a certain loss of information about the atmospheric state, which can make results difficult to interpret. On the one hand, we thus want to develop the simplest possible catalogue of types; on the other, we need to retain as much information as possible on the state of the atmosphere. As a result, a number of classifications are being developed worldwide that seek an optimal method for classifying atmospheric circulation types: one that is simple yet provides the most important information about the variability of climate elements. Existing classifications therefore differ from one another, mainly in the number of types. European circulation type classifications vary widely in the number of types, ranging from 13 (Péczely 1983) up to 40 (Schüepp 1979) or even 80 different predefined types (as in the classification developed by the Central Institute of Meteorology in Vienna). Most of the classifications used in Europe have between 9 and 40 types, while the most popular ones are those with 9, 18, or 27 types (Philipp et al. 2010).
According to Philipp et al. (2010), circulation type classification schemes can be divided into two main groups: subjective and automated. Subjective classifications include the very well-known
In the original work by Lityński (1969), the author outlines the main underlying principles of his method, that is, the method for constructing the zonal index, Ws; the meridional index, Wp; and the cyclonicity in Poland index, Cp. Lityński proposed dividing the distribution of the values of each index into three equally-likely classes. As a result, the frequency of each class under each index should amount to 33.33%, and all possible class combinations of the three indices produce 27 circulation types: 9 cyclonic types, 9 gradientless types, and 9 anticyclonic types.
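The way three classes per index combine into 27 types can be sketched as follows. This is only an illustration: the label spelling (a direction built from the Wp and Ws classes, followed by C, 0 or A for the Cp class) is an assumption inferred from type names quoted later in the text, such as NE0, EA, WA and 00, and the helper `type_label` is hypothetical.

```python
from itertools import product

# Class labels per index. The spelling is an assumption inferred from the
# type names quoted in the text (e.g. NE0, EA, WA, 00).
WP_CLASSES = ("N", "0", "S")  # meridional index Wp: northern, zero, southern
WS_CLASSES = ("E", "0", "W")  # zonal index Ws: eastern, zero, western
CP_CLASSES = ("C", "0", "A")  # cyclonicity index Cp: cyclonic, zero, anticyclonic


def type_label(wp: str, ws: str, cp: str) -> str:
    """Join the three class labels into one circulation type label."""
    # Drop the 'zero' components of the direction; if both are zero,
    # the direction itself is written as '0'.
    direction = (wp + ws).replace("0", "") or "0"
    return direction + cp


# All combinations of the three classes of the three indices.
ALL_TYPES = [
    type_label(wp, ws, cp)
    for wp, ws, cp in product(WP_CLASSES, WS_CLASSES, CP_CLASSES)
]
```

Under these assumptions, `ALL_TYPES` contains 27 distinct labels, nine of which end in C, nine in 0 and nine in A.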
Later, both the source material used for establishing circulation types and the algorithm for calculating the thresholds of index classes changed (Pianko-Kluczyńska 2007; Philipp et al. 2010). At present, the Polish weather service applies the following principles for this classification: data from the NCEP/NCAR reanalysis of sea level pressure at 12:00 UTC (Kalnay et al. 1996) are used instead of the synoptic charts used by Lityński. Since reanalyses are gridded data, the method for calculating the values of the Ws, Wp, and Cp indices has also changed. The grid point taken as the central point is the one closest to Warsaw, that is,
The Ws index, as defined in this way, takes positive values for western circulation (western air mass advection) and negative ones for eastern circulation (eastern air mass advection).
The meridional index, Wp, is determined as a horizontal meridional pressure gradient. As with the Ws index, it equals the difference between spatially averaged sea level pressure at meridians
The Wp index, as defined in this way, takes positive values for southern circulation (southern air mass advection) and negative ones for northern circulation (northern air mass advection).
In turn, the Cp index indicates whether the domain is influenced by a cyclonic, anticyclonic, or gradientless circulation. It is determined by the sea level pressure at the central grid point (in hPa); in this case,
The algorithm for determining the thresholds of the circulation index classes has also undergone a change. The mean value
Lityński’s classification of atmospheric circulation types is one of the most important classifications used in climatological studies in Poland. The researchers who use it usually define it as an equally-likely classification – in reality, however, it differs from the equally-likely division. This paper proposes a number of small modifications to the algorithm currently in use for establishing atmospheric circulation types. The aim, therefore, is to compare the proposed algorithms, and indicate which one produces a catalogue of circulation types in which the division into three classes of the distribution of the values (of the three indices, Ws, Wp, and Cp) is closest to being equally-likely; thus actually fulfilling the principles of Lityński’s original classification.
In order to prepare circulation type catalogues, data originating from the NCEP/NCAR reanalysis on sea level pressure at 12:00 UTC for each day from 1948 to 2015 were used. The variants, which are described below, were compared for the period 1986 to 2015 (30 years). This circulation type classification is calculated for a spatial domain of a specified size. The central grid point, which is specified by the coordinates φ and λ, and the corner points of the domain are shown in Figure 1. In this paper the classification is carried out only for the grid point that is the closest to Warsaw, but the domain can be freely moved to any point in the European middle latitudes. Therefore, the data for the area 40–65°N and 0–35°E were used, and the grid point,
Five variants of the classification scheme were prepared. Their main underlying principles are described below.
ORG: the classification currently used by the Polish weather service. In this classification, the Wp and Ws indices are determined using a 5° step, and the thresholds of the three index classes are defined based on the above-described algorithm, which was introduced by Pianko-Kluczyńska (on the basis of all the material available at a given time; the lower threshold
NEW: circulation types are determined in a similar way to the ORG classification (the Wp and Ws indices are determined using a 5° step, the lower threshold
NEW1: circulation types are determined in the same way as the NEW variant (the class thresholds are determined for a 30-year period, the lower threshold is
MOD30: circulation types are determined as for the NEW1 variant (the Wp and Ws indices are determined using a 2.5° step, the thresholds for the circulation index classes are determined for a 30-year period), but using a division based on the percentiles 33 and 67 for calculating the class thresholds.
MOD20: circulation types are determined in the same way as for the MOD30 variant (the Wp and Ws indices are determined using a 2.5° step, the thresholds of the circulation index classes are calculated using a division based on the percentiles 33 and 67), but with the period on which the thresholds of the circulation index classes are calculated limited to 20 years.
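The percentile-based division used in the MOD30 and MOD20 variants can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the percentile is computed with simple linear interpolation between order statistics (other definitions give slightly different thresholds), and assigning values that fall exactly on a threshold to the middle class is an assumption.

```python
def percentile(values, p):
    """Percentile p (0-100), linearly interpolated between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sample is undefined")
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def class_thresholds(values, lower_p=33, upper_p=67):
    """Lower and upper class thresholds taken from the empirical distribution."""
    return percentile(values, lower_p), percentile(values, upper_p)


def classify(value, lower, upper, labels=("E", "0", "W")):
    """Assign an index value to one of three classes.

    The default labels suit the Ws index (negative = eastern, positive =
    western). Values equal to a threshold go to the middle class here,
    which is an assumption.
    """
    if value < lower:
        return labels[0]
    if value > upper:
        return labels[2]
    return labels[1]
```

For the Wp index the same `classify` call could be used with `labels=("N", "0", "S")`, and for Cp with `labels=("C", "0", "A")`.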
In all variants the calculated threshold values are assigned to the middle date of each month. For days that are between the middle dates of two adjacent months, the threshold values are linearly interpolated.
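The interpolation step described above might look like the sketch below. It assumes that the middle date of every month is the 15th and that the monthly thresholds are stored in a dictionary keyed by `(year, month)`; both choices, and the function names, are illustrative assumptions.

```python
from datetime import date


def mid_month(year, month):
    """Middle date of a month; using day 15 for every month is an assumption."""
    return date(year, month, 15)


def daily_threshold(day, monthly):
    """Linearly interpolate a class threshold for a given day.

    monthly maps (year, month) to the threshold assigned to that month's
    middle date; both neighbouring months must be present.
    """
    if day.day >= 15:
        y0, m0 = day.year, day.month
        y1, m1 = (y0, m0 + 1) if m0 < 12 else (y0 + 1, 1)
    else:
        y1, m1 = day.year, day.month
        y0, m0 = (y1, m1 - 1) if m1 > 1 else (y1 - 1, 12)
    d0, d1 = mid_month(y0, m0), mid_month(y1, m1)
    # Fraction of the way from the earlier to the later middle date.
    weight = (day - d0).days / (d1 - d0).days
    return monthly[(y0, m0)] + weight * (monthly[(y1, m1)] - monthly[(y0, m0)])
```

With this scheme, two consecutive days on either side of a month boundary receive almost identical thresholds, because the threshold changes gradually from one middle date to the next.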
We carried out an evaluation of classification schemes in order to decide which variant results in a catalogue of circulation types where the distribution of index values in the three classes is the closest to being equally-likely. The comparison was made by calculating the mean absolute error (MAE). It was adopted as a measure, for each classification scheme, of how close the distribution of values of an individual index within the three classes was to being equally-likely, as shown in Equation 3:
The empirical frequency of each of the nine classes is indicated by
The percentage of agreement between the types determined using the ORG and NEW classifications stands at 89.3% (Table 1), which means that, within the 30-year period, this percentage of days was classified identically by both variants. Both classifications were prepared from the same source material and using the same algorithm for determining their thresholds. It can therefore be concluded that the almost 11% of days that were classified differently resulted from using a different period to calculate class thresholds (a period extended each year in ORG, or 30-year ‘running normals’ in NEW).
Table 1. The matrix of agreement between the results of individual classification schemes (%). Variant names are explained in the text. Source: own elaboration.

| | ORG | NEW | NEW1 | MOD30 |
|---|---|---|---|---|
| NEW | 89.3 | | | |
| NEW1 | 88.0 | 95.6 | | |
| MOD30 | 84.9 | 91.1 | 93.0 | |
| MOD20 | 80.9 | 86.3 | 87.6 | 89.2 |
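The percentage of agreement between two catalogues can be computed as in the following sketch, assuming each catalogue is a list of daily type labels covering the same days in the same order. The function name and the four-day example are hypothetical and do not reproduce any value in Table 1.

```python
def agreement(catalogue_a, catalogue_b):
    """Percentage of days on which two catalogues assign the same type."""
    if len(catalogue_a) != len(catalogue_b):
        raise ValueError("catalogues must cover the same days")
    if not catalogue_a:
        raise ValueError("catalogues must not be empty")
    same = sum(a == b for a, b in zip(catalogue_a, catalogue_b))
    return 100.0 * same / len(catalogue_a)


# Hypothetical four-day example (not data from the paper).
variant_1 = ["EA", "WA", "00", "NEC"]
variant_2 = ["NE0", "00", "00", "NEC"]
share_identical = agreement(variant_1, variant_2)
```

In this made-up example two of the four days match, so `share_identical` is 50.0.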
Calculating the mean value,
A minor change was introduced in the next step, namely, an increase in the density of the grid points on the basis of which the circulation indices are calculated. The NEW1 variant is, therefore, very similar to the NEW variant, and the only difference is the application of a 2.5° step (rather than a 5° step) when calculating the Ws and Wp indices. Consequently, the percentage of agreement between the types determined using these two variants is the highest of all variant pairs and amounts to 95.6% (Table 1).
One more change has been introduced to the classification algorithm: percentiles have been used instead of the mean value and standard deviation to calculate class thresholds (the MOD30 variant). When using the formula
Assuming that the distribution of values is divided into equally-likely classes means that the frequency of each class should be the same, that is, 33.33%. In reality, however, the frequencies of individual classes are not identical. In the ORG classification, the ‘western’ class was much more frequent (36.3%) than the ‘eastern’ (32.3%) and ‘zero’ classes (31.4%) (Figure 4b). Similarly, the frequency of the ‘northern’ class stood at 32.2%, while that of the ‘southern’ class was 34.5%, and that of the ‘zero’ class, 33.2% (Figure 4a). In the NEW classification, the differences in the frequency of individual classes have been reduced. The spread between the frequencies of the W and E classes amounted to just under 1.0 percentage point, while the spread between the N and S classes was only 1.7 percentage points. Cyclonic circulation was also the least frequent in the NEW classification, but here too the differences in the frequency of individual circulation types have been reduced: cyclonic circulation was observed almost 32.1% of the time, and anticyclonic circulation 35.3% of the time. These differences in the frequency of the individual classes of circulation indices within these two classifications result from the fact that in the ORG variant the threshold values are calculated on the basis of an increasingly longer period each year. However, even though running normals are used instead in the NEW variant, the frequencies of the classes are not identical in this case either. This is due, among other things, to the method adopted for defining class thresholds (using the mean value and standard deviation), as well as to the use of linear interpolation. Using linear interpolation to determine class thresholds for individual days between the middle dates of two adjacent months results in the distribution of the indices being divided into three classes that are never equally-likely. 
Yet this method of calculating the thresholds has been adopted so as to avoid a situation where two consecutive days that belong to adjacent months and have identical circulation parameters (identical values of all three indices) are classified as being completely different types. In subsequent variants, the differences in the frequencies of individual index classes have been reduced even more. In the MOD30 variant, the frequency spread of the W and E classes amounts to only 1.1 percentage points, while the spread of the C and A classes stands at only 0.6 percentage point (Figure 4). The differences in the frequencies of classes are even smaller in the MOD20 variant (described below).
Taking into account all the variants compared so far, a synthetic comparison of all classification schemes has been made using the mean absolute error (MAE). In this case, the theoretical equally-likely frequency (where the frequency of each class amounts to 33.33%) has been used as a model value. Table 2 shows that the smallest mean absolute error is found in the MOD30 classification.
Table 2. Mean absolute error (MAE) for the different variants of the classification algorithm. Variant names are explained in the text. Source: own elaboration.

| Variant | ORG | NEW | NEW1 | MOD30 |
|---|---|---|---|---|
| MAE (pp) | 1.82 | 1.05 | 1.01 | 0.74 |
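A minimal sketch of the MAE calculation follows, assuming it is the plain average of the absolute deviations of the nine class frequencies (three classes for each of Ws, Wp and Cp) from the theoretical value of 33.33%. The function name and the example frequencies are hypothetical; they are not the values behind Table 2.

```python
def mean_absolute_error(frequencies, expected=100.0 / 3.0):
    """Mean absolute deviation (in percentage points) from the expected frequency."""
    if not frequencies:
        raise ValueError("at least one class frequency is required")
    return sum(abs(f - expected) for f in frequencies) / len(frequencies)


# Hypothetical class frequencies in % for Ws, Wp and Cp (three classes each).
example_frequencies = [36.0, 32.0, 32.0] * 3
example_mae = mean_absolute_error(example_frequencies)
```

A catalogue whose nine class frequencies all equal one third would give an MAE of zero; the larger the value, the further the division is from being equally-likely.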
In Table 2, three variants (NEW, NEW1, MOD30) adopted a 30 year period for determining class thresholds; however, using a 30 year period does not automatically mean that the classification generated on this basis will be closest to being equally-likely. This is why the MOD30 variant has been tested at various time period lengths on the basis of which the class thresholds are calculated. Periods of 5, 10, 15, 20, 25, 30 and 35 years were chosen for comparison; they are referred to, respectively, as MOD5, MOD10, MOD15, MOD20, MOD25, MOD30, and MOD35. The smallest MAE calculated was for the MOD20 variant, 0.55 percentage points (Figure 5). This means that the MOD20 classification is closest to being equally-likely.
Changing the period on which the class thresholds are determined, as well as changing the method of their determination (e.g., percentiles, grid density), means that, on a given day, individual classification variants may designate different circulation types. Figure 6 includes several examples that illustrate this issue; it shows the circulation types determined on eight different days in the ORG and MOD20 classifications. For example, on 28 April 2011 (Figure 6d), the circulation type according to the ORG classification (EA) differed from the one determined in the MOD20 classification (NE0). Air pressure in Warsaw was below 1017 hPa, and both the structure of the isobars and the location of the cold front indicated air mass inflow from the north-east. On 3 August 2013 (Figure 6f), a col was located over central Europe and the air pressure in Warsaw was 1018 hPa. On that day, the ORG classification indicated type WA, while the MOD20 classification identified gradientless circulation without air mass inflow from any direction (00). Another example is 8 October 2008 (Figure 6g), when Poland was under the influence of a high-pressure system centred over the Baltic Sea. Both classifications showed no air mass inflow on that day; however, the MOD20 variant, in contrast to the ORG classification, indicated a ‘zero’ (gradientless) circulation type (air pressure in Warsaw stood at 1021 hPa). These are only some examples of days that were classified differently by the ORG and MOD20 classifications; in total, 19.1% of days were classified differently (Table 1). However, the differences between the types determined by the ORG and MOD20 classifications for any given day are never greater than one class per index (e.g., S may change to 0 but not to N, or C may change to 0 but not to A).
Finally, let us add a few words on the difference between the determined and actual direction of air mass inflow. In the equally-likely distribution, both thresholds that indicate the division into the three classes may happen to be on the same side of zero. In such situations, the ‘zero’ class will not contain ‘neutral’ values that are on both sides of zero, and an additionally designated negative class may also contain positive values or vice versa (e.g., low positive values of the Ws index may also be found in January in class E). During the period under consideration, a positive value of the Ws index’s lower threshold in the MOD20 variant was observed in 45.6% of cases, mainly during the cold half of the year from the end of August or the beginning of September to January or February. This means that during this time period, the ‘eastern’ circulation types are not necessarily linked with an inflow of air from the east. It should be noted, however, that in the whole 30 year period there were only 413 days (3.8%) when the value of the Ws index was positive, yet despite this, the ‘eastern’ circulation type was determined. On the other hand, the upper threshold of the Ws index was always positive. As for the Wp index, its lower threshold took a positive value in only 16 cases (0.2%), and in the case of all circulation types classified as ‘northern’ on these days, the Wp index was far below zero. It should therefore be concluded that the ‘northern’ types are always linked with an inflow of air from the north. On the other hand, the upper threshold of the Wp index took values below zero in 17.1% of cases, mainly in the summer months (June, July, August); while the minimum value of this threshold was not lower than -0.054. 
This means that in summer, ‘southern’ types are not necessarily connected with the inflow of air from the south; however, in the whole 30-year period there were only 87 days (0.8%) that were classified as ‘southern’ types even though the actual value of the Wp index on these days was negative.
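Counts of this kind (days on which the assigned class disagrees with the sign of the index) can be obtained as in the sketch below. The function name, its arguments and the four-day example are hypothetical; the example does not use data from the paper.

```python
def sign_mismatches(index_values, classes, label, expected_sign):
    """Count days assigned to `label` whose index value has the opposite sign.

    expected_sign is -1 for classes tied to negative index values
    (e.g. 'E' for Ws, 'N' for Wp) and +1 for classes tied to positive
    values (e.g. 'W' for Ws, 'S' for Wp).
    """
    return sum(
        1
        for value, cls in zip(index_values, classes)
        if cls == label and value * expected_sign < 0
    )


# Hypothetical four-day example: one day is classed 'E' although Ws > 0.
ws_values = [-3.0, 0.4, 2.0, 0.2]
ws_classes = ["E", "E", "W", "0"]
eastern_with_positive_ws = sign_mismatches(ws_values, ws_classes, "E", -1)
```

Dividing such a count by the number of days in the period gives the percentage quoted in the text.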
This paper compares different variants of Lityński’s circulation type classification. Five variants of the classification algorithm, based on different underlying principles, were compared. This made it possible to indicate which algorithm generates a catalogue of circulation types in which the division into three classes of the distribution of the values of the Ws, Wp and Cp indices is the closest to being equally-likely. During 1986–2015, the MOD20 classification differed from the ORG classification in more than 19% of cases. In subsequent variants of the classification algorithm, the differences in the frequencies of each class (in relation to individual circulation indices) became smaller and smaller. Ultimately, the MOD20 classification generated a catalogue of circulation types in which the division of the distribution of the values of the Ws, Wp and Cp indices into three classes was the closest to being equally-likely.
MOD20 is a threshold-based classification. In general, threshold-based methods perform surprisingly well in comparison to more complicated algorithms (Huth 2010). The relative simplicity of the MOD20 classification algorithm is its major advantage, in contrast to methods using principal component analysis or optimisation methods. Even though computing time and algorithm complexity are no longer a problem, a simpler method makes it easier to understand what exactly the classification algorithm produces. According to Beck and Philipp (2010), the original Lityński classification (ORG) is one of the threshold-based methods that produce the best results in Central Europe.
The advantage of the MOD20 classification is its use of the empirical distribution of index values – the percentiles 33 and 67 – instead of the theoretical distribution
The algorithm for generating atmospheric circulation types according to the MOD20 method (with the thresholds based on either a moving 20-year period or the fixed period 1951–2000) is available on the MATLAB File Exchange web page.