The Spatial Fay-Herriot Model in Poverty Estimation

Abstract Counteracting poverty is one of the objectives of the European Commission, clearly emphasized in the Europe 2020 strategy. Conducting appropriate social policy requires knowledge of the extent of this phenomenon. Such information is provided by surveys on living conditions conducted by, among others, the Central Statistical Office (CSO). Nevertheless, the sample size in these surveys allows for precise estimation of the poverty rate only at a very general level: the whole country and regions. A small sample size at lower levels of spatial aggregation results in a large variance of the obtained estimates and hence lower reliability. To obtain information for sparsely represented territorial sections, small area estimation methods are used. By using information from other sources, such as censuses and administrative registers, it is possible to estimate distribution parameters with smaller variance than in the case of direct estimation. This paper attempts to estimate the poverty rate at the LAU 1 level in Poland. The estimation is made possible by combining data from different sources describing the living conditions of households with the Fay-Herriot model with spatial correlation. As a result, estimates for previously unpublished levels of aggregation are obtained.


Introduction
In recent years a growing demand for information available at the local level has been observed. Living conditions, and the poverty rate in particular, are an example of an area where such data are especially desirable. In Poland the measurement of this indicator is based on two sample surveys conducted by the Central Statistical Office: the Household Budget Survey (HBS) and the European Survey on Income and Living Conditions (EU-SILC) (CSO, 2015). The methodology of both surveys allows the results to be published only at the national and regional level. Information for more detailed sections is not available because the sample size is too small, which leads to large mean square errors (MSE) of the obtained estimates (CSO, 2012).
Small area estimation methods make it possible to obtain estimates at levels of aggregation lower than those published and thus to meet information needs. The first attempt to estimate the poverty rate in domains not planned in the sample survey design took place in 2014. In the project entitled Poverty maps at subregional level in Poland based on indirect estimation (CSO, 2014) the Fay-Herriot model was applied and estimates of the poverty rate at the subregional level (NUTS 3) were obtained.
This work aims to estimate the poverty rate at a much more detailed level: LAU 1. To obtain estimates at this level, the spatial Fay-Herriot model (Pratesi, Salvati, 2008) was used. The analysis is based on the EU-SILC 2011 survey, aggregated data from the 2011 National Census of Population and Housing (NSP) and the Local Data Bank (LDB). As a consequence, estimates of the poverty rate at a so far unpublished level are presented. Moreover, these estimates have been assessed both statistically and substantively. The paper is organized as follows. First, the background of the poverty phenomenon is presented. Then the methodological part describes the methods used in the research. The results section shows the outcome of the application and an assessment of the obtained results. The conclusion summarizes the most important findings and indicates possible further research in the area.

Poverty rate
Poverty, like many other socio-economic phenomena, can be measured, but it is not an easy task. The discussion over the definition of this phenomenon has lasted for many years.
In the literature poverty is connected with the fact that some needs are not satisfied to a sufficient degree (Drewnowski, 1977). Amartya Sen, the Indian economist and laureate of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, indicates that poverty is not only the unavailability of selected goods or services but also a lack of opportunity to make decisions and to participate in social and cultural life (Sen, 1992). Another definition, proposed by the European Council, regards as poor those "individuals or families whose resources (goods, cash income, plus services from public and private resources) are so small as to exclude them from the minimum acceptable way of life of the Member State in which they live" (EEC, 1985).
The most important aspect of poverty analysis is defining a criterion according to which a given unit (person or household) can be classified as poor. The criterion most frequently used in practice is income or expenditure compared with a set threshold, below which the analysed unit is considered poor. Eurostat recommends setting the poverty threshold at 60% of the median of the equivalised income distribution. This approach is applied, among others, in the European Survey on Income and Living Conditions (CSO, 2012).
To measure poverty, Foster, Greer and Thorbecke (1984) proposed a family of indicators. According to this work, the poverty rate is defined as the share of units below the poverty threshold in the whole population. Consider a finite population $U = \{1, \ldots, i, \ldots, N\}$ divided into $D$ domains (areas) $P_1, \ldots, P_D$ of sizes $N_1, \ldots, N_D$. Let $z$ denote the poverty threshold and $E_{di}$ the income of the $i$-th unit in the $d$-th domain. If $E_{di} < z$, the $i$-th unit in the $d$-th area is considered poor. The general formula for poverty indicators from the FGT family is:

$$F_{\alpha d} = \frac{1}{N_d} \sum_{i=1}^{N_d} \left( \frac{z - E_{di}}{z} \right)^{\alpha} I(E_{di} < z), \qquad \alpha \geq 0,$$

where $I(E_{di} < z) = 1$ if $E_{di} < z$ and $I(E_{di} < z) = 0$ otherwise.
For $\alpha = 0$ the poverty rate, also called poverty incidence or headcount ratio, is obtained. Taking $\alpha = 1$ gives the poverty gap, which measures the distance of the incomes of poor people from the poverty threshold and thus informs about the depth of poverty in this group (Panek, 2011).
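The FGT definition can be illustrated with a short Python sketch. The income values below are hypothetical, and the function names `poverty_threshold` and `fgt` are introduced here only for illustration. For simplicity the threshold is computed from the same five incomes, whereas in practice it would be derived from the national equivalised income distribution.

```python
from statistics import median

def poverty_threshold(incomes, share=0.6):
    """Poverty threshold as a share of the median income (60% by default)."""
    return share * median(incomes)

def fgt(incomes, z, alpha=0):
    """FGT indicator: average of ((z - E) / z) ** alpha over all units,
    where only units with income E below the threshold z contribute."""
    total = 0.0
    for income in incomes:
        if income < z:
            total += ((z - income) / z) ** alpha
    return total / len(incomes)

# Hypothetical equivalised incomes (illustrative values only).
incomes = [500, 800, 1000, 1200, 2000]
z = poverty_threshold(incomes)         # 60% of the median income
headcount = fgt(incomes, z, alpha=0)   # poverty rate (headcount ratio)
gap = fgt(incomes, z, alpha=1)         # poverty gap
```

With these values only one unit falls below the threshold, so the headcount ratio is one fifth, while the poverty gap additionally scales that unit's contribution by its relative shortfall from the threshold.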

Direct estimator of poverty rate
The basic estimator used in sample surveys is the direct estimator proposed by Horvitz and Thompson (1952). Let $s$ denote the sample from population $U$ and let $s_d$ be the subsample from area $d$ of size $n_d < N_d$. The sampling weight is the inverse of the first-order inclusion probability and is denoted $w_{di} = 1/\pi_{di}$.
To estimate poverty indicators from the FGT family using direct estimation, formula (2) must be modified in the following way:

$$\hat{F}_{\alpha d} = \frac{1}{N_d} \sum_{i \in s_d} w_{di} \left( \frac{z - E_{di}}{z} \right)^{\alpha} I(E_{di} < z).$$

The direct estimator uses only data from the sample and, for large enough values of $n_d$, is unbiased and efficient. Nevertheless, a small sample size implies a large variance.
Moreover, this estimator cannot be used when a given domain is not sampled ($n_d = 0$) (Rao, 2015).
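A minimal Python sketch of weighted direct estimation of the poverty rate is given below. It assumes that each sampled unit carries a sampling weight equal to the inverse of its inclusion probability. The function name `direct_poverty_rate` and the data are hypothetical, and the fallback to the sum of weights when the domain size is unknown is an assumption of this sketch, not a procedure taken from the paper.

```python
def direct_poverty_rate(incomes, weights, z, domain_size=None):
    """Weighted share of units with income below the threshold z.

    incomes, weights: values for the sampled units of one domain.
    domain_size: known population size N_d; if None, the sum of the
    weights is used as its estimate (an assumption of this sketch).
    """
    if len(incomes) == 0:
        # n_d = 0: the domain was not sampled, so no direct estimate exists.
        raise ValueError("domain not sampled: direct estimation impossible")
    poor_weight = sum(w for e, w in zip(incomes, weights) if e < z)
    denominator = domain_size if domain_size is not None else sum(weights)
    return poor_weight / denominator

# Hypothetical domain with three sampled households.
sample_incomes = [500, 900, 1500]
sample_weights = [100, 150, 250]
rate = direct_poverty_rate(sample_incomes, sample_weights, z=600, domain_size=500)
```

The explicit error for an empty sample mirrors the limitation noted above: with $n_d = 0$ there is nothing to weight.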

Model based estimators of poverty rate
To improve the precision of direct estimates and to obtain values for unsampled areas, small area estimation methods are used. They are based on "borrowing strength" from other areas and on the use of alternative data sources, e.g. censuses or administrative registers (Dehnel, 2003).
Among the many available methods, model-based ones are used most frequently.
They make it possible to explain the complexity of the analysed phenomenon through the variables of a regression model.

The Fay-Herriot model
The Fay-Herriot model (1979) is an area level model, which means that it requires only data aggregated at the area level of application. Unit level data are not necessary. This is a clear advantage of the approach, because access to unit level data is rather difficult. Moreover, a lot of data are available for specific areas; they can be accessed from, e.g., the Local Data Bank and other public databases.
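The model proposed by Fay and Herriot is a variant of a linear model with a random (area) effect. With reference to the poverty rate it has the following form:

$$\hat{\theta}_d = x_d^T \beta + u_d + e_d, \qquad d = 1, \ldots, D,$$

where $\hat{\theta}_d$ is the direct estimate of the poverty rate in area $d$, $x_d$ is the vector of auxiliary variables, $u_d$ is the random area effect with mean 0 and variance $\sigma_u^2$, and $e_d$ is the sampling error with mean 0 and known variance $\psi_d$. The best linear unbiased predictor (BLUP) is equal to:

$$\tilde{\theta}_d = \gamma_d \hat{\theta}_d + (1 - \gamma_d) x_d^T \tilde{\beta}, \qquad d = 1, \ldots, D,$$

where $\gamma_d = \sigma_u^2 / (\sigma_u^2 + \psi_d)$ and $\tilde{\beta}$ is calculated with the weighted least squares method:

$$\tilde{\beta} = \left( \sum_{d=1}^{D} \frac{x_d x_d^T}{\sigma_u^2 + \psi_d} \right)^{-1} \sum_{d=1}^{D} \frac{x_d \hat{\theta}_d}{\sigma_u^2 + \psi_d}.$$

Depending on the value of $\sigma_u^2$, a bigger or smaller weight is assigned to the direct estimator.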
In practice $\sigma_u^2$ is unknown and has to be estimated. For this purpose the Fay-Herriot or Prasad-Rao method (Rao, 2015) can be used, as well as the maximum likelihood or restricted maximum likelihood method. By replacing $\sigma_u^2$ with its estimate $\hat{\sigma}_u^2$, the empirical best linear unbiased predictor (EBLUP) is obtained.
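As an illustration of one of the options mentioned above, the sketch below implements the Prasad-Rao moment estimator of the random effect variance in its textbook form. It is an assumption of this sketch, not a statement about which method was used in the paper. The function name `prasad_rao_sigma2` and the data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def prasad_rao_sigma2(theta_dir, X, psi):
    """Moment estimator of the area effect variance (Prasad-Rao type).

    theta_dir: direct estimates, X: D x p matrix of auxiliary variables,
    psi: known sampling variances. The result is truncated at zero.
    """
    theta_dir = np.asarray(theta_dir, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    D, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ theta_dir           # ordinary least squares fit
    resid = theta_dir - X @ beta_ols
    # Leverages h_dd = x_d^T (X^T X)^{-1} x_d for each area d.
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    estimate = (resid @ resid - np.sum(psi * (1.0 - leverage))) / (D - p)
    return max(0.0, float(estimate))

# Hypothetical data: four areas, intercept-only model.
theta_dir = [0.1, 0.2, 0.3, 0.4]
X = [[1.0], [1.0], [1.0], [1.0]]
psi = [0.001, 0.001, 0.001, 0.001]
sigma2_hat = prasad_rao_sigma2(theta_dir, X, psi)
```

The truncation at zero matters in practice: when the sampling variances are large relative to the residual variation, the raw moment estimate can be negative.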
For unsampled domains the poverty rate is estimated using only auxiliary variables, without sample data:

$$\hat{\theta}_d^{syn} = x_d^T \hat{\beta}.$$

Estimates obtained in this way are called synthetic (Rao, 2015).
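The following Python sketch shows how the predictor of the basic Fay-Herriot model and a synthetic estimate could be computed once a value of the random effect variance is available. The function name `fh_blup`, the fixed variance value and all data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def fh_blup(theta_dir, X, psi, sigma2_u):
    """Predictor under the basic Fay-Herriot model for a given sigma2_u.

    Returns the composite estimates, the weighted least squares
    coefficients and the shrinkage weights gamma_d.
    """
    theta_dir = np.asarray(theta_dir, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    v = sigma2_u + psi                    # total variance of each direct estimate
    Xw = X / v[:, None]                   # rows of X divided by their variance
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ theta_dir)   # weighted least squares
    gamma = sigma2_u / v                  # weight given to the direct estimate
    synthetic = X @ beta                  # regression (synthetic) part
    blup = gamma * theta_dir + (1.0 - gamma) * synthetic
    return blup, beta, gamma

# Hypothetical data: four sampled areas, intercept plus one covariate.
theta_dir = [0.10, 0.15, 0.20, 0.30]
X = [[1.0, 5.0], [1.0, 8.0], [1.0, 12.0], [1.0, 20.0]]
psi = [0.001, 0.002, 0.001, 0.004]
blup, beta, gamma = fh_blup(theta_dir, X, psi, sigma2_u=0.002)

# Synthetic estimate for an unsampled area with covariate value 10.
x_new = np.array([1.0, 10.0])
theta_syn = float(x_new @ beta)
```

Areas with a large sampling variance receive a small weight `gamma`, so their estimates are pulled more strongly towards the regression prediction.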

Spatial Fay-Herriot model
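The spatial Fay-Herriot model (Pratesi, Salvati, 2008) extends the basic model by allowing the random area effects to be spatially correlated. In matrix notation:

$$\hat{\theta} = X\beta + v + e, \qquad v = \rho W v + u,$$

where $\hat{\theta}$ is the vector of direct estimates, $X$ is the matrix of auxiliary variables, $v$ is the vector of spatially correlated area effects, $u$ is a vector of independent errors with mean 0 and variance $\sigma_u^2$, and $e$ is the vector of sampling errors with known variances $\psi_d$. Matrix $W$ is a proximity matrix between the analysed areas, while $\rho$ measures the strength of the spatial relationship between random effects in neighbouring areas. First the proximity matrix $W_0$ is created: its main diagonal is equal to 0, while the other elements are equal to 1 when two areas border each other and 0 otherwise. Matrix $W$ is obtained from $W_0$ by dividing each element of a row by the row sum. In this way a row-standardized matrix, in which each row sums to 1, is obtained.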
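The construction of the row-standardized proximity matrix can be sketched in a few lines of Python. The four areas arranged in a chain are a hypothetical example, and the function name `row_standardize` is introduced only for illustration.

```python
def row_standardize(w0):
    """Divide each element of a row by the row sum.

    Rows without neighbours (row sum 0) are left as zeros,
    which is a choice made for this sketch.
    """
    w = []
    for row in w0:
        total = sum(row)
        w.append([x / total if total else 0.0 for x in row])
    return w

# Hypothetical layout: four areas in a chain, 1-2, 2-3, 3-4.
W0 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
W = row_standardize(W0)
```

After standardization an area with two neighbours assigns each of them a weight of one half, while an area with a single neighbour assigns it a weight of one.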
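The estimator of the considered model is the spatial best linear unbiased predictor (Spatial BLUP):

$$\tilde{\theta}_d^{S}(\sigma_u^2, \rho) = x_d^T \tilde{\beta} + b_d^T G V^{-1} (\hat{\theta} - X \tilde{\beta}),$$

where $b_d^T$ is a $1 \times D$ vector with value 1 in the $d$-th position and 0 elsewhere, and

$$\tilde{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} \hat{\theta},$$

where:

$$G = \sigma_u^2 \left[ (I - \rho W)^T (I - \rho W) \right]^{-1}, \qquad V = G + \mathrm{diag}(\psi_1, \ldots, \psi_D).$$

Estimator (9) depends on two unknown values, $\sigma_u^2$ and $\rho$. Replacing them with their estimates yields the empirical estimator, SEBLUP:

$$\hat{\theta}_d^{S}(\hat{\sigma}_u^2, \hat{\rho}) = x_d^T \hat{\beta} + b_d^T \hat{G} \hat{V}^{-1} (\hat{\theta} - X \hat{\beta}).$$

As in the classic Fay-Herriot model, in the spatial variant the poverty rate in unsampled domains is estimated synthetically:

$$\hat{\theta}_d^{syn} = x_d^T \hat{\beta}.$$

The mean square error of the estimators described above can be written as a sum:

$$MSE\left[ \hat{\theta}_d^{S}(\hat{\sigma}_u^2, \hat{\rho}) \right] \approx g_{1d}(\sigma_u^2, \rho) + g_{2d}(\sigma_u^2, \rho) + g_{3d}(\sigma_u^2, \rho),$$

where $g_{1d}$ reflects the uncertainty of predicting the random effects, $g_{2d}$ the estimation of $\beta$, and $g_{3d}$ the estimation of $\sigma_u^2$ and $\rho$.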
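A minimal Python sketch of the spatial predictor for given values of the variance and of the spatial parameter is shown below. It assumes the simultaneous autoregressive structure of the area effects, so that their covariance matrix is $G = \sigma_u^2[(I - \rho W)^T(I - \rho W)]^{-1}$ and the covariance matrix of the direct estimates is $V = G + \mathrm{diag}(\psi_d)$. The function name `spatial_blup`, the parameter values and the data are hypothetical, and NumPy is assumed to be available. In an empirical application both parameters would first be estimated, for example by maximum likelihood.

```python
import numpy as np

def spatial_blup(theta_dir, X, psi, W, sigma2_u, rho):
    """Spatial predictor for all areas, given sigma2_u and rho."""
    theta_dir = np.asarray(theta_dir, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    W = np.asarray(W, dtype=float)
    D = len(theta_dir)
    A = np.eye(D) - rho * W
    G = sigma2_u * np.linalg.inv(A.T @ A)     # covariance of the area effects
    V = G + np.diag(psi)                      # covariance of the direct estimates
    V_inv = np.linalg.inv(V)
    # Generalized least squares estimate of the regression coefficients.
    beta = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ theta_dir)
    resid = theta_dir - X @ beta
    # Row d of G @ V_inv @ resid corresponds to b_d^T G V^{-1} (theta - X beta).
    return X @ beta + G @ V_inv @ resid, beta

# Hypothetical data: four areas in a chain with row-standardized weights.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
theta_dir = [0.10, 0.15, 0.20, 0.30]
X = [[1.0, 5.0], [1.0, 8.0], [1.0, 12.0], [1.0, 20.0]]
psi = [0.001, 0.002, 0.001, 0.004]
theta_spatial, beta_spatial = spatial_blup(theta_dir, X, psi, W, sigma2_u=0.002, rho=0.5)
```

Setting `rho=0` removes the spatial dependence, and the result then coincides with the predictor of the basic Fay-Herriot model, which is a convenient consistency check.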

Small area estimates of poverty rate at LAU 1 level
In this section the methods described above are used to estimate the poverty rate at the LAU 1 level in Poland. The starting point was to obtain the poverty rate using the Horvitz-Thompson estimator. Of the 379 local administrative units, 4 were not sampled in EU-SILC 2011: wieruszowski (łódzkie district), proszowicki (małopolskie), moniecki (podlaskie) and włoszczowski (świętokrzyskie). Moreover, in a further 12 units no poor households were sampled, which also prevents direct estimation.
Ultimately, direct estimation of the poverty rate was possible for 363 small areas. First, the proximity matrix for LAU 1 units was created. To verify the presence of spatial autocorrelation, Moran's I statistic was calculated on the basis of the obtained estimates and the proximity matrix. This measure is standardized in the interval $[-1, 1]$, where $-1$ denotes a strong negative spatial autocorrelation and 1 a strong positive spatial autocorrelation (Bivand, Pebesma and Gómez-Rubio, 2008). In the conducted analysis Moran's I was equal to 0.1293 with a p-value of 0.0001. The calculated measure indicates a positive spatial autocorrelation of the direct poverty rate estimates.
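The computation of Moran's I can be sketched in Python as follows. The values and the chain-shaped proximity matrix are hypothetical and unrelated to the results reported above, and the function name `morans_i` is introduced only for illustration. The sketch returns the statistic only. Assessing its significance would additionally require, for example, a permutation test or a normal approximation.

```python
def morans_i(values, w):
    """Moran's I for a list of values and a proximity matrix w (list of rows)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)                  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n)) # weighted cross-products
    var_sum = sum(d * d for d in dev)                # sum of squared deviations
    return (n / s0) * cross / var_sum

# Hypothetical row-standardized matrix for four areas in a chain.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
smooth = morans_i([1, 2, 3, 4], W)       # similar values in neighbouring areas
alternating = morans_i([1, 4, 1, 4], W)  # dissimilar values in neighbouring areas
```

A smooth gradient across neighbouring areas gives a positive value, whereas a pattern in which every area differs sharply from its neighbours gives a negative one.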
Next, linear regression was used as a tool for selecting variables for the Fay-Herriot model. Regression models were built using indicators from sources free of sampling error, concerning demographics, economic activity and living conditions. The aim was to explain as much as possible of the variability of the analysed variable, i.e. the direct estimates of the poverty rate.
The obtained model parameters were verified both substantively and statistically. The sign of each parameter was compared with existing knowledge of the analysed phenomenon. For example, a higher unemployment level is expected to lead to higher poverty, so the sign of this variable should be positive. Moreover, only covariates significant at the 0.1 level were included in the model. All of the model parameters are presented in Table 1.
The model contains three covariates: the unemployment rate ($X_1$), the share of newly registered unemployed people in the total number of unemployed people ($X_2$) and the share of households whose main source of income is agriculture ($X_3$). All $\beta$ parameters (except the intercept) have a positive sign, which means that an increase of a particular indicator in an area is associated with an increase of the poverty rate in this unit.
Using the model developed above (cf. Table 1) and the EBLUP (5) and SEBLUP (11) estimators, the poverty rate was estimated for all LAU 1 units in Poland, both sampled and unsampled. Table 2 presents descriptive statistics of the obtained mean square errors. Taking into account the additional information on the spatial autocorrelation of the random effects reduced the MSE of the poverty rate estimates.
A comparison of both estimators using the log-likelihood and the Akaike information criterion indicates that SEBLUP performs better than EBLUP (cf. Table 3). The obtained poverty rates were placed on a map to illustrate the spatial distribution of the analysed phenomenon. The cartogram indicates that big cities show lower levels of poverty than the units surrounding them. For example, in Poznań, the main city of the region, the poverty rate was equal to 7.65%, while in the poznański area, which surrounds Poznań, it was equal to 9.17%. The 10 units most at risk of poverty come from the following districts: lubuskie, małopolskie, mazowieckie, podkarpackie and świętokrzyskie, which indicates the occurrence of a poverty region in the eastern part of Poland.

Conclusions
One of the biggest challenges in small area estimation is the substantive appraisal of the obtained results. For this purpose, poverty rate estimates obtained using SEBLUP were compared with the share of people receiving social benefits and the long-term unemployment rate (the share of people unemployed for over 12 months). These indicators are closely connected with the analysed phenomenon. The correlation analysis clearly indicates that the structure of the poverty rate obtained using SEBLUP is more similar to the compared social indicators than that of the direct estimates. The Spearman correlation coefficient between the share of people receiving social benefits and the SEBLUP poverty rate is equal to $r_s = 0.5653$ compared to direct estimates [...] All of them aim to reduce poverty, so data about these phenomena at the local level will help to identify the poorest areas, and the proposed methodology could serve as an evaluation tool in these projects.
In the proposed approach it is also possible to use a proximity matrix based on a multivariate approach instead of the classical contiguity matrix. Results of the simulations performed in the SAMPLE project show that such a matrix could improve the estimates of MSE (Pratesi et al., 2010). Future research could also focus on the application of unit level models (Guadarrama, Molina, Rao, 2016). In these models the distribution of household income could be estimated instead of the poverty rate. This class of models has better empirical properties but requires unit level data from a census or administrative registers. The proposed ideas aim to develop the best method that could be used to estimate poverty indicators in Poland.