Open access

H. Kumar Sharma, K. Kumari and S. Kar

Abstract

Accurate and reliable forecasts of air passenger demand are very important for policy-making and planning by tourism management as well as by airline authorities. This article therefore proposes a novel hybrid method based on rough set theory (RST) to construct decision rules for long-term forecasting of air passengers. In the proposed hybrid method, level (mean) and trend components are first estimated from the air passenger time series data using a DES model. Rough set theory is then employed to combine the output of the DES model, and the generated decision rules are used to forecast air passengers. We compare the proposed approach with other time series models using a corrected classified accuracy (CCA) criterion. For the empirical analysis, yearly air transport passenger data from 1992 to 2004 are used. Empirical results show that the proposed method is highly accurate, achieving a higher corrected classified accuracy, and that its forecasting accuracy is better than that of the other time series approaches.
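
As a rough illustration of the first step, the sketch below estimates level and trend components from a yearly series. The abstract does not expand "DES"; the code assumes it denotes double exponential smoothing (Holt's linear method). The function names, the smoothing constants `alpha` and `beta`, and the initialisation are assumptions of this sketch, not details taken from the article, and the rough-set rule generation is not shown.

```python
def holt_des(series, alpha=0.5, beta=0.3):
    """Return the final (level, trend) of `series` under Holt's linear method.

    alpha and beta are illustrative smoothing constants (assumptions).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Simple initialisation: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Update the level from the new observation and the one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Update the trend from the change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def forecast(level, trend, horizon):
    """Extrapolate the last level and trend `horizon` steps ahead."""
    return [level + h * trend for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical yearly passenger counts (made-up numbers).
    passengers = [10.0, 12.0, 14.0, 16.0]
    level, trend = holt_des(passengers)
    print(forecast(level, trend, 2))
```

In a hybrid scheme of the kind described, the level and trend values would presumably serve as condition attributes for the decision rules; that mapping is left open here.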

Open access

Thomas Suesse and Ray Chambers

Abstract

Model-based and model-assisted methods of survey estimation aim to improve the precision of estimators of the population total or mean relative to methods based on the nonparametric Horvitz-Thompson estimator. These methods often use a linear regression model defined in terms of auxiliary variables whose values are assumed known for all population units. Information on networks represents another form of auxiliary information that might increase the precision of these estimators, particularly if it is reasonable to assume that networked population units have similar values of the survey variable. Linear models that use networks as a source of auxiliary information include autocorrelation, disturbance, and contextual models. In this article we focus on social networks, and investigate how much of the population structure of the network needs to be known for estimation methods based on these models to be useful. In particular, we use simulation to compare the performance of the best linear unbiased predictor under a model that ignores the network with model-based estimators that incorporate network information. Our results show that incorporating network information via a contextual model seems to be the most appropriate approach. We also show that one does not need to know the full population network, but that knowledge of the partial network linking the sampled population units to the non-sampled population units is necessary. Finally, we provide an estimator of the mean-squared error that supports an informed decision about whether to use the contextual information, together with results showing that this adaptive strategy leads to higher precision.
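
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the Horvitz-Thompson estimator of a population total and mean: each sampled value is weighted by the inverse of its inclusion probability. The function names and the sample values are hypothetical; the network-based models of the article are not implemented here.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over the sample."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    for p in inclusion_probs:
        if not 0 < p <= 1:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


def horvitz_thompson_mean(values, inclusion_probs, population_size):
    """Estimate the population mean, assuming the population size is known."""
    return horvitz_thompson_total(values, inclusion_probs) / population_size


if __name__ == "__main__":
    # Hypothetical sample of three units with unequal inclusion probabilities.
    y = [10, 20, 30]
    pi = [0.5, 0.5, 0.25]
    print(horvitz_thompson_total(y, pi))
    print(horvitz_thompson_mean(y, pi, population_size=9))
```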

Open access

Francisco Goerlich and Francisco Ruiz

Abstract

This article proposes a typology of boundary changes in territorial units at two points in time. The different types of changes are organized in a hierarchy and represented homogeneously, independently of the number of territorial units involved and of the changes to them. Each alteration is described precisely and unambiguously, and it is codified to allow the information to be treated automatically. In addition to providing efficient storage of the information about these changes, a canonical representation facilitates the automatic detection of inconsistencies in the database. At the same time, the typology allows us to define backward and forward equivalence rules, which helps in the task of generating homogeneous time series about territorial unit characteristics, such as population or surface area, or generating the full genealogy of a territorial unit over time. We also offer an application of the proposal to inconsistencies and error detection in the database Alterations to the Municipalities in the Population Censuses since 1842 from the Spanish National Statistical Institute (INE).

Open access

Peter G.M. van der Heijden, Paul A. Smith, Maarten Cruyff and Bart Bakker

Abstract

We consider the linkage of two or more registers in the situation where the registers do not cover the whole target population, and relevant categorical auxiliary variables (unique to one of the registers; although different variables could be present on each register) are available in addition to the usual matching variable(s). The linked registers therefore do not contain full information on either the observations (often individuals) or the variables. By treating this as a missing data problem it is possible to construct a linked data set, adjusted to estimate the part of the population missed by both registers, and containing completed covariate information for all the registers. This is achieved using an Expectation-Maximization (EM)-algorithm. We elucidate the properties of this approach where the model is appropriate and in situations corresponding with real applications in official statistics, and also where the model conditions are violated. The approach is applied to data on road accidents in the Netherlands, where the cause of the accident is denoted by the police and by the hospital. Here the cause of the accident denoted by the police is considered as missing information for the statistical units only registered by the hospital, and the other way around. The method needs to be widely applied to give a better impression of the range of problems where it can be beneficial.
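
The idea of estimating the part of the population missed by both registers can be illustrated with the classical two-list (dual-system) estimator. This is a deliberately simpler stand-in, not the EM-based approach with covariates described in the abstract: it assumes that inclusion in one register is independent of inclusion in the other and uses no auxiliary variables. The function name and the counts are hypothetical.

```python
def dual_system_estimate(n_both, n_only_a, n_only_b):
    """Two-list estimate of units missed by both registers and of the total.

    Under independence of the two registers, the missed count is estimated
    as n_only_a * n_only_b / n_both.
    """
    if n_both <= 0:
        raise ValueError("need at least one unit linked in both registers")
    missed = n_only_a * n_only_b / n_both
    total = n_both + n_only_a + n_only_b + missed
    return missed, total


if __name__ == "__main__":
    # Hypothetical linkage counts: both registers, register A only, register B only.
    missed, total = dual_system_estimate(n_both=50, n_only_a=25, n_only_b=10)
    print(missed, total)
```

Conditioning on covariates that are only partly observed is what makes the missing-data formulation in the abstract necessary; the sketch above ignores that complication.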

Open access

Jonathan Lisic, Hejian Sang, Zhengyuan Zhu and Stephanie Zimmer

Abstract

A computational approach to optimal multivariate designs with respect to stratification and allocation is investigated under the assumptions of fixed total allocation, known number of strata, and the availability of administrative data correlated with the variables of interest under coefficient-of-variation constraints. This approach uses a penalized objective function that is optimized by simulated annealing through exchanging sampling units and sample allocations among strata. Computational speed is improved through the use of a computationally efficient machine learning method such as K-means to create an initial stratification close to the optimal stratification. The numeric stability of the algorithm has been investigated and parallel processing has been employed where appropriate. Results are presented for both simulated data and USDA’s June Agricultural Survey. An R package has also been made available for evaluation.
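
The initial-stratification step can be sketched with a small K-means routine. The version below is a simplified assumption-laden illustration: it works on a single administrative variable rather than the multivariate setting of the article, uses a deterministic quantile-based initialisation chosen for this sketch, and omits the simulated-annealing refinement and the allocation step entirely. It is written in Python for illustration and is unrelated to the R package mentioned in the abstract.

```python
def kmeans_1d(values, k, iterations=50):
    """Assign each value to one of k strata by one-dimensional K-means."""
    if not 0 < k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: centers at evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign(current_centers):
        # Label each value with the index of its nearest center.
        return [min(range(k), key=lambda j: abs(v - current_centers[j]))
                for v in values]

    for _ in range(iterations):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a stratum happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers), centers


if __name__ == "__main__":
    # Hypothetical administrative values (made-up numbers).
    labels, centers = kmeans_1d([1, 2, 3, 100, 101, 102], k=2)
    print(labels, centers)
```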

Open access

Paula Carroll, Tadhg Murphy, Michael Hanley, Daniel Dempsey and John Dunne

Abstract

This article describes a project conducted in conjunction with the Central Statistics Office of Ireland in response to a planned national rollout of smart electricity metering in Ireland. We investigate how this new data source might be used for the purpose of official statistics production. This study specifically looks at the question of determining household composition from electricity smart meter data using both Neural Networks (a supervised machine learning approach) and Elastic Net Logistic regression. An overview of both classification techniques is given. Results for both approaches are presented with analysis. We find that the smart meter data alone is limited in its capability to distinguish between household categories but that it does provide some useful insights.

Open access

Roland Weigand, Susanne Wanger and Ines Zapf

Abstract

We introduce a high-dimensional structural time series model, where co-movement between the components is due to common factors. A two-step estimation strategy is presented, which is based on principal components in differences in a first step and state space methods in a second step. The methods add to the toolbox of official statisticians, constructing timely regular statistics from different data sources. In this context, we discuss typical measurement features such as survey errors, statistical breaks, different sampling frequencies and irregular observation patterns, and describe their statistical treatment. The methods are applied to the estimation of paid and unpaid overtime work as well as flows on working-time accounts in Germany, which enter the statistics on hours worked in the national accounts.

Open access

Abel Dasylva

Abstract

This article looks at the estimation of an association parameter between two variables in a finite population, when the variables are separately recorded in two population registers that are also imperfectly linked. The main problem is the occurrence of linkage errors that include bad links and missing links. A methodology is proposed when clerical-reviews may reliably determine the match status of a record-pair, for example using names, demographic and address information. It features clerical-reviews on a probability sample of pairs and regression estimators that are assisted by a statistical model of comparison outcomes in a pair. Like other regression estimators, this estimator is design-consistent regardless of the model validity. It is also more efficient when the model holds.

Open access

Jacco Daalmans

Abstract

Data editing is the process of checking and correcting data. In practice, these processes are often automated. A large number of constraints need to be handled in many applications. This article shows that data editing can benefit from automated constraint simplification techniques. Performance can be improved, which broadens the scope of applicability of automatic data editing. Flaws in edit rule formulation may be detected, which improves the quality of automatically edited data.
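
A toy example may help convey what constraint simplification means. The sketch below handles only single-variable bound rules of the form `x <= c` or `x >= c`: it drops bounds that are implied by tighter ones and reports variables whose bounds contradict each other, which would point to a flaw in the rule set. The rule representation and function name are assumptions of this sketch; the techniques in the article address far more general edit rules.

```python
def simplify_bounds(rules):
    """Simplify bound rules given as (variable, op, bound) with op '<=' or '>='.

    Returns (simplified_rules, conflicting_variables).
    """
    upper, lower = {}, {}
    for var, op, bound in rules:
        if op == "<=":
            # Only the tightest upper bound matters; looser ones are redundant.
            upper[var] = min(bound, upper.get(var, bound))
        elif op == ">=":
            # Only the tightest lower bound matters.
            lower[var] = max(bound, lower.get(var, bound))
        else:
            raise ValueError(f"unsupported operator: {op}")
    # A lower bound above the upper bound means no value can satisfy the rules.
    conflicts = sorted(v for v in upper if v in lower and lower[v] > upper[v])
    simplified = [(v, ">=", b) for v, b in sorted(lower.items())]
    simplified += [(v, "<=", b) for v, b in sorted(upper.items())]
    return simplified, conflicts


if __name__ == "__main__":
    # Hypothetical edit rules for two survey variables.
    rules = [("age", ">=", 0), ("age", "<=", 120), ("age", "<=", 150),
             ("income", ">=", 0), ("income", ">=", -5)]
    print(simplify_bounds(rules))
```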

Open access

Leo Pasquazzi and Michele Zenga

Abstract

In this work we apply a new approach to assess contributions from factor components to income inequality. The new approach is based on the insight that most (synthetic) inequality indexes may be viewed as (weighted) averages of point inequality measures, which measure inequality between population subgroups identified by income. Assessing contributions of factor components to point inequality measures is usually an easy task, and based on these contributions it is straightforward to define contributions to the corresponding (synthetic) overall inequality indexes as well. As we shall show through an analysis of income data from Eurostat’s European Community Household Panel Survey (ECHP), the approach based on point inequality measures gives rise to readily interpretable results, which, we believe, is an advantage over other methods that have been proposed in the literature.
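
To make the idea of averaging point inequality measures concrete, the sketch below uses one possible point measure, assumed here for illustration: for each unit in income order, the relative gap between the mean income of the units above it and the mean income of the units up to and including it. The abstract does not specify the measure or its conventions, so the choice of measure, the treatment of the last unit, the equal weights, and the function name are all assumptions. Splitting the numerator by factor component gives contributions that add up to the overall index.

```python
def factor_contributions(units):
    """Contributions of factor components to an average of point measures.

    `units` is a list of tuples, one per unit, holding that unit's income
    from each factor component, e.g. (wages, capital income).
    """
    if not units:
        raise ValueError("need at least one unit")
    ordered = sorted(units, key=sum)  # order units by total income
    n, k = len(ordered), len(ordered[0])
    contributions = [0.0] * k
    for i in range(1, n + 1):
        lower = ordered[:i]
        # Assumed convention: for the last unit, the upper group is that unit.
        upper = ordered[i:] if i < n else ordered[-1:]
        upper_mean_total = sum(sum(u) for u in upper) / len(upper)
        if upper_mean_total <= 0:
            raise ValueError("upper-group mean income must be positive")
        for j in range(k):
            lower_mean_j = sum(u[j] for u in lower) / len(lower)
            upper_mean_j = sum(u[j] for u in upper) / len(upper)
            # Component j's share of the point measure, equally weighted.
            contributions[j] += (upper_mean_j - lower_mean_j) / upper_mean_total / n
    return contributions


if __name__ == "__main__":
    # Hypothetical units with (wages, capital income); made-up numbers.
    parts = factor_contributions([(1.0, 0.0), (3.0, 0.0)])
    print(parts, sum(parts))
```

The sum of the returned list is the overall index under these assumptions, so each entry can be read as the part of measured inequality attributed to one factor component.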