The Use of Official Statistics in Self-Selection Bias Modeling

Open access


Official statistics are a fundamental source of publicly available information that periodically provides a great amount of data on all major areas of citizens’ lives, such as economics, social development, education, and the environment. However, these extraordinary sources of information are often neglected, especially by business and industrial statisticians. In particular, data collected from small businesses, such as small and medium-sized enterprises (SMEs), are rarely integrated with official statistics data.

In official statistics data integration, the quality of data is essential to guarantee reliable results. Considering the analysis of surveys on SMEs, one of the most common issues related to data quality is the high proportion of nonresponses that leads to self-selection bias.

This work illustrates a flexible methodology to deal with self-selection bias, based on the generalization of Heckman’s two-step method with the introduction of copulas. This approach allows us to assume different distributions for the marginals and to express various dependence structures. The methodology is illustrated through a real data application, where the parameters are estimated according to the Bayesian approach and official statistics data are incorporated into the model via informative priors.
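As a minimal sketch of the copula idea described above, and not the authors' actual model, the following Python snippet couples two different marginals through a Clayton copula to obtain a joint distribution. The choice of a standard normal marginal for the selection variable, a standard logistic marginal for the outcome, and the function names are all assumptions made for illustration only.

```python
import math


def clayton_copula(u: float, v: float, theta: float) -> float:
    """Clayton copula C(u, v), valid for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def normal_cdf(x: float) -> float:
    """Standard normal CDF (assumed marginal for the selection variable)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x: float) -> float:
    """Standard logistic CDF (assumed marginal for the outcome variable)."""
    return 1.0 / (1.0 + math.exp(-x))


def joint_cdf(s: float, y: float, theta: float) -> float:
    """Joint CDF built from the two marginals via Sklar's theorem:
    F(s, y) = C(F_S(s), F_Y(y))."""
    return clayton_copula(normal_cdf(s), logistic_cdf(y), theta)


if __name__ == "__main__":
    # Larger theta means stronger positive dependence between
    # selection and outcome; theta near 0 approaches independence.
    for theta in (0.01, 1.0, 5.0):
        print(theta, joint_cdf(0.0, 0.0, theta))
```

Swapping `clayton_copula` for another family, or replacing either marginal CDF, changes the dependence structure or the marginal shape without touching the rest of the construction, which is the flexibility the copula approach is meant to provide.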


  • Albert, J.H. and S. Chib. 1993. “Bayesian Analysis of Binary and Polychotomous Response Data.” Journal of the American Statistical Association 88: 669-679.

  • Ali, M.M., N.N. Mikhail, and M.S. Haq. 1978. “A Class of Bivariate Distributions Including the Bivariate Logistic.” Journal of Multivariate Analysis 8: 405-412.

  • Armero, C., A. López-Quílez, and R. López-Sánchez. 2008. “Bayesian Assessment of Times to Diagnosis in Breast Cancer Screening.” Journal of Applied Statistics 35: 997-1009.

  • Bhat, C.R. and N. Eluru. 2009. “A Copula-Based Approach to Accommodate Residential Self-Selection Effects in Travel Behavior Modeling.” Transportation Research Part B 43: 749-765.

  • Clayton, D.G. 1978. “A Model for Association in Bivariate Life Tables and its Application in Epidemiological Studies of Family Tendency in Chronic Disease Incidence.” Biometrika 65: 141-151.

  • European Commission. 2010. “Innovation Union Scoreboard 2010.” Available at: (accessed February 2014).

  • Farlie, D.J.G. 1960. “The Performance of Some Correlation Coefficients for a General Bivariate Distribution.” Biometrika 47: 307-323.

  • Frank, M.J. 1979. “On the Simultaneous Associativity of F(x, y) and x + y − F(x, y).” Aequationes Mathematicae 19: 194-226.

  • Gamerman, D. and H. Lopes. 2006. “Markov Chain Monte Carlo.” Texts in Statistical Science, 2nd ed. New York: Chapman & Hall.

  • Gumbel, E.J. 1960. “Bivariate Exponential Distributions.” Journal of the American Statistical Association 55: 698-707.

  • Hamilton, B.H. and J.A. Nickerson. 2003. “Correcting for Endogeneity in Strategic Management Research.” Strategic Organization 1: 51-78.

  • Heckman, J.J. 1979. “Sample Selection Bias as a Specification Error.” Econometrica 47: 153-161.

  • Huard, D., G. Evin, and A.-C. Favre. 2006. “Bayesian Copula Selection.” Computational Statistics and Data Analysis 51: 809-822.

  • ISTAT. 2013. Italian Innovation Survey. Available at: (accessed February 2014).

  • Joe, H. 1993. “Parametric Families of Multivariate Distributions with Given Marginals.” Journal of Multivariate Analysis 46: 262-282.

  • Joe, H. 1997. Multivariate Models and Dependence Concepts. London: Chapman & Hall.

  • Lee, L.-F. 1983. “Generalized Econometric Models with Selectivity.” Econometrica 51: 507-512.

  • Lucchetti, R. and C. Pigini. 2013. “A Test for Bivariate Normality with Applications in Microeconometric Models.” Statistical Methods and Applications 22: 535-572.

  • Morgenstern, D. 1956. “Einfache Beispiele zweidimensionaler Verteilungen.” Mitteilungsblatt für Mathematische Statistik 8: 234-235.

  • Nelsen, R.B. 1999. An Introduction to Copulas. New York: Springer.

  • Nicolini, G. and L. Dalla Valle. 2011. “Errors in Customer Satisfaction Surveys and Methods to Correct Self-Selection Bias.” Quality Technology & Quantitative Management 8: 167-181.

  • Nicolini, G. and L. Dalla Valle. 2012. “Census and Sample Surveys.” In Modern Analysis of Customer Surveys with Applications Using R, edited by Ron S. Kenett and Silvia Salini, 37-53. Statistics in Practice Series. Hoboken, NJ: John Wiley & Sons.

  • Rosenbaum, P. and D. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70: 41-55.

  • Rubin, D.B. 1974. “Estimating Causal Effects of Treatments in Randomized and Non-Randomized Studies.” Journal of Educational Psychology 66: 688-701.

  • Sklar, A. 1959. “Fonctions de répartition à n dimensions et leurs marges.” Publications de l’Institut de Statistique de l’Université de Paris 8: 229-231.

  • Smith, M. 2003. “Modelling Sample Selection Using Archimedean Copulas.” Econometrics Journal 6: 99-123.

  • Smith, M.D. 2005. “Using Copulas to Model Switching Regimes with an Application to Child Labour.” The Economic Record 81: S47-S57.

  • Tobin, J. 1958. “Estimation of Relationships for Limited Dependent Variables.” Econometrica 26: 24-36.
