A Lexical Approach to Estimating Environmental Goods and Services Output in the Construction Sector via Soft Classification of Enterprise Activity Descriptions Using Latent Dirichlet Allocation

Open access


The research question addressed here is whether the semantic value implicit in the environmental terms of an activity description text string can be translated into economic value for firms in the construction sector. We address this question using a relatively new applied statistical method, Latent Dirichlet Allocation (LDA). We first identify a satellite register of firms in the construction sector that engage in some form of environmental work. From this register we construct a vocabulary of meaningful words. Then, for each firm on the satellite register in turn, we take its activity description text string and process it with LDA. This softly classifies the descriptions on the satellite register into just seven environmentally relevant topics. With this seven-topic classification we extract a statistically meaningful weight of evidence associated with the environmental terms in each activity description. This weight is applied to the associated firm’s overall output value, as recorded on our national Business Register, to arrive at a supply-side estimate of the firm’s EGSS value. On this basis we estimate EGSS output for construction in Ireland in 2013 at about EUR 229 million. We contrast this estimate with estimates from other countries obtained by demand-side methods and show that it compares satisfactorily, which enhances its credibility. Our method also has the advantage of providing a breakdown of EGSS output by EU environmental classifications (CEPA/CReMA), as these align closely with the discovered topics. We stress that the success of this application of LDA relies greatly on our small vocabulary, which is constructed directly from the satellite register.
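The weighting step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify how the weight of evidence is computed, so the sketch assumes, purely for illustration, that the weight is the share of description tokens found in an environmental vocabulary, and that per-firm topic proportions (`theta`) are already available from a fitted LDA model. All names (`ENV_VOCAB`, `env_weight`, `egss_by_topic`), topic labels, and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical environmental vocabulary (an assumption, not taken from the paper).
ENV_VOCAB = {"insulation", "solar", "wastewater", "recycling"}

# Hypothetical topic labels standing in for CEPA/CReMA-like categories.
TOPIC_LABELS = ["energy saving", "wastewater management"]

# Hypothetical firms: description, register output value, and topic
# proportions (theta) as they might be produced by a fitted LDA model.
FIRMS = [
    {"id": "A", "description": "roof insulation solar installation",
     "output_eur": 1_000_000, "theta": [0.75, 0.25]},
    {"id": "B", "description": "general building wastewater pipes",
     "output_eur": 2_000_000, "theta": [0.5, 0.5]},
]


def env_weight(description, vocab):
    """Assumed weight: fraction of tokens that are environmental terms."""
    tokens = description.lower().split()
    if not tokens:
        return 0.0
    return sum(token in vocab for token in tokens) / len(tokens)


def egss_by_topic(firms, vocab, labels):
    """Scale each firm's output by its weight, then split it across topics."""
    totals = defaultdict(float)
    for firm in firms:
        weighted_output = env_weight(firm["description"], vocab) * firm["output_eur"]
        for label, share in zip(labels, firm["theta"]):
            totals[label] += weighted_output * share
    return dict(totals)


totals = egss_by_topic(FIRMS, ENV_VOCAB, TOPIC_LABELS)
```

Summing the values of `totals` gives the supply-side estimate for this toy register, and the per-label entries give the kind of breakdown by environmental category that the abstract refers to.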


