Augmenting Statistical Data Dissemination by Short Quantified Sentences of Natural Language

Open access

Abstract

Data from National Statistical Institutes are generally considered an important source of credible evidence for a variety of users. Summarization and dissemination via traditional methods are a convenient way of providing this evidence. However, the results are usually comprehensible only to users with a considerable level of statistical literacy. A promising alternative lies in augmenting the summarization linguistically. Less statistically literate users (e.g., domain experts and the general public), as well as disabled people, can benefit from such a summarization. This article studies the potential of summaries expressed in short quantified sentences. Summaries such as “most visits from remote countries are of a short duration” can be immediately understood by diverse users. Linguistic summaries are not intended to replace existing dissemination approaches, but can augment them by providing alternatives for the benefit of diverse users of official statistics. Linguistic summarization can be achieved by formalizing linguistic terms and relative quantifiers mathematically as fuzzy sets. To avoid summaries based on outliers or on data with low coverage, a quality criterion is applied. The concept is demonstrated on test interfaces that interpret summaries from real municipal statistical data. The article identifies a number of further research opportunities and demonstrates ways to explore them.
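As a rough illustration of how a sentence such as “most visits from remote countries are of a short duration” might be evaluated, the sketch below uses the classic truth-value formula for relatively quantified sentences: the quantifier's membership function is applied to the proportion of restricted records that also satisfy the summarizer. All membership-function parameters, the sample records, the names, and the coverage threshold are illustrative assumptions and are not taken from the article; the coverage check is only a simplified stand-in for the quality criterion mentioned in the abstract.

```python
from typing import Callable, List, Tuple

Membership = Callable[[float], float]

def increasing(zero: float, full: float) -> Membership:
    """Fuzzy set with membership 0 up to `zero`, rising linearly to 1 at `full`."""
    def mu(x: float) -> float:
        if x <= zero:
            return 0.0
        if x >= full:
            return 1.0
        return (x - zero) / (full - zero)
    return mu

def decreasing(full: float, zero: float) -> Membership:
    """Fuzzy set with membership 1 up to `full`, falling linearly to 0 at `zero`."""
    def mu(x: float) -> float:
        if x <= full:
            return 1.0
        if x >= zero:
            return 0.0
        return (zero - x) / (zero - full)
    return mu

# Hypothetical linguistic terms (parameters are assumptions, not from the article).
remote = increasing(1000, 3000)      # distance of origin country, km
short_stay = decreasing(2, 5)        # duration of visit, nights
most = increasing(0.5, 0.85)         # relative quantifier over proportions

def summarize(
    records: List[Tuple[float, float]],
    restriction: Membership,
    summarizer: Membership,
    quantifier: Membership,
    min_coverage: float = 0.1,       # assumed threshold for the quality check
) -> Tuple[float, float, bool]:
    """Return (validity, coverage, accepted) for 'Q R records are S'."""
    r = [restriction(distance) for distance, _ in records]
    s = [summarizer(nights) for _, nights in records]
    support = sum(r)
    if support == 0:
        return 0.0, 0.0, False
    # Proportion of restricted records that also satisfy the summarizer,
    # using the minimum t-norm for the fuzzy conjunction.
    proportion = sum(min(a, b) for a, b in zip(r, s)) / support
    validity = quantifier(proportion)
    # Simplified coverage: share of the data the restriction applies to.
    coverage = support / len(records)
    return validity, coverage, coverage >= min_coverage

# Made-up sample: (distance in km, nights stayed).
visits = [(3500, 1), (4000, 2), (2800, 1), (3200, 7),
          (500, 3), (200, 10), (2500, 2), (150, 1)]

validity, coverage, accepted = summarize(visits, remote, short_stay, most)
print(f"validity={validity:.2f}, coverage={coverage:.2f}, accepted={accepted}")
```

A summary would be shown to users only when its validity is high and the coverage check passes, which keeps sentences supported by a handful of outlying records from being reported.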


