Practical Aspects of Log-ratio Coordinate Representations in Regression with Compositional Response

Open access

Abstract

Regression analysis with a compositional response, i.e. with observations carrying relative information, is an appropriate tool for statistical modelling in many scientific areas (e.g. medicine, geochemistry, geology, economics). Although this technique has recently been studied intensively, some practical aspects still deserve further analysis. Here we discuss issues related to the coordinate representation of compositional data. It is shown that the linear relation between particular orthonormal coordinates and centred log-ratio coordinates can be used to simplify the computations involved in estimating regression parameters and testing hypotheses. To enhance the interpretation of regression parameters, orthogonal coordinates and their relation to orthonormal and centred log-ratio coordinates are presented. Further, we discuss the quality of prediction in different coordinate systems. It is shown that the mean squared error (MSE) for orthonormal coordinates is less than or equal to the MSE for log-transformed data. Finally, an illustrative real-world example from geology is presented.
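The linear relation between centred log-ratio (clr) coordinates and orthonormal coordinates mentioned above can be illustrated with a minimal sketch. It assumes one common choice of orthonormal basis (often called pivot coordinates); the abstract does not specify which basis the paper uses, so the function names `clr`, `pivot_basis` and `ilr`, as well as the example composition, are illustrative only.

```python
import numpy as np

def clr(x):
    """Centred log-ratio coordinates: log parts minus their mean log."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def pivot_basis(D):
    """D x (D-1) matrix V with orthonormal columns, each summing to zero.

    Assumed basis (pivot coordinates): column i contrasts part i with
    the geometric mean of the parts that follow it.
    """
    V = np.zeros((D, D - 1))
    for i in range(D - 1):
        r = D - 1 - i  # number of parts after part i
        V[i, i] = np.sqrt(r / (r + 1))
        V[i + 1:, i] = -1.0 / np.sqrt(r * (r + 1))
    return V

def ilr(x, V):
    """Orthonormal coordinates as a linear map of clr: z = V^T clr(x)."""
    return clr(x) @ V

# Hypothetical three-part composition, for illustration only.
x = np.array([0.2, 0.3, 0.5])
V = pivot_basis(3)
z = ilr(x, V)          # D-1 orthonormal coordinates
clr_back = z @ V.T     # clr(x) recovered as V z
```

Because the columns of `V` are orthonormal and orthogonal to the vector of ones, the map between the two coordinate systems is linear in both directions, which is the property that lets regression computations be carried over from one representation to the other.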

Measurement Science Review

The Journal of Institute of Measurement Science of Slovak Academy of Sciences

