An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

Open access

Abstract

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
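To make the idea of SVD-based imputation concrete, the following is a minimal sketch in Python with NumPy. It is not the algorithm of Arciniegas-Alarcón et al. (2010), which combines regression with the SVD and, in the extensions described here, adds cross-validated weights. Instead it shows a generic iterative lower-rank fill-in: missing cells start at their column means and are repeatedly replaced by the corresponding entries of a rank-k SVD approximation. The function names `svd_impute` and `nrmse`, the default rank, the stopping rule, and the choice to normalise the error by the standard deviation of the true values are all assumptions made for illustration.

```python
import numpy as np


def svd_impute(X, rank=1, max_iter=500, tol=1e-8):
    """Fill NaN cells of a genotype-by-environment table X.

    Illustrative only: a plain iterative rank-`rank` SVD fill-in,
    not the regression/SVD mixture described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: mean of the observed values in each column (environment).
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Rank-`rank` approximation of the currently completed matrix.
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the missing cells are updated; observed cells stay fixed.
        updated = np.where(missing, low_rank, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


def nrmse(imputed, true, missing):
    """Root mean squared error over the deleted cells, divided by the
    standard deviation of the true values in those cells (assumed
    normalisation; other conventions exist)."""
    err = imputed[missing] - true[missing]
    return np.sqrt(np.mean(err ** 2)) / np.std(true[missing])


if __name__ == "__main__":
    # Toy complete table: 5 genotypes x 4 environments, exactly rank 1.
    true = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0])
    observed = true.copy()
    observed[0, 1] = np.nan  # delete two cells, as in a simulation study
    observed[3, 2] = np.nan
    completed = svd_impute(observed, rank=1)
    print(nrmse(completed, true, np.isnan(observed)))
```

In a simulation of the kind the abstract describes, cells would be deleted at random from a complete real data set at several rates, and a measure such as this error would be averaged over many deletion patterns.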

References

  • Arciniegas-Alarcón S., García-Peña M., Dias C.T.S. (2011): Data imputation in trials with genotype×environment interaction. Interciencia 36(6): 444-449.

  • Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski W.J. (2010): An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. Biometrical Letters 47(1): 1-14.

  • Bergamo G.C., Dias C.T.S., Krzanowski W.J. (2008): Distribution-free multiple imputation in an interaction matrix through singular value decomposition. Scientia Agricola 65(4): 422-427.

  • Calinski T., Czajka S., Kaczmarek Z., Krajewski P., Pilarczyk W. (2009): Analyzing the Genotype-by-Environment Interactions Under a Randomization-Derived Mixed Model. Journal of Agricultural, Biological and Environmental Statistics 14(2): 224-241.

  • Ching W., Li L., Tsing N., Tai C., Ng T. (2010): A weighted local least squares imputation method for missing value estimation in microarray gene expression data. International Journal of Data Mining and Bioinformatics 4(3): 331-347.

  • Denis J.B., Baril C.P. (1992): Sophisticated models with numerous missing values: the multiplicative interaction model as an example. Biuletyn Oceny Odmian 24-25: 33-45.

  • Di Ciaccio A. (2011): Bootstrap and nonparametric predictors to impute missing data. In: B. Fichet et al. (eds.), Classification and Multivariate Analysis for Complex Data Structures, Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag Berlin Heidelberg.

  • Dias C.T.S., Krzanowski W.J. (2003): Model selection and cross validation in additive main effect and multiplicative interaction models. Crop Science 43: 865-873.

  • Gabriel K.R. (2002): Le biplot - outil d’exploration de données multidimensionnelles. Journal de la Société Française de Statistique 143(3-4): 5-55.

  • García-Peña M., Dias C.T.S. (2009): Analysis of bivariate additive models with multiplicative interaction (AMMI). Biometric Brazilian Journal 27(4): 586-602.

  • Gauch H.G. (2013): A simple protocol for AMMI analysis of yield trials. Crop Science 53: 1860-1869.

  • Gauch H.G., Zobel R.W. (1990): Imputing missing yield trial data. Theoretical and Applied Genetics 79: 753-761.

  • Josse J., Pagès J., Husson F. (2011): Multiple imputation in PCA. Advances in data analysis and classification 5(3): 231-246.

  • Josse J., Husson F. (2012): Handling missing values in exploratory multivariate data analysis methods. Journal de la Société Française de Statistique 153(2): 79-99.

  • Krzanowski W.J. (1988): Missing value imputation in multivariate data using the singular value decomposition of a matrix. Biometrical Letters XXV(1-2): 31-39.

  • Krzanowski W.J. (2000): Principles of multivariate analysis: A user’s perspective. Oxford University Press, Oxford.

  • Kroonenberg P.M. (2008): Applied multiway data analysis. John Wiley & Sons.

  • Kumar A., Verulkar S.B., Mandal N.P., Variar M., Shukla V.D., Dwivedi J.L., Singh B.N., Singh O.N., Swain P., Mall A.K., Robin S., Chandrababu R., Jain A., Haefele S.M., Piepho H.P., Raman A. (2012): High-yielding, drought-tolerant, stable rice genotypes for the shallow rainfed lowland drought-prone ecosystem. Field Crops Research 133: 37-47.

  • Little R., Rubin D. (2002): Statistical analysis with missing data. 2nd ed. John Wiley & Sons, New York, NY.

  • Paderewski J., Rodrigues P.C. (2014): The usefulness of EM-AMMI to study the influence of missing data pattern and application to Polish post-registration winter wheat data. Australian Journal of Crop Science 8: 640-645.

  • Piepho H.P. (1995): Methods for estimating missing genotype-location combinations in multilocation trials - an empirical comparison. Informatik Biometrie und Epidemiologie in Medizin und Biologie 26: 335-349.

  • Piepho H.P., Möhring J. (2006): Selection in cultivar trials - Is it ignorable? Crop Science 46: 192-201.

  • R Development Core Team (2013): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/

  • Rodrigues P., Pereira D.G.S., Mexia J.T. (2011): A comparison between joint regression analysis and the additive main and multiplicative interaction model: the robustness with increasing amounts of missing data. Scientia Agricola 68(6): 679-686.

  • Rubin D.B. (1978): Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Survey Research Methods Section Of The American Statistical Association. Proceedings: 20-34.

  • Sabaghnia N., Karimizadeh R., Mohammadi M. (2012): Model selection in additive main effect and multiplicative interaction model in durum wheat. Genetika 44(2): 325-339.

  • Schafer J.L., Graham J.W. (2002): Missing data: our view of the state of the art. Psychological Methods 7(2): 147-177.

  • van Buuren S. (2012): Flexible imputation of missing data. CRC press.

  • Wright K. (2012): agridat: Agricultural datasets. R package version 1.4. http://CRAN.R-project.org/package=agridat

  • Yan W., Pageau D., Frégeau-Reid J., Durand J. (2011): Assessing the representativeness and repeatability of test locations for genotype evaluation. Crop Science 51: 1603-1610.

  • Yan W. (2013): Biplot analysis of incomplete two-way data. Crop Science 53(1): 48-57.

Biometrical Letters

The Journal of Polish Biometric Society
