Ewa Skotarczak, Anita Dobek and Krzysztof Moliński
The literature offers a wide collection of correlation and association coefficients for different data structures. By convention, some correlation coefficients are used for continuous data and others for categorical or ordinal observations. The aim of this paper is to verify the performance of various approaches to correlation coefficient estimation for several types of observations. Both simulated and real data were analysed. For continuous variables, Pearson’s r² and MIC were determined, whereas for categorized data three approaches were compared: Cramér’s V, Joe’s estimator, and the regression-based estimator. Two methods of discretization for continuous data were used. The following conclusions were drawn: the regression-based approach yielded the best results for data with the highest assumed r² coefficient, whereas Joe’s estimator was the better approximation of the true correlation when the assumed r² was small; and the MIC estimator detected the maximal level of dependency for data having a quadratic relation. Moreover, the discretization method applied to data with a non-linear dependency can cause loss of dependency information. The calculations were supported by the R packages arules and minerva.
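As a rough illustration of the contrast between a coefficient for continuous data and one for categorized data, the following Python sketch computes Pearson’s r² on raw values and Cramér’s V after discretization. It is not the authors’ code (they worked in R with arules and minerva). The equal-width binning, the number of bins, the function names and the toy data are all assumptions made for this example, and neither MIC nor Joe’s estimator is implemented.

```python
import math
from collections import Counter


def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot bin a constant variable")
    width = (hi - lo) / k
    # The maximum falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def cramers_v(a, b):
    """Cramér's V computed from the chi-square statistic of the a-by-b table."""
    n = len(a)
    joint, rows, cols = Counter(zip(a, b)), Counter(a), Counter(b)
    smaller_dim = min(len(rows), len(cols))
    if smaller_dim < 2:
        raise ValueError("each variable needs at least two observed levels")
    chi2 = 0.0
    for r, row_total in rows.items():
        for c, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (smaller_dim - 1)))


# Toy data (hypothetical): a symmetric grid with a linear and a quadratic response.
x = [i / 10 for i in range(-50, 51)]
y_linear = [2 * v + 1 for v in x]
y_quadratic = [v * v for v in x]

r2_linear = pearson_r2(x, y_linear)        # essentially 1 for an exact line
r2_quadratic = pearson_r2(x, y_quadratic)  # essentially 0 on a symmetric grid
v_quadratic = cramers_v(equal_width_bins(x, 5), equal_width_bins(y_quadratic, 5))
```

On this toy example r² misses the quadratic relation entirely, while Cramér’s V on the binned data stays clearly above zero; how much dependency survives will vary with the binning scheme chosen.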
Adam Mieldzioc, Monika Mokrzycka and Aneta Sawikowska
Modern chromatography largely uses the technique of gas chromatography coupled with mass spectrometry (GC–MS). For a set of data concerning the drought resistance of barley, the problem of the characterization of a covariance structure is investigated with the use of two methods. The first is based on the Frobenius norm and the second on the entropy loss function. For the four considered covariance structures – compound symmetry, three-diagonal and penta-diagonal Toeplitz and autoregression of order one – the Frobenius norm indicates the compound symmetry matrix and autoregression of order one as the most relevant, whilst the entropy loss function gives a slight indication in favor of the compound symmetry structure.
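The Frobenius-norm criterion can be sketched for the compound symmetry case, where the closest structured matrix has a closed form: the common diagonal value is the mean of the sample variances and the common off-diagonal value is the mean of the sample covariances. The Python snippet below is only an illustration under that assumption. The 3×3 matrix is made up, the function names are hypothetical, positive definiteness of the fitted matrix is not checked, and the entropy loss function is not shown.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two square matrices."""
    p = len(a)
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(p) for j in range(p)))


def nearest_compound_symmetry(s):
    """Least-squares (Frobenius) fit of a compound symmetry matrix to s."""
    p = len(s)
    diag = sum(s[i][i] for i in range(p)) / p
    off = sum(s[i][j] for i in range(p) for j in range(p) if i != j) / (p * (p - 1))
    return [[diag if i == j else off for j in range(p)] for i in range(p)]


# Hypothetical sample covariance matrix, not taken from the barley data.
sample_cov = [[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]]

cs_fit = nearest_compound_symmetry(sample_cov)
distance = frobenius_distance(sample_cov, cs_fit)
```

Comparing such distances across candidate structures is one way to rank them; the Toeplitz and autoregressive fits would need their own projections or a numerical search.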
Triticale (Triticosecale Wittmack) is obtained through the crossing of wheat (Triticum ssp.) and rye (Secale cereale L.) and is characterized by high yield potential, good health and grain value, and high tolerance to biotic and abiotic stress. Poland is a very important region for progress in triticale breeding, since it is home to most cultivars, and numerous genetic studies on triticale have been carried out. Despite the tremendous interest in triticale among both breeders and researchers, there are no studies assessing the adaptation of cultivars to environmental conditions across growing seasons. This study was conducted to investigate the influence of cultivar, management, location and growing season on grain yield. At the same time, this approach provides a new way to determine whether there is any dependency between the eight seasons, and to find the cause of the yield response to environmental conditions in a given growing season.
Drought reduces crop yields not only in areas of arid climate. The impact of droughts depends on the crop growth stage and soil properties. The frequency of droughts will increase due to climate change. It is important to determine the environmental variables that have the strongest effect on wheat yields in dry years. The effect of soil and weather on wheat yield was evaluated in 2018, which was considered a very dry year in Europe. The winter wheat yield data from 19 trial locations of the Research Center of Cultivar Testing (COBORU), Poland, were used. Soil data from the trial locations, mean air temperature (T) and precipitation (P) were considered as environmental factors, as well as the climatic water balance (CWB). The hydrothermal coefficient (HTC), which is based on P and T, was also used. The effect of these factors on winter wheat yield was related to the weather conditions at particular growth stages. The soil had a greater effect than the weather conditions. CWB, P, T and HTC showed a clear relationship with winter wheat yield. Soil data and HTC are the factors most recommended for models predicting crop yields. In the selection of drought-tolerant genotypes, the plants should be subjected to stress especially during the heading and grain filling growth stages.
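The abstract does not state the formula used for HTC. A common definition is Selyaninov’s hydrothermal coefficient: total precipitation divided by one tenth of the sum of daily mean temperatures over the period. The short Python sketch below assumes that definition; the function name and the weather values are hypothetical.

```python
def hydrothermal_coefficient(daily_precip_mm, daily_mean_temp_c):
    """Selyaninov-type HTC (assumed definition): 10 * sum(P) / sum(T)."""
    total_temp = sum(daily_mean_temp_c)
    if total_temp <= 0:
        raise ValueError("temperature sum must be positive for HTC")
    return 10.0 * sum(daily_precip_mm) / total_temp


# Hypothetical 30-day window: 1.5 mm of rain and 15 degrees C every day.
htc = hydrothermal_coefficient([1.5] * 30, [15.0] * 30)
```

Under this definition lower values indicate drier conditions for the chosen growth stage.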
The usefulness of combining methods is examined using the example of microarray cancer data sets, where expression levels of huge numbers of genes are reported. Problems of discrimination into two groups are examined on three such data sets. For each data set, the cross-validation errors evaluated on the remaining half of the whole data set, not used earlier for the selection of genes, were used as measures of classifier performance. Common single procedures for the selection of genes, Prediction Analysis of Microarrays (PAM) and Significance Analysis of Microarrays (SAM), were compared with the fusion of eight selection procedures, or of a smaller subset of five of them, excluding SAM or PAM. Merging five or eight selection methods gave similar results. Based on the misclassification rates, for any examined ensemble of classifiers, the combining of gene selection methods was not superior to single PAM or SAM selection for two of the three data sets. Additionally, the heterogeneous combining of five base classifiers (k-nearest neighbors, SVM linear and SVM radial with parameter c=1, shrunken centroids regularized classifier (SCRDA) and nearest mean classifier) proved to significantly outperform resampling classifiers such as bagging decision trees. Heterogeneously combined classifiers also outperformed double bagging for some ranges of gene numbers and data sets, but merging is generally not superior to random forests. The preliminary step of combining gene rankings was generally not essential for the performance of either heterogeneously or homogeneously combined classifiers.
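One simple way to combine heterogeneous base classifiers is majority voting over their predicted labels. The abstract does not say which combining rule was used, so the Python sketch below should be read as an assumed illustration only; the prediction vectors and the true labels are invented, and the base classifiers themselves are not implemented.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label predictions by majority vote, one vote per classifier."""
    combined = []
    for votes in zip(*predictions_per_classifier):
        # With an odd number of classifiers and two classes there are no ties.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def misclassification_rate(predicted, truth):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(p != t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical predictions of five base classifiers on four test samples.
base_predictions = [
    [0, 1, 1, 0],  # e.g. k-nearest neighbors
    [0, 1, 0, 0],  # e.g. linear SVM
    [1, 1, 1, 0],  # e.g. radial SVM
    [0, 0, 1, 0],  # e.g. SCRDA
    [0, 1, 1, 1],  # e.g. nearest mean classifier
]
true_labels = [0, 1, 1, 1]

ensemble = majority_vote(base_predictions)
error = misclassification_rate(ensemble, true_labels)
```

In a study like the one described, the error would be estimated on the half of the data not used for gene selection, which avoids an optimistic bias.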
Coinfection by Plasmodium species and Toxoplasma gondii in humans is widespread, with its endemic impact mostly felt in the tropics. A mathematical model is formulated as a first-order nonlinear system of ordinary differential equations to describe the coinfection dynamics of malaria-toxoplasmosis in the mainly human and feline susceptible host population in tropical regions. Comprehensive mathematical techniques are applied to show that the model system is bounded, positive and realistic in an epidemiological sense. Also, the basic reproduction number (Romt) of the coinfection model is obtained. It is shown that if Romt < 1, the model system at its malaria-toxoplasmosis absent equilibrium is both locally and globally asymptotically stable. The impact of toxoplasmosis and its treatment on malaria, and vice versa, is studied and analyzed. Sensitivity analysis was performed to investigate the impact of the model system parameters on the reproduction number of the transmission of malaria-toxoplasmosis coinfection. Simulations and graphical illustrations were made to validate the results obtained from the theoretical model.
Tadeusz Caliński, Agnieszka Łacka and Idzi Siatkowski
The main estimation and hypothesis testing procedures are presented for experiments conducted in row-column designs of a certain desirable type. It is shown that, under appropriate randomization, these experiments have the convenient orthogonal block structure. Due to this property, the analysis of experimental data can be performed in a comparatively simple way. Relevant simplifying procedures are indicated. The main advantage of the presented methodology concerns the analysis of variance and related hypothesis testing procedures. Under the adopted approach one can perform these analytical methods directly, not by combining results from analyses based on some stratum submodels. Practical application of the presented theory is illustrated by four examples of real experiments in the relevant row-column designs. The present paper is the third in the projected series of publications concerning the analysis of experiments with orthogonal block structure.
This paper presents some constructions of regular D-optimal weighing designs based on the incidence matrices of a balanced incomplete block design, balanced bipartite weighing design and ternary balanced block design. We determine optimality conditions and relations between the parameters of the design, and give an example.
For square contingency tables with ordered categories, Iki, Tahata and Tomizawa (2012) considered a measure to represent the degree of departure from marginal homogeneity. However, the maximum value of this measure cannot distinguish two kinds of marginal inhomogeneity. The present paper proposes a measure which can distinguish two kinds of marginal inhomogeneity. In particular, the proposed measure is useful for representing the degree of departure from marginal homogeneity when the marginal cumulative logistic model holds.
The effective dose of six herbicidal ionic liquids containing glyphosate [N-(phosphonomethyl)glycine] was investigated. Varied biological activity of the tested compounds was observed depending on the type of cation and targeted plant species. In the case of common lambsquarters, the lowest effective dose was obtained for compounds containing didecyldimethylammonium and di(hydrogenated tallow)dimethylammonium cations. In the case of white mustard, the lowest ED50 and ED90 values were obtained for the reference compound, which contained glyphosate isopropylamine salt. These parameters were determined using dose efficiency curves based on log-logistic models with three or four parameters. The study indicates that ionic liquids with glyphosate may be used as a new form of this herbicide in the future.
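For a log-logistic dose-response curve, ED50 and ED90 follow directly from the fitted parameters. The Python sketch below assumes the widely used four-parameter form f(x) = c + (d - c) / (1 + (x/e)^b), with the three-parameter model obtained by fixing the lower limit c at 0. The parameter values are hypothetical and no curve fitting is performed.

```python
def log_logistic(dose, slope, lower, upper, ed50):
    """Four-parameter log-logistic response; set lower=0 for three parameters."""
    if dose == 0:
        return upper  # untreated control sits at the upper limit
    return lower + (upper - lower) / (1.0 + (dose / ed50) ** slope)


def effective_dose(percent, slope, ed50):
    """Dose giving `percent` % of the full reduction from upper to lower limit."""
    if not 0 < percent < 100:
        raise ValueError("percent must lie strictly between 0 and 100")
    return ed50 * (percent / (100.0 - percent)) ** (1.0 / slope)


# Hypothetical fitted parameters (response in % of untreated control, dose in g/ha).
slope, lower, upper, ed50 = 2.0, 0.0, 100.0, 200.0

ed90 = effective_dose(90, slope, ed50)
response_at_ed90 = log_logistic(ed90, slope, lower, upper, ed50)
response_at_ed50 = log_logistic(ed50, slope, lower, upper, ed50)
```

With these made-up values the response at ED90 is 10% of the control, which is a quick consistency check on the two functions.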