The Use of Log-linear Analysis for Pregnancy Prediction

Abstract Log-linear analysis is a practical tool for examining relationships, successfully applied in many fields of science. This paper discusses the topic of estimation of the chance of getting pregnant in couples that underwent ART insemination. The authors focus on finding significant interactions between variables, on the basis of which statistical models are built. With the use of results of log-linear analysis, a model predicting the chances of achieving a clinical pregnancy that contained interactions was successfully built. Moreover, it was more complete than the model obtained with the use of logistic regression alone.


Introduction
There are numerous phenomena in medicine that cannot be explained by means of simple relationships. Some of them, such as the occurrence of a pregnancy in reproductive medicine, are exceptionally complex, and in many cases it is still impossible to answer the question of why a woman fails to become pregnant despite a wide range of diagnostic methods and the support of state-of-the-art medicine. According to Jerzy Radwan (Radwan & Wołczyński, 2011), insemination can be likened to a lottery in which the probability of a win changes monthly and may vary between couples. A large number of analyses aimed at estimating the probability of achieving a pregnancy have been conducted. More advanced statistical methods, such as data mining (Milewska et al., 2014) and statistical models (Milewska et al., 2017), are used in such cases. This paper discusses the problem of estimating the chances of getting pregnant in couples that underwent ART insemination. The authors focus on finding significant interactions between variables, on the basis of which statistical models are built.

Log-linear Analysis
Log-linear analysis is one of the methods of multi-dimensional data analysis and is often referred to as the log-linear model. The method is used when all variables of interest are measured on a qualitative scale and their relationship is presented in the form of a contingency table.
In the analysis in question, a log-linear model is understood as an expression of the expected frequencies ($E_{ij}$) as a function of parameters that represent the characteristics of discrete variables and the interactions taking place between them. The aim is to choose the model with the lowest possible number of parameters that at the same time fits the data well (Brzezińska, 2012).
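Let us assume that X and Y are discrete variables with empirical frequencies $n_{ij}$ in a contingency table with $I$ rows and $J$ columns ($i = 1, \dots, I$; $j = 1, \dots, J$). In this case, a complete (saturated) additive model for two variables can be described with the following equation (Stanisz, 2007):
$$\ln E_{ij} = M + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}$$
where:
$E_{ij}$ are the expected values,
$M$ is the total logarithmic mean for all cells, calculated according to the following formula:
$$M = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \ln E_{ij}$$
$\lambda_i^{X}$ is the effect of the i-th category of variable X,
$\lambda_j^{Y}$ is the effect of the j-th category of variable Y,
$\lambda_{ij}^{XY}$ is the effect of interaction between the i-th category of variable X and the j-th category of variable Y.
Moreover, the parameters of this model must meet the following conditions:
$$\sum_{i=1}^{I} \lambda_i^{X} = \sum_{j=1}^{J} \lambda_j^{Y} = \sum_{i=1}^{I} \lambda_{ij}^{XY} = \sum_{j=1}^{J} \lambda_{ij}^{XY} = 0$$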
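The decomposition of a two-way table into the parameters of the complete model can be illustrated with a short sketch. It assumes the standard saturated form $\ln E_{ij} = M + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}$ with sum-to-zero constraints, in which the expected frequencies equal the observed ones. The function name and the counts are hypothetical and are not taken from the study data.

```python
import math

def saturated_loglinear(table):
    """Split log cell counts of a two-way table into M, row effects,
    column effects and interaction effects (all counts must be > 0)."""
    rows, cols = len(table), len(table[0])
    logs = [[math.log(n) for n in row] for row in table]
    # Grand mean of the log frequencies (M in the text).
    m = sum(sum(r) for r in logs) / (rows * cols)
    # Main effects: row (column) mean of the logs minus the grand mean.
    lam_x = [sum(logs[i]) / cols - m for i in range(rows)]
    lam_y = [sum(logs[i][j] for i in range(rows)) / rows - m
             for j in range(cols)]
    # Interaction: what remains after removing M and both main effects.
    lam_xy = [[logs[i][j] - m - lam_x[i] - lam_y[j] for j in range(cols)]
              for i in range(rows)]
    return m, lam_x, lam_y, lam_xy

# Hypothetical counts, for illustration only (not the study data).
counts = [[30, 10], [20, 40]]
m, lam_x, lam_y, lam_xy = saturated_loglinear(counts)
```

Because the model is saturated, exponentiating $M + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}$ reproduces each observed count exactly, and every set of effects sums to zero over its index.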

Hierarchical log-linear models are selected according to one of the stepwise procedures: forward selection or backward elimination. The basic statistics used to assess the goodness of fit of a log-linear model are the chi-square statistic and the likelihood ratio statistic. Other criteria include the information criteria AIC and BIC. The lowest value of these measures indicates the model that best fits the data (Brzezińska, 2012).
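The goodness-of-fit statistics mentioned above can be sketched for the simplest case, the independence model in a two-way table. The code below is a minimal illustration and not the procedure implemented in STATA or STATISTICA. The function names and counts are hypothetical, and the form AIC = G² − 2·df is an assumption: it is one convention used for log-linear models, and other software may define AIC differently.

```python
import math

def independence_fit(table):
    """Pearson chi-square, likelihood ratio G^2 and degrees of freedom
    for the independence model in a two-way contingency table."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(r) for r in table)
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    chi2 = 0.0
    g2 = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected count under independence of rows and columns.
            expected = row_tot[i] * col_tot[j] / n
            observed = table[i][j]
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells contribute nothing to G^2
                g2 += 2 * observed * math.log(observed / expected)
    df = (rows - 1) * (cols - 1)
    return chi2, g2, df

def aic_from_g2(g2, df):
    # Assumed convention: AIC = G^2 - 2 * df (smaller is better).
    return g2 - 2 * df

# Hypothetical counts, for illustration only (not the study data).
chi2, g2, df = independence_fit([[30, 10], [20, 40]])
aic = aic_from_g2(g2, df)
```

Large values of the chi-square and G² statistics relative to the degrees of freedom indicate that the simpler model fits poorly and that the omitted interaction should be retained.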

The Use of Log-linear Analysis in Scientific Research
Log-linear analysis is a practical tool for examining relationships, successfully applied in many fields of science. Log-linear models have been used in demographic studies, inter alia, for the analysis of the causes of mortality in European Union countries (Brzezińska, 2012). Their use made it possible to identify three groups of countries according to the causes of mortality: those where the main causes of mortality are diseases of the nervous system, cancers, suicides, and alcoholism; countries where the main causes of mortality are diabetes, AIDS, and drugs; and countries where the main causes of mortality are cardiovascular diseases, accidents, and homelessness. Moreover, a complete model, best fitting the data, showed a significant role of interactions between variables. In addition, the results of the log-linear analysis were verified by means of correspondence analysis and taxonomic methods. Log-linear analysis has also been successfully used in the study of socioeconomic and demographic factors influencing child mortality in Nigeria (Adarabioyo, 2014). The results showed that children whose parents are better educated and have a higher income have better chances of survival, as do inhabitants of urban areas or the southern part of Nigeria, and girls. Furthermore, log-linear analysis methods have been used in psychology, among others for the analysis of the relationship between selected psychological factors and the intensity of alcohol consumption among students (Półtorak, 2007). In this case, only after a large number of log-linear models were considered was it possible to study the problem in greater depth and detail and to avoid hasty conclusions. In another study, the use of the log-linear model for the analysis of the quality of students' lives made it possible to identify not only the main factors influencing it, i.e. the variables that characterize students' professional plans and those related to their social situation, but above all the significant influence of two-factor interactions on the quality of students' lives (Depta & Staniec, 2014).
Log-linear analysis has been used numerous times in economic studies, among others for the selection of the factors that determine cooperation in the area of technological entrepreneurship (Staniec & Żółtowski, 2016). The models presented in that paper showed that, when choosing a partner for cooperation, the main factors matter less than interactions of the second and third order. In the case of technological entrepreneurship, it is the perception of interactions between selected factors that has a significant impact on decisions concerning cooperation. Log-linear models have also been used for the selection of factors describing the economic situation of households (Salamaga, 2008); the selection of factors determining expenditures of pensioners' households on recreation and culture (Bąk, 2013); the selection of factors determining the price attractiveness of apartments on the secondary market (Foryś, 2012); the estimation of models of residential property prices (Tomczyk & Widłak, 2010); and the analysis of unemployment in Poland (Brzezińska, 2014).
Log-linear analysis has also found its use in veterinary medicine (Kass et al., 1985) and medicine (Ogus & Yazici, 2011). Log-linear models, used in studies of primary and/or secondary cleft palate, made it possible to show interdependencies between the occurrence of cleft palate in congenital defect syndromes or in isolation, the place of residence, and the category of defect (Kaczmarek & Małkiewicz, 2005). The use of log-linear analysis for the examination of the health condition, functioning, and disability of elderly persons made it possible to identify not only the factors, but above all the interactions between the variables that determine the occurrence of disability (Ćwirlej-Sozańska et al., 2016). Log-linear models have also been successfully used to determine relationships between clinical and individual factors in pneumonia patients (Zam & Tiensuwan, 2018). Various three-dimensional log-linear models were created for this purpose and then the best model was selected. On the basis of the obtained results, it was shown that most of the variable pairs are significantly interdependent. The hospital ward is closely associated with the pneumonia type, age, status of last contact, and region; moreover, the length of hospital stay is associated with the status of last contact. Log-linear analysis has also been used to examine the relationships between clinical and individual variables in cancer patients (Tiensuwan et al., 2005). In both men and women, cancer location is strongly associated with marital status, diagnostic evidence, and treatment. What is more, cancer location also influences the method of diagnostics and treatment. As cancer location is sex-dependent, preventive care for the locations that differ between the sexes should be considered. Log-linear models have also been used to detect interactions in a multi-dimensional set of genomic data (Hu et al., 2009). The methods used made it possible to detect gene interactions that play a key role in the development and progression of breast cancer.

Log-linear Analysis as a Tool for Modelling the Chance of Getting Pregnant in the Infertility Treatment Process
The aim of this paper was to build the best possible model for estimating the chances of biochemical and clinical pregnancy in patients treated for infertility.
STATA 12.0 and STATISTICA 13.1 software were used for the statistical analysis. Pearson's chi-square test was used for the assessment of relationships between qualitative variables. Log-linear analysis was used to build a model and indicate interactions between variables; a logistic regression model was also built. Results at a level of p < 0.05 were considered statistically significant.
In the course of the study, data describing 1268 in vitro fertilization cycles with the ICSI method, conducted in an infertility treatment centre in the USA, were analysed. All the analysed variables were qualitative: the woman's age, the number of mature oocytes collected during puncture and ready for insemination (MII oocytes), the number of transferred embryos (ET), biochemical pregnancy, and clinical pregnancy (for definitions, see Radwan & Wołczyński, 2011) (Table 1). The data was characterized by a high [...] It is currently assumed that the percentage of pregnancies achieved after ICSI fertilization oscillates around 40%.
The initial analysis of relationships was performed by means of Pearson's chi-square test. A statistically significant relationship between pregnancy and other variables, i.e. the mother's age and the number of collected oocytes, was obtained for both biochemical and clinical pregnancy. On the other hand, there was no statistically significant relationship with the number of transferred embryos for either type of pregnancy (Table 2). What is worth emphasizing is the highly significant negative influence of the mother's age on pregnancy (Milewski et al., 2008).
In the first approach, log-linear analysis was used for the following variables: age, MII oocytes, ET, and clinical pregnancy. The first step of the analysis consists of obtaining the results of fitting k-factor interactions to the model (Table 3). The obtained significances for the individual rows make it possible to include the main effects (first row), two-factor interactions (second row), and three-factor interactions (third row) in the model. The second step in building the log-linear model consists of tests of all marginal and partial associations. On the basis of the results presented in Table 4, the following effects should be included in the model: (12), (13), (14), (23), (34), (123), (234). The individual effects indicate relationships between variables numbered 1 to 4. In this case, variable 1 is age, variable 2 is ET, variable 3 is the number of MII oocytes, and variable 4 is clinical pregnancy. The best model was selected by means of the following option available in the statistical package: Automatic selection of the best model. The obtained log-linear model contained the following effects: (321), (432), (41). This result can be interpreted as an indication of the existence of an interaction between the number of MII oocytes, the number of transferred embryos, and age (321); an interaction between clinical pregnancy, the number of MII oocytes, and the number of transferred embryos (432); and an interaction between clinical pregnancy and age (41).
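In a hierarchical log-linear model, a highest-order effect such as (321) implies all lower-order effects of the same variables. The sketch below expands the sets of effects reported in the text into the full list of terms they imply. The function name is hypothetical, and the variable numbering follows the text (1 is age, 2 is ET, 3 is MII oocytes, 4 is pregnancy).

```python
from itertools import combinations

def expand_generating_class(generators):
    """Return every effect implied by the highest-order terms of a
    hierarchical log-linear model, as sorted tuples of variable numbers."""
    effects = set()
    for gen in generators:
        variables = sorted(gen)
        # Each non-empty subset of a generator is part of the model.
        for k in range(1, len(variables) + 1):
            for combo in combinations(variables, k):
                effects.add(combo)
    # Order by effect size first, then by variable numbers.
    return sorted(effects, key=lambda e: (len(e), e))

# Models reported in the text for clinical and biochemical pregnancy.
clinical_model = expand_generating_class([(3, 2, 1), (4, 3, 2), (4, 1)])
biochemical_model = expand_generating_class([(3, 2, 1), (4, 1), (4, 3)])
```

Under this expansion, the clinical pregnancy model contains the two-factor effect (24) because it is implied by (432), while the biochemical pregnancy model contains no three-factor effect that involves variable 4.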
When creating a logistic regression model, the interactions of interest are those associated with clinical pregnancy (variable 4). In this case, these are the following relationships: clinical pregnancy with the number of MII oocytes and the number of transferred embryos (432), and clinical pregnancy with age (41). First, univariate logistic regression models were built, as well as a multivariate model without interactions that contained only two statistically significant independent variables: age and MII oocytes (Figure 1). Then, after the interaction indicated by log-linear analysis was included in the model, a multivariate logistic regression model with interaction, incorporating all the analysed variables, was obtained (Figure 2).
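How an interaction between MII oocytes and ET enters a logistic regression model can be sketched as follows. The binary 0/1 coding of the predictors, the function names, and all coefficient values are hypothetical and serve only as an illustration. They are not the estimates reported in the figures.

```python
import math

def pregnancy_probability(age, mii, et, coef):
    """Predicted probability from a logistic model with three main
    effects and an MII x ET interaction (predictors coded 0/1)."""
    eta = (coef["intercept"]
           + coef["age"] * age
           + coef["mii"] * mii
           + coef["et"] * et
           + coef["mii_x_et"] * mii * et)  # interaction term
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
coef = {"intercept": -0.5, "age": -0.8, "mii": 0.4,
        "et": 0.1, "mii_x_et": 0.3}

# Odds ratio for MII at each level of ET (age held at 0).
or_mii_et0 = (odds(pregnancy_probability(0, 1, 0, coef))
              / odds(pregnancy_probability(0, 0, 0, coef)))
or_mii_et1 = (odds(pregnancy_probability(0, 1, 1, coef))
              / odds(pregnancy_probability(0, 0, 1, coef)))
```

With an interaction in the model, the odds ratio for MII oocytes is no longer a single number: it equals exp(b_mii) when ET is 0 and exp(b_mii + b_mii_x_et) when ET is 1.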

In the second approach, log-linear analysis was used for the following variables: age, MII oocytes, ET, and biochemical pregnancy. Table 5 shows the results of fitting k-factor interactions to the model. The obtained significances make it possible to include the main effects (first row), two-factor interactions (second row), and three-factor interactions (third row) in the model. Table 6 shows the results of the assessment of marginal and partial associations. On their basis, the following effects should be included in the model: (12), (13), (14), (23), (34), (123). In this case, variable 1 is age, variable 2 is ET, variable 3 is the number of MII oocytes, and variable 4 is biochemical pregnancy. The best log-linear model obtained contained the following effects: (321), (41), (43). Log-linear analysis indicated the existence of an interaction between the number of MII oocytes, the number of transferred embryos, and age (321); an interaction between biochemical pregnancy and age (41); and an interaction between biochemical pregnancy and the number of MII oocytes (43).
When creating a logistic regression model, the interactions of interest are those associated with biochemical pregnancy (variable 4). In this case, these are only the following relationships: biochemical pregnancy with age (41) and biochemical pregnancy with the number of MII oocytes (43). Unfortunately, two-factor associations containing the pregnancy variable (the dependent variable) cannot be included in the logistic regression model as interactions, because each of them involves only one independent variable and therefore corresponds to a main effect. Figure 3 shows univariate logistic regression models and a multivariate model, which consequently does not contain any interaction.
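The reason why effects (41) and (43) yield no interaction in the logistic model can be shown with a short derivation. It assumes the standard hierarchical log-linear form implied by effects (321), (41), (43), with pregnancy coded as $l \in \{0, 1\}$. The symbols $\alpha$, $\beta_i^{\text{age}}$, and $\beta_k^{\text{MII}}$ are introduced here only for illustration.

```latex
\begin{align*}
\ln E_{ijkl} &= M + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{4}_{l}
  + \lambda^{12}_{ij} + \lambda^{13}_{ik} + \lambda^{23}_{jk}
  + \lambda^{123}_{ijk} + \lambda^{14}_{il} + \lambda^{34}_{kl} \\[4pt]
\ln\frac{E_{ijk1}}{E_{ijk0}}
  &= \left(\lambda^{4}_{1} - \lambda^{4}_{0}\right)
   + \left(\lambda^{14}_{i1} - \lambda^{14}_{i0}\right)
   + \left(\lambda^{34}_{k1} - \lambda^{34}_{k0}\right) \\
  &= \alpha + \beta^{\text{age}}_{i} + \beta^{\text{MII}}_{k}
\end{align*}
```

All terms that do not involve variable 4 cancel in the log odds, so each two-factor effect with pregnancy becomes a main effect of a single predictor. An interaction between two predictors in the logistic model would require a three-factor effect that includes variable 4, such as (432) in the clinical pregnancy model.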

Conclusions
Log-linear analysis is a helpful tool for multi-dimensional data analysis. Information about interaction obtained with the use of the method may be applied in further modelling. Thanks to the use of the results of log-linear