Limitations of Cox Proportional Hazards Analysis in Mortality Prediction of Patients with Acute Coronary Syndrome

Abstract
The aim of this study was to evaluate the possibility of incorrect assessment of mortality risk factors in a group of patients affected by acute coronary syndrome, due to the lack of hazard proportionality in the Cox regression model. One hundred and fifty consecutive patients with acute coronary syndrome (ACS) and no age limit were enrolled. Univariable and multivariable Cox proportional hazard analyses were performed. The proportional hazard assumptions were verified using Schoenfeld residuals, the χ² test and the rank correlation coefficient Rho between residuals and time. In the total group of 150 patients, 33 (22.0%) deaths from any cause were registered in the follow-up period of 64 months. The non-survivors were significantly older and had increased prevalence of diabetes and erythrocyturia, a longer history of coronary artery disease, higher concentrations of serum creatinine, cystatin C, uric acid, glucose, C-reactive protein (CRP), homocysteine and N-terminal pro-B-type natriuretic peptide (NT-proBNP), and lower concentrations of serum sodium. No significant differences in echocardiography parameters were observed between groups. The following factors were risk factors for death and fulfilled the proportional hazard assumption in the univariable model: smoking, occurrence of diabetes and anaemia, duration of coronary artery disease, and abnormal serum concentrations of uric acid, sodium, homocysteine, cystatin C and NT-proBNP, while in the multivariable model, the risk factors for death were: smoking and elevated concentrations of homocysteine and NT-proBNP. The study has demonstrated that violation of the proportional hazard assumption in the Cox regression model may lead to a false model, one that includes predictive factors whose effect is not constant over time.


Introduction
Despite considerable progress in pharmacological and interventional therapies, cardiovascular diseases (CVD) are the main cause of patient mortality and a great burden on health care systems (National Institutes of Health, 2008). It is estimated that in Poland, CVDs are the cause of death in 43.0% of males and 54.0% of females. Every third male and every tenth female dies for this reason before age 65 (EUROSTAT, 2015). About one million inhabitants in Poland suffer from CVDs and over 168,000 cardiovascular deaths are reported yearly (Minister Zdrowia, 2015). Standardized Mortality Rate (SMR) trends for acute coronary syndrome in Poland, both among males and females, are higher, despite a decreasing trend in the rest of the European Union (EUROSTAT, 2015).
Increased cardiovascular mortality is related to traditional (low physical activity, smoking, diabetes, arterial hypertension, obesity, dyslipidaemia) as well as non-traditional risk factors, such as: chronic kidney disease, anaemia, chronic microinflammation, elevated levels of homocysteine and total bilirubin, and intima-media thickness (IMT) of the carotid artery (Babińska et al., 2005; Gul et al., 2013; Park et al., 2013; Shroff et al., 2012). There is an ongoing search for new risk factors that increase the precision of cardiovascular risk estimation.
Analyses of the impact of potential factors on patients' survival are typically based on the Cox proportional hazards model, which is one of the most popular regression models in medical research. The model makes it possible to express (in the form of an equation) the relationship between patients' survival time and one (univariable) or multiple (multivariable) explanatory variables. Additionally, it allows the risk of an event to be estimated for an individual, taking into account statistically significant prognostic variables. Such a model does not require any assumptions concerning the underlying survival time distribution and it expresses the hazard at time t for a set of explanatory variables. The hazard for a given person at time t is given by the following equation:
$$h(t, x_1, x_2, \ldots, x_n) = h_0(t) \cdot e^{\beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n}$$
where $h_0(t)$ is an arbitrary baseline hazard and $x_1, x_2, \ldots, x_n$ are explanatory variables included in the model. The values $e^{\beta_i}$ are the estimated hazard ratios (a value greater than 1 implies increased risk of event occurrence, while a value less than 1 implies reduced risk). The Cox regression model differs from the classical regression model because it allows for censored observations, i.e. cases in which the analysed end point has not occurred during the follow-up. As a result, it is impossible to calculate the ordinary residuals of the model, that is to say the differences between the observed values of the dependent variable and the values calculated from the regression equation.
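As a minimal sketch of how the hazard equation above is used, the following Python snippet computes the hazard ratio between two hypothetical patients and shows that the baseline hazard cancels out of the ratio. The coefficient values, covariates and baseline function are illustrative assumptions and are not estimates from this study.

```python
import math

def linear_predictor(betas, x):
    """Return beta_1*x_1 + ... + beta_n*x_n."""
    return sum(b * xi for b, xi in zip(betas, x))

def hazard(t, betas, x, baseline):
    """Cox hazard h(t, x) = h0(t) * exp(linear predictor)."""
    return baseline(t) * math.exp(linear_predictor(betas, x))

def hazard_ratio(betas, x_a, x_b):
    """Hazard ratio of patient a versus patient b; h0(t) cancels out."""
    return math.exp(linear_predictor(betas, x_a) - linear_predictor(betas, x_b))

# Hypothetical coefficients for (smoker: 0/1, age in decades); not study estimates.
betas = [0.7, 0.3]
patient_a = [1, 6.5]   # smoker, 65 years
patient_b = [0, 6.5]   # non-smoker, 65 years

# Any positive baseline hazard gives the same ratio (arbitrary choice here).
baseline = lambda t: 0.01 * (1 + t)

hr = hazard_ratio(betas, patient_a, patient_b)
ratio_at_12 = hazard(12, betas, patient_a, baseline) / hazard(12, betas, patient_b, baseline)
ratio_at_60 = hazard(60, betas, patient_a, baseline) / hazard(60, betas, patient_b, baseline)
print(round(hr, 3), round(ratio_at_12, 3), round(ratio_at_60, 3))
```

The three printed values coincide because, under the model, the ratio depends only on the difference in covariates and not on time.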

The main assumption of the classical Cox hazards analysis is the proportionality of the hazards. If this assumption is violated, the use of a simple Cox regression model is incorrect. Treating variables whose effect as a hazard factor strengthens, changes and/or disappears during follow-up as constant, significant risk factors for death may result in false inference. Unfortunately, this important assumption has been checked and properly reported in very few scientific publications.
The aim of this study was to evaluate the possibility of incorrect assessment of mortality risk factors in a group of patients affected by acute coronary syndrome, due to the lack of hazard proportionality in Cox regression.

Materials and Methods
One hundred and fifty consecutive patients with acute coronary syndrome (ACS) and no age limit, admitted to the emergency room of the Department of Cardiology of the Community Hospital in Tychy, Poland, between October 2005 and March 2006, were enrolled. All patients were from the Silesian Voivodeship. The study protocol was approved by the Bioethical Committee of the Medical University of Silesia. Written consent was obtained from each participant. Cases of ACS were classified based on the TIMI Risk Score, which was helpful in determining disposition, including: a) urgent invasive diagnostics and transfer of a patient from the emergency room to catheterization laboratories, b) admission to the department of cardiology for further treatment and diagnostics of coronary artery disease.
The following factors were analysed: atherosclerosis risk factors, myocardial necrosis markers, concentrations of blood haemoglobin and serum creatinine, sodium, potassium, and albumin in urine analysis, and serum concentrations of homocysteine, cystatin C, folic acid, vitamin B12, and N-terminal pro-B-type natriuretic peptide (NT-proBNP). High troponin was defined as a troponin level in blood above 0.5 µg/l; hypercholesterolemia was defined as a serum cholesterol level above 5.2 mmol/l, while hypertriglyceridemia was defined as a serum triglyceride level above 2.3 mmol/l. An NT-proBNP level above 2000 pg/ml identified patients with heart failure, according to Dickstein et al. (2010).
Echocardiography was performed for each patient hospitalized in the Department of Cardiology, with evaluation of left ventricular hypertrophy, ejection fraction, and left ventricular muscle mass (according to the Penn formula). Transthoracic echocardiography was performed using the Aloka 4000 device using 2D projection, M-mode projection, Doppler conventional M-mode echocardiography, and Color M-mode echocardiography. Kidney function (estimated glomerular filtration rate, eGFR) was calculated on the basis of the MDRD formula (Levey et al., 2009). Patients' unique identification numbers (PESELs) were used to track annual survival of patients based on data collected by the Department of National Central Evidence run by the Ministry of Internal Affairs and Administration.
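For readers unfamiliar with the MDRD estimate mentioned above, a minimal sketch of the widely used four-variable form is given below. The text does not state which variant or which creatinine units were used in the study, so the constant 175 (the version re-expressed for standardized creatinine; the original equation used 186) and the mg/dl input are assumptions.

```python
def egfr_mdrd(creatinine_mg_dl, age_years, female, black=False, factor=175.0):
    """Four-variable MDRD estimate of GFR in ml/min/1.73 m^2.

    factor=175 is the equation re-expressed for standardized creatinine;
    the original equation used 186. Which version the study used is not
    stated, so this default is an assumption.
    """
    egfr = factor * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

def creatinine_umol_to_mg_dl(creatinine_umol_l):
    """Convert serum creatinine from µmol/l to mg/dl."""
    return creatinine_umol_l / 88.4

# Hypothetical patient: 60-year-old man with serum creatinine of 1.0 mg/dl.
male = egfr_mdrd(1.0, 60, female=False)
female = egfr_mdrd(1.0, 60, female=True)
print(round(male, 1), round(female, 1))
```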
The normality of the distribution of the analysed variables was assessed with the Shapiro-Wilk test and visually, based on histograms and quantile-quantile plots. The following tests were used to verify hypotheses:
- Non-parametric χ² test and χ² test with Yates correction, to compare percentages between separate groups (e.g. the frequency of obesity).
- Parametric test for two independent samples (the Student's t test), to compare mean values of interval-scale variables with normal distribution (e.g. haemoglobin concentration in blood); homogeneity of variances was assessed with the Fisher-Snedecor F test.
- Non-parametric test for two independent samples (the Mann-Whitney U test), to compare values on an ordinal scale and on an interval scale for heavily skewed distributions (e.g. NT-proBNP concentration in blood serum).
To assess the impact of the factors on risk of death, univariable and multivariable Cox proportional hazards regression models adjusted for age were used. In backward stepwise analysis, variables for which the significance level p exceeded 0.05 were eliminated in order, from the highest to the lowest p value. The proportional hazard assumptions were verified using Schoenfeld residuals, the χ² test and the rank correlation coefficient Rho between residuals and time (Abeysekera et al., 2009; Schoenfeld, 1982). Results of Cox analysis were presented as hazard ratios (HR) with a 95% confidence interval, z-statistic and corresponding p value.
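The check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure run in the study: it handles a single covariate, assumes no tied event times, uses unscaled Schoenfeld residuals (for one covariate, scaling multiplies all residuals by the same positive constant and so leaves the rank correlation unchanged), and runs on hypothetical toy data.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Unscaled Schoenfeld residuals for one covariate (no tied event times).

    For each subject with an event, the residual is the subject's covariate
    value minus the risk-weighted mean of the covariate over everyone still
    at risk at that time.
    """
    out = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute no residual
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        expected = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        out.append((t_i, x[i] - expected))
    return sorted(out)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            r[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return r

def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy data: follow-up time, event indicator (1 = death), covariate.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
x = [1.0, 0.0, 1.0, 0.0]

res = schoenfeld_residuals(times, events, x, beta=0.0)
rho = spearman_rho([t for t, _ in res], [r for _, r in res])
print(res, rho)
```

A Rho close to zero is what proportional hazards would predict; a value far from zero suggests that the effect of the covariate changes over time. With only three events, as in this toy example, no conclusion can be drawn. In the R environment, the `cox.zph` function of the survival package performs a test of this kind.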
Nominal and ordinal data were expressed as percentages, and interval data were expressed as mean values with standard deviations or as medians with lower and upper quartiles (in cases of non-normal distribution or heavily-skewed data). Statistical analysis was performed with Statistica 10.0 software, R environment, and Excel of MS Office. Values of p < 0.05 were considered as statistically significant.

Results
The study group characteristics. In the total group of 150 patients, at a mean age of 65±11.5 years (range: 21-92 years), 33 (22.0%) deaths were registered during the follow-up period of 64 months. Participants belonging to the group of non-survivors were significantly older and had increased prevalence of diabetes, anaemia, and erythrocyturia, a longer history of coronary artery disease, higher concentrations of serum creatinine (and at the same time lower eGFR values), cystatin C, uric acid, glucose, C-reactive protein (CRP), homocysteine (and higher frequency of hyperhomocysteinemia) and NT-proBNP, and lower concentrations of serum sodium (Tables 1 and 2). No significant differences in echocardiography parameters were observed between groups (Table 3). Eighty-four (56.0%) patients were using angiotensin converting enzyme inhibitors (ACE-I), 81 (54.0%) β-adrenolytics, 29 (19.3%) calcium channel blockers, 9 (6.0%) anti-arrhythmic agents, 39 (26.0%) diuretics, 47 (31.3%) statins, and 76 (50.7%) antiplatelet agents. The structure of pharmacological therapy prior to the ACS episode was similar in patients who died and in those who survived during the follow-up period.
Cox proportional hazard model. The univariable Cox proportional hazard model showed that smoking (which more than doubles the risk of death), length of coronary artery disease history, diabetes (which more than doubles the risk of death), anaemia (which increases the risk of death more than two and a half times), elevated serum concentrations of uric acid, cystatin C, NT-proBNP and homocysteine, NT-proBNP above 2000 pg/ml, and decreased serum sodium are significant risk factors for death (Table 4). Note that for all of these factors except sodium, higher values were associated with an increased risk of death. Table 4 also presents the results of testing the proportional hazard assumptions based on Schoenfeld residuals. The Rho value is the correlation coefficient between survival time and the scaled Schoenfeld residuals, reported with a corresponding χ² value (the larger the values, the stronger the evidence that the effect of the analysed risk factor is not constant over time). In the case of serum sodium and homocysteine concentrations, a violation of the assumption of hazard proportionality was revealed and time-dependent models were taken into consideration (Na·t and homocysteine·t). For such time-dependent terms, testing the proportional hazard assumption is no longer applicable.
Multivariable (stepwise backward) Cox proportional hazards regression (Table 5) showed that smoking, serum concentration of homocysteine and NT-proBNP are significant factors that influence the occurrence of death.
Smoking increased the risk of death more than sixfold, and the risk of death rose with increasing serum homocysteine concentration (by a factor of 1.13 per µmol/l) and with increasing NT-proBNP concentration (by a factor of 1.09 per 1000 pg/ml). No violations of the proportional hazard assumption were found for any of the variables or for the model as a whole.
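Because the Cox model is log-linear in the covariates, a per-unit hazard ratio can be rescaled to any difference in the covariate by raising it to the corresponding power. The sketch below illustrates this with the two per-unit ratios quoted above; the 5 µmol/l and 3000 pg/ml differences, and the standard error used for the confidence interval, are hypothetical values chosen for illustration only.

```python
import math

def hr_for_increase(hr_per_unit, units):
    """Hazard ratio for an increase of `units`, given the HR per one unit."""
    return hr_per_unit ** units

def hr_with_ci(beta, se, z=1.96):
    """Hazard ratio and Wald 95% confidence interval from a coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# HR of 1.13 per µmol/l of homocysteine (from the text), applied to a
# hypothetical 5 µmol/l difference between two patients.
hr_5 = hr_for_increase(1.13, 5)

# HR of 1.09 per 1000 pg/ml of NT-proBNP, applied to a hypothetical
# 3000 pg/ml difference.
hr_3000 = hr_for_increase(1.09, 3000 / 1000)

# Hypothetical coefficient and standard error, for illustration only.
hr, lower, upper = hr_with_ci(beta=math.log(1.13), se=0.04)
print(round(hr_5, 2), round(hr_3000, 2), round(lower, 2), round(upper, 2))
```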

Discussion
This study showed that violations of the proportional hazard assumption were revealed in some cases during univariable regression analysis (concentrations of homocysteine and sodium). Without verification of this assumption, these factors would have been included in the Cox regression analysis as constant, significant risk factors for death.
Of the analysed group of patients with ACS, during the average follow-up of 49 months, 22.0% (95% CI: 16.1-29.3%) died from any cause. These results are comparable to the Filipiak et al. (2014) study, performed with a group of 906 patients suffering from ACS and with an average age of 63 years, where the death rate was 29%, as well as with another study, conducted by Ezekowitz et al. (2009), with a population-based cohort of 7,733 patients above 65 years, where the death rate was 28%. The multicentre research GRACE performed on 3,721 patients in Great Britain (2,065 patients) and Belgium (1,656 patients) showed that 20.0% of deaths had followed myocardial infarction, which was comparable with the result obtained in our research during a 5-year follow-up, including cases with STEMI (19.0%), NSTEMI (22.0%), and unstable angina (18.0%) (Fox et al., 2010).
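The confidence interval for the overall death rate quoted above can be reproduced from the raw counts (33 deaths among 150 patients). The text does not state which method was used; the Wilson score interval shown in this sketch is an assumption, chosen because it matches the reported bounds to one decimal place.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

low, high = wilson_interval(33, 150)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # 16.1% to 29.3%
```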
Factors influencing long-term prognosis in the Cox proportional hazards model. The Cox proportional hazards model is one of the most popular regression models used in survival analysis. Its great advantage is that it does not require any assumptions concerning the shape of the underlying survival distribution. In comparison with the equally common model of logistic regression, the hazards model makes it possible to consider survival time and the occurrence of censored observations (connected with the loss of information about a patient's fate during follow-up). The main assumption of the classical Cox hazards model is the proportionality of hazard. In this model, it is assumed that the hazard function for a unit (i.e. an observation in the analysis) depends on the values of covariates and the value of the baseline hazard $h_0(t)$. For two units with defined values of covariates, the ratio of the estimated hazards should be constant over time. Failure to fulfil this assumption leads to a false model and to treating all hazard factors obtained in the model as essential and time-independent predictive factors (Abeysekera et al., 2009; Grambsch et al., 1994; Keele, 2010; Schoenfeld, 1982; Winett et al., 2001).
Let us assume that $h_1(t)$ and $h_2(t)$ are the hazard functions in groups one and two, respectively, for $t > 0$. The two groups are said to have proportional hazards when the hazard ratio $h_1(t)/h_2(t)$ is constant over time, i.e. takes the same value for all $t$.
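The definition can be illustrated numerically. The sketch below uses Weibull hazards purely as a convenient parametric example (an assumption, since the Cox model itself leaves the baseline hazard unspecified); the scale and shape values are hypothetical. Two groups with the same shape parameter have a constant hazard ratio, while groups with different shapes do not.

```python
def weibull_hazard(t, scale, shape):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard_ratios(h1, h2, time_points):
    """Hazard ratio h1(t) / h2(t) evaluated at each time point."""
    return [h1(t) / h2(t) for t in time_points]

def is_proportional(ratios, tol=1e-9):
    """True when all ratios agree with the first one up to a relative tolerance."""
    return all(abs(r - ratios[0]) <= tol * abs(ratios[0]) for r in ratios)

months = [1, 6, 12, 24, 48, 64]

# Same shape in both groups: the ratio does not depend on t.
same_shape = hazard_ratios(lambda t: weibull_hazard(t, 40, 1.5),
                           lambda t: weibull_hazard(t, 80, 1.5), months)

# Different shapes: the ratio drifts with t, so the hazards are not proportional.
diff_shape = hazard_ratios(lambda t: weibull_hazard(t, 40, 0.7),
                           lambda t: weibull_hazard(t, 80, 1.5), months)

print(is_proportional(same_shape), is_proportional(diff_shape))  # True False
```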
The Cox hazards model differs from the classical regression model because it allows for censored observations, i.e. cases in which the analysed end point does not occur during observation. As a result, it is not possible to calculate the ordinary "model residuals", i.e. the differences between the observed and expected values of the dependent variable. However, it is possible to check the assumptions of the semi-parametric hazards model using Schoenfeld residuals, the χ² test and the Rho coefficient of rank correlation between residuals and time. Obtaining a horizontal regression line of residuals across the time axis confirms the proportionality of hazard (Abeysekera et al., 2009; Grambsch et al., 1994; Keele, 2010; Schoenfeld, 1982; Winett et al., 2001). Unfortunately, this important assumption has been checked and properly reported in very few scientific publications.
In the presented research, the following factors were risk factors for death and fulfilled the proportional hazard assumption in the univariable model: smoking, occurrence of diabetes and anaemia, duration of coronary artery disease and abnormal serum concentrations of uric acid, sodium, homocysteine, cystatin C and NT-proBNP. In the multivariable model, the following were considered risk factors for death that fulfilled the proportional hazard assumption: smoking and elevated concentrations of homocysteine and NT-proBNP. The influence of these factors on long-term prediction in patients with ACS has been confirmed in numerous works (Akerblom et al., 2013; Bellomo et al., 2003; Bjorklund, 2005; Danaei et al., 2006; De Sutter, 2005; Fácila et al., 2005; Jernberg, 2004; Lawler et al., 2013; Panichi et al., 2002; Rutkowski et al., 2005; Sitkiewicz, 2007; Walker et al., 2006).
Cox regression model in the case of violation of the assumption of proportional hazards. It is improper to use a simple Cox regression model when the proportional hazard assumption is violated, as it can lead to false deductions. In that case, using a regression model with stratification of variables (the so-called stratified model) or an extended regression model with time-dependent variables, which is one of the so-called non-proportional hazards models, is recommended.
The Cox stratified regression model is a modification of the standard regression model in which stratification (categorization) of a variable not fulfilling the assumption of the proportionality of hazard has been introduced. This variable is not taken into consideration in the model, which includes the remaining parameters fulfilling the required assumption. In this analysis we allow, in advance, different hazard functions in each group determined by two levels of a stratifying variable. The advantages of such an approach are lack of need to define the interaction relationship between a stratified variable and the observation time as well as simplicity of realization (Ata et al., 2007). Its disadvantage, however, is that there is no possibility to assess the influence of a stratifying variable in the Cox regression model. Thus, this approach should be chosen if we are not directly interested in a quantitative evaluation of the influence of a stratifying variable used for categorization and when such interaction between a variable and the observation time is very complicated and difficult to interpret (Bellera et al., 2010).
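The idea of stratification can be sketched through the partial log-likelihood that the model maximizes. In the illustration below, risk sets are formed separately inside each stratum, so each stratum keeps its own baseline hazard while the coefficient of the remaining covariate is shared. The code assumes a single covariate, no tied event times and hypothetical toy data; the stratum labels are invented for the example.

```python
import math
from collections import defaultdict

def partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate (no tied event times)."""
    total = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i:
            risk_sum = sum(math.exp(beta * x_j)
                           for t_j, x_j in zip(times, x) if t_j >= t_i)
            total += beta * x_i - math.log(risk_sum)
    return total

def stratified_partial_loglik(times, events, x, strata, beta):
    """Sum of per-stratum partial log-likelihoods with a shared beta.

    Risk sets are formed inside each stratum, so every stratum keeps its own
    baseline hazard and the stratifying variable gets no coefficient.
    """
    groups = defaultdict(lambda: ([], [], []))
    for t, d, xi, s in zip(times, events, x, strata):
        groups[s][0].append(t)
        groups[s][1].append(d)
        groups[s][2].append(xi)
    return sum(partial_loglik(t, d, xi, beta) for t, d, xi in groups.values())

# Hypothetical toy data; the strata stand for two levels of a categorized variable.
times = [1, 3, 2, 4]
events = [1, 1, 1, 0]
x = [1.0, 0.0, 0.0, 1.0]
strata = ["low", "low", "normal", "normal"]

stratified = stratified_partial_loglik(times, events, x, strata, beta=0.0)
unstratified = partial_loglik(times, events, x, beta=0.0)
print(stratified, unstratified)
```

The two values differ because the stratified version compares each patient only with patients at risk in the same stratum, which is also why no effect estimate is produced for the stratifying variable itself.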
In our study, during univariable analysis, a violation of assumptions of the hazard proportionality in the case of serum homocysteine and sodium concentrations was found. Without verification of this assumption, a conclusion concerning the significant influence of homocysteine concentration on increased risk of death would have been drawn. However, as possible changes in homocysteine concentration over time were taken into consideration, the statistical significance of this variable was cancelled out. It remained significant in the case of a multivariable model in which, and this should be stressed, all variables fulfilled the proportional hazard assumptions. Decreased serum sodium concentration is a risk factor only in the initial period of observation. After two weeks, the lowered concentration of sodium stops being a factor increasing death risk in the examined group of patients.
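A time-dependent effect of this kind can be sketched with an interaction between the covariate and time, as in the Na·t and homocysteine·t terms. The coefficients below are hypothetical and were chosen only so that the effect of a binary "low sodium" indicator fades at 14 days; they are not the values estimated in the study, and the linear form of the interaction is itself an assumption.

```python
import math

def hazard_ratio_at(t, beta, gamma):
    """HR at time t when the model contains x and the interaction x*t.

    h(t, x) = h0(t) * exp(beta * x + gamma * x * t), so for a one-unit
    difference in x the hazard ratio is exp(beta + gamma * t).
    """
    return math.exp(beta + gamma * t)

def time_effect_vanishes(beta, gamma):
    """Time at which the hazard ratio equals 1, i.e. beta + gamma * t = 0."""
    return -beta / gamma

# Hypothetical values for a binary "low sodium" indicator, with t in days.
beta = math.log(2.0)
gamma = -math.log(2.0) / 14

for day in (0, 7, 14):
    print(day, round(hazard_ratio_at(day, beta, gamma), 2))
```

A linear interaction keeps decreasing after the crossing point, which would imply a protective effect later on; if the effect is expected simply to disappear, a step function of time may describe it better.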
Checking hazard proportionality either leads to removal of some variables from the Cox regression model or allows discovery of time-dependent influences of the analysed factor on the risk of endpoint occurrence. Models obtained in this way are more reliable not only from the statistical (methodological) point of view, but also from the clinical one. The results obtained in our research are consistent with numerous studies which have revealed that one of the prognostic factors in ACS is a lowered serum sodium concentration. In the Goldberg et al. (2004) study, for example, sodium concentration was measured in 1,047 patients with STEMI within 24, 48, and 72 hours from ACS onset. Hyponatremia was a significant risk factor in a 30-day observation (Goldberg et al., 2004). Similarly, Singla et al. (2007) noticed that in their research, performed with a group of 1,478 patients with NSTEMI, 341 patients had hyponatremia, which was a risk factor for 30-day mortality. Moreover, a paper published by Lazzeri et al. (2012) showed that in a group of 1,231 patients with STEMI, 23.2% (286 patients) had a serum sodium concentration below 135 mEq/l. Patients in this group were older and they more frequently suffered from diabetes, anterior wall infarction, triple artery disease, and belonged to the advanced class of the Killip-Kimball classification. These patients had a considerably higher short-term and long-term mortality hazard (Lazzeri et al., 2012). Furthermore, in the Aronson et al. (2014) study, hyponatremia occurred in 156 (19.7%) patients with decompensated heart failure and in 461 (17.7%) patients with ACS. Lowered sodium concentration was a predictor of kidney disease development and it led to an acute cardio-renal syndrome (Aronson et al., 2014). It should also be stressed that hyponatremia occurrence was considered in predictive algorithms of death events following ACS (Plakht et al., 2012, 2013).

Conclusions
In this study, it has been demonstrated that violation of the proportional hazard assumption in the Cox regression model may lead to a false model, one that includes predictive factors whose effect is not constant over time. For this reason, before a simple hazard model is applied, this assumption has to be checked and, if necessary, the Cox stratified regression model or a time-dependent variable model applied instead.
Applying the appropriate statistical models, we demonstrated that smoking, concentration of serum homocysteine and NT-proBNP are important risk factors of death in ACS patients.