Survival analysis and logistic regression

Ghost
Jun 4, 2019

Survival analysis and logistic regression share certain similarities. Thinking of survival analysis in terms of logistic regression, although not strictly correct because survival data are censored, provides an intuitive understanding.

Logistic regression

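Let's recall how logistic regression is done. It models the logit (the log odds) of the outcome as a linear function of the variables. Taking the logarithm makes the odds symmetric around 0 (which corresponds to odds of 1) and unbounded in both directions, so they can be modelled on a linear scale.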

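The coefficients are estimated by maximum likelihood estimation (MLE), which adjusts the coefficients of the logit function until the log likelihood of all data points is maximized. For a single variable x with intercept $\alpha$ and coefficient $\beta$, the probability and the logit are converted into each other as follows:

$$\operatorname{logit}(p) = \log\frac{p}{1-p} = \alpha + \beta x, \qquad p = \frac{1}{1 + e^{-(\alpha + \beta x)}}$$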

The coefficient of the logit function represents the change in log odds for a one-unit change of the variable, so its exponential is the odds ratio. The larger the absolute value of the coefficient, the more the variable affects the outcome.
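To make the MLE step concrete, here is a minimal sketch in plain Python. It assumes a single variable and uses Newton-Raphson to maximize the log likelihood; the function names (`logit`, `inv_logit`, `fit_logistic`) and the toy data are made up for illustration and are not taken from any library.

```python
import math

def logit(p):
    """Probability -> log odds."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Log odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, n_iter=25):
    """Fit logit(p) = b0 + b1 * x by maximizing the log likelihood (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient (g0, g1) and information matrix entries (h00, h01, h11)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system H * delta = g by hand
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data (hypothetical): 1 of 4 positive when x = 0, 3 of 4 positive when x = 1
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)  # change in odds for a one-unit change of x
print(round(b0, 4), round(b1, 4), round(odds_ratio, 4))
```

In this toy data the odds are 1:3 when x = 0 and 3:1 when x = 1, so the fitted odds ratio exp(b1) should come out at 9.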

Survival analysis

The main goal of survival analysis is to see whether a candidate variable affects patient survival. The hypothesis test is performed on the coefficient, which is the logarithm of the hazard ratio of the given variable. Intuitively, this is similar to the coefficient in logistic regression: a coefficient that deviates significantly from zero indicates that the groups defined by the variable differ in risk.

KM-estimator

The KM estimator and the Cox model are the usual tools for survival analysis. The KM estimator is a non-parametric estimate of the survival curve, and the Logrank test is used to determine whether a variable's influence on survival is significant. The Logrank test is used instead of the t-test or the Wilcoxon rank sum test because the data are censored and parametric assumptions are not guaranteed. However, the KM estimator has the following limitations:

The KM estimator and the Logrank test are non-parametric.
KM only works on a single categorical factor (strata). Continuous factors have to be categorized, and the categorization should be objective, for example biologically justified.
When data are forced into too many strata, you may lose statistical power.
The Logrank test only tells the weight of evidence that strata differ in their risk (p-value), not the magnitude of the difference (hazard ratio).
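As a rough illustration of what the KM estimator and the Logrank test compute, here is a small sketch in plain Python. It assumes two groups coded 0/1 and an event indicator of 1 for death and 0 for censoring; the function names and the toy data are hypothetical, and a real analysis would normally rely on an established survival package.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death, 0 = censored)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank(times, events, groups):
    """Two-group Logrank test (groups coded 0/1). Returns (chi2, p_value), 1 df."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for ti in times if ti >= t)                      # at risk, both groups
        n1 = sum(1 for ti, gi in zip(times, groups) if ti >= t and gi == 1)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, gi in zip(times, events, groups)
                 if ti == t and ei == 1 and gi == 1)
        obs_minus_exp += d1 - d * n1 / n                           # observed - expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-squared upper tail with 1 df
    return chi2, p_value

# Toy data (hypothetical): eight patients, one censored at t = 5
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 0, 1, 1, 1]
groups = [1, 1, 0, 1, 0, 0, 1, 0]
print(kaplan_meier(times, events))
print(logrank(times, events, groups))
```

Note that the output is a survival curve plus a chi-squared statistic and p-value; nothing in it estimates a hazard ratio, which is the last limitation in the list above.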

Cox model

The Cox model, on the other hand, performs multiple regression on several variables simultaneously, under the following assumptions:

Independent observations (individuals)
Independent censoring
Proportional hazards
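
The proportional hazards assumption is usually written in the following textbook form. The notation is introduced here for illustration and is not taken from the original post: $h_0(t)$ is an unspecified baseline hazard, $x_1, \dots, x_p$ are the variables and $\beta_1, \dots, \beta_p$ their coefficients.

$$h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)$$

Under this form the hazard ratio for a one-unit change of $x_j$ is $\exp(\beta_j)$ and does not depend on time, which is what "proportional" refers to.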

The Wald test, the likelihood ratio test and the Score test are used to test the null hypothesis that the fitted model is no better than the null model nested in it. In other words, the null hypothesis is that all of the coefficients are 0. Intuitively, these tests tell how well the combination of the given variables predicts patients' survival.

Under the null hypothesis, all three test statistics asymptotically follow a chi-squared distribution, and the tests are asymptotically equivalent.

The Wald test compares the squared difference between the estimated parameter value $\hat\beta$ and the hypothesized value $\beta_{H_0}$ (0 here) with the variance of the estimate:

$$W = \frac{(\hat\beta - \beta_{H_0})^2}{\operatorname{Var}(\hat\beta)}$$

which, as the squared z-statistic under the normal distribution, follows a chi-squared distribution. Since the SE of the coefficient is estimated by MLE as the square root of the diagonal elements of the variance-covariance matrix, the z-value, which relies on a large-sample normal approximation, is used instead of the t-value. When the sample size is small, the uncertainty of the SE may lead to inaccuracy.

In a univariate Cox model, the Wald test of the variable is equal to the Wald test of the overall model.

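The Score test compares the slope of the log likelihood at the hypothesized parameter value $\beta_{H_0}$ with the zero slope at the estimated value. Simplified as:

$$S = \frac{U(\beta_{H_0})^2}{I(\beta_{H_0})}$$

where the numerator is the squared slope of the log likelihood, $U(\beta) = \partial \ell / \partial \beta$, evaluated at the hypothesized value of the coefficient, and the denominator $I(\beta_{H_0})$ is the information (the negative second derivative of the log likelihood) at the same point.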

In the univariate Cox model, the Score test is equal to the Logrank test.

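The likelihood ratio test compares the log likelihood $\ell$ at the estimated parameter value $\hat\beta$ and at the hypothesized value $\beta_{H_0}$. When the sample size is small, the likelihood ratio test is preferred (according to the Neyman–Pearson lemma). The simplified version is:

$$LR = 2\,[\ell(\hat\beta) - \ell(\beta_{H_0})]$$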

Note that these tests should be distinguished from the hypothesis test done on each variable. The latter test determines, while holding the other variables fixed, whether the given variable affects survival. When the Cox model is fitted on a single variable, the Wald test on the variable and on the overall model is the same (df = 1).
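
To see how the three overall tests relate, here is a minimal sketch of a univariate Cox fit in plain Python. It assumes a single variable, no tied event times and a null value of 0 for the coefficient; the function names (`cox_terms`, `cox_tests`, `chi2_pvalue_1df`) and the toy data are hypothetical and do not reflect how any particular package implements the model.

```python
import math

def cox_terms(beta, times, events, x):
    """Partial log likelihood, its slope (score) and its information at beta."""
    loglik = slope = info = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei != 1:
            continue  # censored subjects only contribute through the risk sets
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]  # still at risk at ti
        s0 = sum(math.exp(beta * xj) for xj in risk)
        s1 = sum(xj * math.exp(beta * xj) for xj in risk)
        s2 = sum(xj * xj * math.exp(beta * xj) for xj in risk)
        loglik += beta * xi - math.log(s0)
        slope += xi - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, slope, info

def cox_tests(times, events, x, n_iter=30):
    """Fit beta by Newton-Raphson and return (beta_hat, test statistics vs. beta = 0)."""
    ll_null, slope_null, info_null = cox_terms(0.0, times, events, x)
    beta = 0.0
    for _ in range(n_iter):
        _, slope, info = cox_terms(beta, times, events, x)
        beta += slope / info
    ll_hat, _, info_hat = cox_terms(beta, times, events, x)
    stats = {
        "wald": beta ** 2 * info_hat,          # (beta_hat - 0)^2 / Var, with Var = 1 / info
        "score": slope_null ** 2 / info_null,  # squared slope at the null over information
        "lrt": 2.0 * (ll_hat - ll_null),       # twice the gain in log likelihood
    }
    return beta, stats

def chi2_pvalue_1df(stat):
    """Upper tail of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data (hypothetical): eight patients, one censored at t = 5, binary variable x
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 0, 1, 1, 1]
x = [1, 1, 0, 1, 0, 0, 1, 0]
beta_hat, stats = cox_tests(times, events, x)
print("hazard ratio:", round(math.exp(beta_hat), 3))
for name, value in stats.items():
    print(name, round(value, 4), round(chi2_pvalue_1df(value), 4))
```

With only eight patients the three statistics are close but not identical, which is the small-sample behaviour described above. Because the variable is binary and there are no tied event times, the Score statistic here coincides with the two-group Logrank statistic.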

The last thing worth mentioning is that the Cox model does not draw its own survival curve. Although the KM estimator and the Cox model are implemented differently, the KM curve is sometimes used as an approximation of the univariate Cox model for display purposes.
