When working with survival data it is important to
remember that the effects may change in time. This is rarely something
you notice in datasets of a few hundreds but as the datasets grow larger
the chances of this happening increase. When doing a simple Cox
regression with the survival package this can be easily checked for
using the cox.zph
function.
The main effect of non-proportional hazards is that you will have a mean estimate over the study time. This mean does not distribute evenly throughout the studied time period but evenly according to the observed time, i.e. if you have the longest observation period being 20 years while > 50% of your patients have a follow-up of less than 10 years the effect. Note that this may further depend on how the events are distributed. Thus it is useful to be able to deal with non-proportional hazards.
strata
The Cox model allows you to estimate effects in different strata and
then average them together. If your confounder that breaks the
proportionality assumption is a non-continuous variable this is a
convenient method that allows you to set-up the necessary strata. With
one variable this is simple,
Surv(time, event) ~ x_1 + x_2 + strata(x_np)
, but with
multiple variables you risk of ending up with small strata and the
non-informative error:
attempt to select less than one element
Note that you should not multiple strata but combine the variables,
e.g. if you have two variables called x_np1
and
x_np2
you would set up your model with the strata as
strata(x_np1, x_np2)
. The strata is then handled as
interactions generating nlevels(x_np1) * nlevels(x_np2)
which seems also to be the core reason for why this fails.
tt
argumentThe survival package has an excellent vignette om time-dependent
variables and time-dependent coefficients, see
vignette("timedep", package = "survival")
. Prof. Therneau
explains there common pitfalls and how to use a time-transforming option
provided by the coxph
function through the tt
argument. It is a neat an simple solution that transforms those
variables that you have indicated for transformation using the
tt
function. The vignette provides some simple approaches
but it also allows for a rather sophisticated use:
library(survival)
coxph(Surv(time, event) ~ age + sex +
type_of_surgery + tt(type_of_surgery) +
tt(surgery_length),
data = my_surgical_data,
tt = list(
function(type_of_surgery, time, ...){
# As type_of_surgery is a categorical variable
# we must transform it into a design matrix that
# we can use for multiplication with time
# Note: the [,-1] is for dropping the intercept
mtrx <- model.matrix(~type_of_surgery)[,-1]
mtrx * time
},
function(surgery_length, time, ...){
# Note that the surgery_length and the time
# should probably have similar ranges. If you
# report operating time in minutes, follow-up
# in years the t will be dwarfed by the
pspline(surgery_length + time)
}
))
The main problem is that it doesn’t scale all that well to larger datasets. A common error unless you have a large amount of memory is:
Could not allocate vector of *** MB
The tt
approach is based upon the idea of time
splitting. Time splitting is possible since the Cox proportional hazards
model studies the derivative of the survival function, i.e. the hazard,
and thus doesn’t care how many observations were present before the
current derivative. This allows including patients/observations after 2
years by ignoring them prior to 2 years. The method is referred to as
interval time where the Surv(time, event)
simply
changes into Surv(Start_time, Stop_time, event)
.
This allows us to adjust for time interactions as the
Start_time
is independent of the event. In our standard
setting the Start_time
is always 0 but if we split an
observation into multiple time steps and use the interval time
the variable will become meaningful in an interaction setting. Note that
[!ref] suggested that one uses the End_time
after splitting
the observation time, while I’ve found that it in practice doesn’t
matter that much - it makes intuitive sense to use the
Start_time
if we make the time interval too large the
End_time
will convey information about the event status and
thereby by nature become significant. The approach explained in more
detail below.
An alternative is to split the data into a few intervals, select one
interval at the time and perform separate models on each. This will
result in multiple estimates per variable, poorer statistical power and
most likely an incomplete handling of the non-proportional hazards.
Despite these downsides it may still be a viable solution when
presenting to a less statistically savvy audience in order to gain
acceptance for above methods. The timeSplitter
can help
generating the datasets necessary, here’s a convenient example using
dplyr
:
These approaches are probably just a subset of possible approaches. I know of the timereg package that has some very fancy time coefficient handling. My personal experience with the package is limited and I’ve been discouraged by the visually less appealing graphs provided in the examples and there isn’t a proper vignette explaining the details (Last checked v 1.8.9). Similarly the flexsurv should be able to deal with the proportional hazards assumption. If you know any other then please send me an e-mail.
First we generate a short survival dataset with 4 observations.
library(tidyverse)
library(Greg)
test_data <- tibble(id = 1:4,
time = c(4, 3.5, 1, 5),
event = c("censored", "dead", "alive", "dead"),
age = c(62.2, 55.3, 73.7, 46.3),
date = as.Date(
c("2003-01-01",
"2010-04-01",
"2013-09-20",
"2002-02-23")),
stringsAsFactors = TRUE) |>
mutate(time_str = sprintf("0 to %.1f", time))
Using some grid-graphics we can illustrate the dataset graphically:
Now we apply a split that splits the data into 2 year chunks. Note: 2 years as in this example is probably not optimal, only chosen in order to make it easier to display.
library(dplyr)
split_data <- test_data |>
select(id, event, time, age, date) |>
timeSplitter(by = 2, # The time that we want to split by
event_var = "event",
time_var = "time",
event_start_status = "alive",
time_related_vars = c("age", "date"))
knitr::kable(head(split_data, 10))
id | event | age | date | Start_time | Stop_time |
---|---|---|---|---|---|
1 | alive | 62.2 | 2002.999 | 0 | 2.0 |
1 | censored | 64.2 | 2004.999 | 2 | 4.0 |
2 | alive | 55.3 | 2010.246 | 0 | 2.0 |
2 | dead | 57.3 | 2012.246 | 2 | 3.5 |
3 | alive | 73.7 | 2013.718 | 0 | 1.0 |
4 | alive | 46.3 | 2002.145 | 0 | 2.0 |
4 | alive | 48.3 | 2004.145 | 2 | 4.0 |
4 | dead | 50.3 | 2006.145 | 4 | 5.0 |
Now if we plot each individual’s interval times below the original see multiple observation times where only the last observation time is related to the actual event. All prior are assumed to have unchanged event status from the original status.
I haven’t found any good datasets with non-proportional hazards but the melanoma dataset is largish and allows some exploration.
# First we start with loading the dataset
data("melanoma", package = "boot")
# Then we munge it according to ?boot::melanoma
melanoma <- mutate(melanoma,
status = factor(status,
levels = 1:3,
labels = c("Died from melanoma",
"Alive",
"Died from other causes")),
ulcer = factor(ulcer,
levels = 0:1,
labels = c("Absent", "Present")),
time = time/365.25, # All variables should be in the same time unit
sex = factor(sex,
levels = 0:1,
labels = c("Female", "Male")))
Now we can fit a regular cox regression:
library(survival)
regular_model <- coxph(Surv(time, status == "Died from melanoma") ~
age + sex + year + thickness + ulcer,
data = melanoma,
x = TRUE, y = TRUE)
summary(regular_model)
## Call:
## coxph(formula = Surv(time, status == "Died from melanoma") ~
## age + sex + year + thickness + ulcer, data = melanoma, x = TRUE,
## y = TRUE)
##
## n= 205, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.016805 1.016947 0.008578 1.959 0.050094 .
## sexMale 0.448121 1.565368 0.266861 1.679 0.093107 .
## year -0.102566 0.902518 0.061007 -1.681 0.092719 .
## thickness 0.100312 1.105516 0.038212 2.625 0.008660 **
## ulcerPresent 1.194555 3.302087 0.309254 3.863 0.000112 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0169 0.9833 1.0000 1.034
## sexMale 1.5654 0.6388 0.9278 2.641
## year 0.9025 1.1080 0.8008 1.017
## thickness 1.1055 0.9046 1.0257 1.191
## ulcerPresent 3.3021 0.3028 1.8012 6.054
##
## Concordance= 0.757 (se = 0.031 )
## Likelihood ratio test= 44.4 on 5 df, p=2e-08
## Wald test = 40.89 on 5 df, p=1e-07
## Score (logrank) test = 48.14 on 5 df, p=3e-09
If we do the same with a split dataset:
spl_melanoma <-
melanoma |>
timeSplitter(by = .5,
event_var = "status",
event_start_status = "Alive",
time_var = "time",
time_related_vars = c("age", "year"))
interval_model <-
update(regular_model,
Surv(Start_time, Stop_time, status == "Died from melanoma") ~ .,
data = spl_melanoma)
summary(interval_model)
## Call:
## coxph(formula = Surv(Start_time, Stop_time, status == "Died from melanoma") ~
## age + sex + year + thickness + ulcer, data = spl_melanoma,
## x = TRUE, y = TRUE)
##
## n= 2522, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.016805 1.016947 0.008578 1.959 0.050094 .
## sexMale 0.448121 1.565368 0.266861 1.679 0.093107 .
## year -0.102566 0.902518 0.061007 -1.681 0.092719 .
## thickness 0.100312 1.105516 0.038212 2.625 0.008660 **
## ulcerPresent 1.194555 3.302087 0.309254 3.863 0.000112 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0169 0.9833 1.0000 1.034
## sexMale 1.5654 0.6388 0.9278 2.641
## year 0.9025 1.1080 0.8008 1.017
## thickness 1.1055 0.9046 1.0257 1.191
## ulcerPresent 3.3021 0.3028 1.8012 6.054
##
## Concordance= 0.757 (se = 0.03 )
## Likelihood ratio test= 44.4 on 5 df, p=2e-08
## Wald test = 40.89 on 5 df, p=1e-07
## Score (logrank) test = 48.14 on 5 df, p=3e-09
As you can see the difference between the models is negligible:
library(htmlTable)
cbind(Regular = coef(regular_model),
Interval = coef(interval_model),
Difference = coef(regular_model) - coef(interval_model)) |>
txtRound(digits = 5) |>
knitr::kable(align = "r")
Regular | Interval | Difference | |
---|---|---|---|
age | 0.01681 | 0.01681 | 0.00000 |
sexMale | 0.44812 | 0.44812 | 0.00000 |
year | -0.10257 | -0.10257 | 0.00000 |
thickness | 0.10031 | 0.10031 | 0.00000 |
ulcerPresent | 1.19455 | 1.19455 | 0.00000 |
Now we can look for time varying coefficients using the
survival::cox.zph()
function:
cox.zph(regular_model) |>
purrr::pluck("table") |>
txtRound(digits = 2) |>
knitr::kable(align = "r")
chisq | df | p | |
---|---|---|---|
age | 2.51 | 1.00 | 0.11 |
sex | 1.50 | 1.00 | 0.22 |
year | 1.26 | 1.00 | 0.26 |
thickness | 4.40 | 1.00 | 0.04 |
ulcer | 3.26 | 1.00 | 0.07 |
GLOBAL | 9.97 | 5.00 | 0.08 |
The two variable that give a hint of time variation are age and
thickness. It seems reasonable that melanoma thickness is less important
as time increases, either the tumor was adequately removed or there was
some remaining piece that caused the patient to die within a few years.
We will therefore add a time interaction using the :
variant (note using the *
for interactions
gives a separate variable for the time and that is not of interest in
this case):
## Call:
## coxph(formula = Surv(Start_time, Stop_time, status == "Died from melanoma") ~
## age + sex + year + thickness + ulcer + thickness:Start_time,
## data = spl_melanoma, x = TRUE, y = TRUE)
##
## n= 2522, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.014126 1.014226 0.008591 1.644 0.100115
## sexMale 0.511881 1.668427 0.268960 1.903 0.057016 .
## year -0.098459 0.906233 0.061241 -1.608 0.107896
## thickness 0.209025 1.232476 0.061820 3.381 0.000722 ***
## ulcerPresent 1.286214 3.619060 0.313838 4.098 4.16e-05 ***
## thickness:Start_time -0.045630 0.955395 0.022909 -1.992 0.046388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0142 0.9860 0.9973 1.0314
## sexMale 1.6684 0.5994 0.9848 2.8265
## year 0.9062 1.1035 0.8037 1.0218
## thickness 1.2325 0.8114 1.0918 1.3912
## ulcerPresent 3.6191 0.2763 1.9564 6.6948
## thickness:Start_time 0.9554 1.0467 0.9134 0.9993
##
## Concordance= 0.762 (se = 0.03 )
## Likelihood ratio test= 48.96 on 6 df, p=8e-09
## Wald test = 45.28 on 6 df, p=4e-08
## Score (logrank) test = 56.29 on 6 df, p=3e-10
As suspected the thickness effect is reduced with time. A linear model is hard to explain from a biological standpoint, we may want to see if we can detect if the interaction follows a non-linear trajectory by adding a quadratic variable:
# First we need to manually add an interaction term
spl_melanoma <- mutate(spl_melanoma,
thickness_start = thickness * Start_time)
anova(time_int_model,
update(time_int_model, .~.+I(thickness_start^2)))
## Analysis of Deviance Table
## Cox model: response is Surv(Start_time, Stop_time, status == "Died from melanoma")
## Model 1: ~ age + sex + year + thickness + ulcer + thickness:Start_time
## Model 2: ~ age + sex + year + thickness + ulcer + I(thickness_start^2) + thickness:Start_time
## loglik Chisq Df Pr(>|Chi|)
## 1 -258.72
## 2 -258.51 0.4178 1 0.518
As you can see this doesn’t support that the variable is non-linear.
An alternative would be to use the survival::pspline
method:
## Call:
## coxph(formula = Surv(Start_time, Stop_time, status == "Died from melanoma") ~
## age + sex + year + thickness + ulcer + pspline(thickness_start),
## data = spl_melanoma, x = TRUE, y = TRUE)
##
## coef se(coef) se2 Chisq DF p
## age 0.01291 0.00857 0.00857 2.26993 1.00 0.13191
## sexMale 0.49378 0.27138 0.27125 3.31055 1.00 0.06884
## year -0.09235 0.06070 0.06068 2.31457 1.00 0.12817
## thickness 0.15334 0.07679 0.07525 3.98710 1.00 0.04585
## ulcerPresent 1.10814 0.32615 0.32381 11.54370 1.00 0.00068
## pspline(thickness_start), -0.04236 0.02321 0.02314 3.32997 1.00 0.06803
## pspline(thickness_start), 4.19818 2.99 0.23948
##
## Iterations: 4 outer, 14 Newton-Raphson
## Theta= 0.107
## Degrees of freedom for terms= 1 1 1 1 1 4
## Likelihood ratio test=54 on 8.93 df, p=2e-08
## n= 2522, number of events= 57
If you are only investigating confounders that you want to adjust for we are done. If you actually want to convey the results to your readers then we need to think about how to display the interaction, especially if they turn out to follow a non-linear pattern. If you have two continuous variables I you have basically two options, go with a 3-dimensional graph that where confidence interval are hard to illustrate or categorize one of the variables and use a regular 2-dimensional graph. I usually go for the latter:
# Lets create an evenly distributed categorical thickness variable
# and interactions
spl_melanoma <- mutate(spl_melanoma,
thickness_cat = cut(thickness,
breaks = c(0, 1, 5, Inf),
labels = c("less than 1.0",
"1.0 to 4.9",
"at least 5.0")))
# Now create interaction variables
for (l in levels(spl_melanoma$thickness_cat)[-1]) {
spl_melanoma[[sprintf("thickness_%s_time", gsub(" ", "_", l))]] <-
(spl_melanoma$thickness_cat == l)*spl_melanoma$Start_time
}
# Now for the model specification where we use a
# pspline for the two interaction variables
adv_int_model <-
coxph(Surv(Start_time, Stop_time, status == "Died from melanoma") ~
age + sex + year + ulcer +
thickness_cat + pspline(thickness_1.0_to_4.9_time) + pspline(thickness_at_least_5.0_time),
data = spl_melanoma,
x = TRUE, y = TRUE,
iter.max = 1000)
# To get the estimates we use the predict function
new_data <- data.frame(thickness_cat = rep(levels(spl_melanoma$thickness_cat)[-1],
each = 100),
Start_time = 2^seq(-3, 3, length.out = 100),
stringsAsFactors = FALSE) |>
mutate(thickness_1.0_to_4.9_time = (thickness_cat == levels(spl_melanoma$thickness_cat)[2]) *
Start_time,
thickness_at_least_5.0_time = (thickness_cat == levels(spl_melanoma$thickness_cat)[3]) *
Start_time)
new_data$sex = "Female"
new_data$age = median(melanoma$age)
new_data$year = median(melanoma$year)
new_data$ulcer = "Absent"
adv_pred <- predict(adv_int_model,
newdata = new_data,
type = "terms",
terms = c("thickness_cat",
"pspline(thickness_1.0_to_4.9_time)",
"pspline(thickness_at_least_5.0_time)"),
se.fit = TRUE)
new_data$fit <- rowSums(adv_pred$fit)
new_data$se.fit <- apply(adv_pred$se.fit, 1, function(x) x^2) |>
colSums() |>
sqrt()
new_data <- mutate(new_data,
risk = exp(fit),
upper = exp(fit + 1.96*se.fit),
lower = exp(fit - 1.96*se.fit))
library(ggplot2)
new_data |>
mutate(adapted_risk = sapply(risk, function(x) min(max(2^-4, x), 2^5)),
adapted_upper = sapply(upper, function(x) min(max(2^-4, x), 2^5)),
adapted_lower = sapply(lower, function(x) min(max(2^-4, x), 2^5))) |>
ggplot(aes(y = adapted_risk,
x = new_data$Start_time,
group = thickness_cat,
col = thickness_cat,
fill = thickness_cat)) +
# The confidence intervals are too wide to display in this case
# geom_ribbon(aes(ymax = adapted_upper, ymin = adapted_lower), fill = "red") +
geom_line() +
scale_x_log10(breaks = 2^(-3:4),
limits = c(2^-3, 8),
expand = c(0, 0)) +
scale_y_log10(breaks = 2^(-4:5),
labels = txtRound(2^(-4:5), 2),
expand = c(0,0)) +
scale_color_brewer(type = "qual", guide = guide_legend(title = "Thickness (mm)")) +
ylab("Hazard ratio") +
xlab("Time (years)") +
theme_bw()
The main problem is the memory usage with both the tt
and the timeSplitter
approach. Make therefore sure to drop
all variables that you won’t be using before doing your
regression. I’ve found that dropping variables not only limits the risk
of running out of memory but also considerably speeds up the
regressions.
Longer interval lengths will reduce the size of the split dataset but
will increase the risk of residual non-proportional hazards. When I
consulted a statistician on a dataset containing followup 0 to 21 years,
she recommended that I have ½ year intervals. I think this was slightly
overdoing it, I guess an alternative would have been to simply redo the
cox.zph
call in order to see how well the new model takes
care of the non-proportionality problem.
Explaining the time coefficient can be demanding. I often rely on the
rms::contrast
function but this can be tricky since the
Start_time
can confuse the contrast
function.
I()
optionJust to be crystal clear, the I()
option should never be
used. It will provide spuriously low p-values and doesn’t solve the
non-proportionality issue. See Therneau’s vignette for more on this.