Main Features
1. Summarise variables/factors by a categorical variable
summary_factorlist() is a wrapper used to aggregate any number of explanatory variables by a single variable of interest. This is often “Table 1” of a published study. When categorical, the variable of interest can have a maximum of five levels. It uses Hmisc::summary.formula().
library(finalfit)
library(dplyr)
# Load example dataset, modified version of survival::colon
data(colon_s)
# Table 1 - Patient demographics by variable of interest ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor" # Bowel perforation
colon_s %>%
  summary_factorlist(dependent, explanatory,
  p=TRUE, add_dependent_label=TRUE) -> t1
knitr::kable(t1, row.names=FALSE, align=c("l", "l", "r", "r", "r"))
| Dependent: Perforation | | No | Yes | p |
|---|---|---|---|---|
| Age (years) | Mean (SD) | 59.8 (11.9) | 58.4 (13.3) | 0.542 |
| Age | <40 years | 68 (7.5) | 2 (7.4) | 1.000 |
| | 40-59 years | 334 (37.0) | 10 (37.0) | |
| | 60+ years | 500 (55.4) | 15 (55.6) | |
| Sex | Female | 432 (47.9) | 13 (48.1) | 1.000 |
| | Male | 470 (52.1) | 14 (51.9) | |
| Obstruction | No | 715 (81.2) | 17 (63.0) | 0.035 |
| | Yes | 166 (18.8) | 10 (37.0) | |
See the function's other options for including missing data, reporting mean vs. median for continuous variables, showing column vs. row proportions, and adding a total column.
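As a rough illustration of those options, the sketch below sets them explicitly. It assumes the argument names cont, na_include, column and total_col as found in recent finalfit releases; check ?summary_factorlist for the version you have installed, since defaults have changed between versions.

```r
library(finalfit)
data(colon_s)

explanatory = c("age", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"

# Assumed argument names; verify against ?summary_factorlist
t1_options = summary_factorlist(
  colon_s, dependent, explanatory,
  cont = "median",    # summarise continuous variables by median rather than mean
  na_include = TRUE,  # show missing values as their own level
  column = FALSE,     # row proportions instead of column proportions
  total_col = TRUE    # add a total column
)
t1_options
```

The result is an ordinary data frame, so it can be passed to knitr::kable() in the same way as the table above.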
summary_factorlist() is also commonly used to summarise any number of variables by an outcome variable (say dead yes/no).
# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary_factorlist(dependent, explanatory,
  p=TRUE, add_dependent_label=TRUE) -> t2
knitr::kable(t2, row.names=FALSE, align=c("l", "l", "r", "r", "r"))
| Dependent: Mortality 5 year | | Alive | Died | p |
|---|---|---|---|---|
| Age | <40 years | 31 (6.1) | 36 (8.9) | 0.020 |
| | 40-59 years | 208 (40.7) | 131 (32.4) | |
| | 60+ years | 272 (53.2) | 237 (58.7) | |
| Sex | Female | 243 (47.6) | 194 (48.0) | 0.941 |
| | Male | 268 (52.4) | 210 (52.0) | |
| Obstruction | No | 408 (82.1) | 312 (78.6) | 0.219 |
| | Yes | 89 (17.9) | 85 (21.4) | |
Tables can be knitted to PDF, Word or HTML documents. We do this in RStudio from a .Rmd document.
2. Summarise regression model results in final table format
The second main feature is the ability to create final tables for linear lm(), logistic glm(), hierarchical logistic lme4::glmer() and Cox proportional hazards survival::coxph() regression models.
The finalfit() “all-in-one” function takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a final table for publication including summary statistics, univariable and multivariable regression analyses. The first columns are those produced by summary_factorlist(). The appropriate regression model is chosen on the basis of the dependent variable type and other arguments passed.
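To make the "all-in-one" idea concrete, here is a minimal sketch of how a similar table could be assembled step by step for a logistic model. It uses the finalfit helpers glmuni(), glmmulti(), fit2df() and finalfit_merge(); the object names (summary_part, uni_part, multi_part, manual_table) are made up for this example, and the result is only expected to resemble, not exactly reproduce, what finalfit() returns.

```r
library(finalfit)
library(dplyr)
data(colon_s)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"

# Summary columns; fit_id = TRUE keeps the keys used for merging
colon_s %>%
  summary_factorlist(dependent, explanatory, fit_id = TRUE) -> summary_part

# One univariable logistic model per explanatory variable
colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix = " (univariable)") -> uni_part

# A single multivariable logistic model
colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix = " (multivariable)") -> multi_part

# Merge the three pieces and drop the helper columns
summary_part %>%
  finalfit_merge(uni_part) %>%
  finalfit_merge(multi_part) %>%
  select(-c(fit_id, index)) -> manual_table
manual_table
```

Building the table in pieces like this is mainly useful when one of the steps needs to be customised; otherwise the single finalfit() call is shorter.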
Logistic regression: glm()
Of the form:
glm(dependent ~ explanatory, family="binomial")
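For orientation, the multivariable model of this form can be fitted directly in base R. This is a sketch only: the object names fit and or_table are illustrative, and the intervals here are Wald intervals from confint.default(), which may differ slightly from the intervals finalfit() reports.

```r
library(finalfit)  # only needed for the colon_s example data
data(colon_s)

# Multivariable logistic regression written out in full
fit = glm(mort_5yr ~ age.factor + sex.factor + obstruct.factor + perfor.factor,
          data = colon_s, family = "binomial")

# Exponentiate coefficients and Wald limits to get odds ratios
or_table = exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table, 2)
```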
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory) -> t3
knitr::kable(t3, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
| Dependent: Mortality 5 year | | Alive | Died | OR (univariable) | OR (multivariable) |
|---|---|---|---|---|---|
| Age | <40 years | 31 (46.3) | 36 (53.7) | - | - |
| | 40-59 years | 208 (61.4) | 131 (38.6) | 0.54 (0.32-0.92, p=0.023) | 0.57 (0.34-0.98, p=0.041) |
| | 60+ years | 272 (53.4) | 237 (46.6) | 0.75 (0.45-1.25, p=0.270) | 0.81 (0.48-1.36, p=0.426) |
| Sex | Female | 243 (55.6) | 194 (44.4) | - | - |
| | Male | 268 (56.1) | 210 (43.9) | 0.98 (0.76-1.27, p=0.889) | 0.98 (0.75-1.28, p=0.902) |
| Obstruction | No | 408 (56.7) | 312 (43.3) | - | - |
| | Yes | 89 (51.1) | 85 (48.9) | 1.25 (0.90-1.74, p=0.189) | 1.25 (0.90-1.76, p=0.186) |
| Perforation | No | 497 (56.0) | 391 (44.0) | - | - |
| | Yes | 14 (51.9) | 13 (48.1) | 1.18 (0.54-2.55, p=0.672) | 1.12 (0.51-2.44, p=0.770) |
Logistic regression with reduced model: glm()
Where the multivariable model should contain only a subset of the variables in the full univariable set, that subset can be specified with explanatory_multi.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, explanatory_multi) -> t4
knitr::kable(t4, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
| Dependent: Mortality 5 year | | Alive | Died | OR (univariable) | OR (multivariable) |
|---|---|---|---|---|---|
| Age | <40 years | 31 (46.3) | 36 (53.7) | - | - |
| | 40-59 years | 208 (61.4) | 131 (38.6) | 0.54 (0.32-0.92, p=0.023) | 0.57 (0.34-0.98, p=0.041) |
| | 60+ years | 272 (53.4) | 237 (46.6) | 0.75 (0.45-1.25, p=0.270) | 0.81 (0.48-1.36, p=0.424) |
| Sex | Female | 243 (55.6) | 194 (44.4) | - | - |
| | Male | 268 (56.1) | 210 (43.9) | 0.98 (0.76-1.27, p=0.889) | - |
| Obstruction | No | 408 (56.7) | 312 (43.3) | - | - |
| | Yes | 89 (51.1) | 85 (48.9) | 1.25 (0.90-1.74, p=0.189) | 1.26 (0.90-1.76, p=0.176) |
| Perforation | No | 497 (56.0) | 391 (44.0) | - | - |
| | Yes | 14 (51.9) | 13 (48.1) | 1.18 (0.54-2.55, p=0.672) | - |
Mixed effects logistic regression: lme4::glmer()
Of the form:
lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")
Hierarchical/mixed effects/multilevel logistic regression models can be specified using the argument random_effect. At the moment it is just set up for random intercepts (i.e. (1 | random_effect)), but in the future I’ll adjust this to accommodate random gradients if needed (i.e. (variable1 | variable2)).
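For comparison, a random-intercept model of this form can be fitted directly with lme4. This is a sketch under the assumption that lme4 is installed; fit_ml is an illustrative name, and the random-gradient formula in the comment is standard lme4 syntax shown for reference only, not something finalfit() currently builds.

```r
library(finalfit)  # only needed for the colon_s example data
library(lme4)
data(colon_s)

# Random intercept for each hospital
fit_ml = lme4::glmer(
  mort_5yr ~ age.factor + obstruct.factor + (1 | hospital),
  data = colon_s, family = "binomial"
)

# Fixed effects on the odds ratio scale
round(exp(lme4::fixef(fit_ml)), 2)

# A random gradient for age by hospital would be written as:
#   mort_5yr ~ age.factor + obstruct.factor + (age.factor | hospital)
```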
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, explanatory_multi, random_effect) -> t5
knitr::kable(t5, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
| Dependent: Mortality 5 year | | Alive | Died | OR (univariable) | OR (multilevel) |
|---|---|---|---|---|---|
| Age | <40 years | 31 (46.3) | 36 (53.7) | - | - |
| | 40-59 years | 208 (61.4) | 131 (38.6) | 0.54 (0.32-0.92, p=0.023) | 0.73 (0.38-1.40, p=0.342) |
| | 60+ years | 272 (53.4) | 237 (46.6) | 0.75 (0.45-1.25, p=0.270) | 1.01 (0.53-1.90, p=0.984) |
| Sex | Female | 243 (55.6) | 194 (44.4) | - | - |
| | Male | 268 (56.1) | 210 (43.9) | 0.98 (0.76-1.27, p=0.889) | - |
| Obstruction | No | 408 (56.7) | 312 (43.3) | - | - |
| | Yes | 89 (51.1) | 85 (48.9) | 1.25 (0.90-1.74, p=0.189) | 1.24 (0.83-1.85, p=0.292) |
| Perforation | No | 497 (56.0) | 391 (44.0) | - | - |
| | Yes | 14 (51.9) | 13 (48.1) | 1.18 (0.54-2.55, p=0.672) | - |
Cox proportional hazards: survival::coxph()
Of the form:
survival::coxph(dependent ~ explanatory)
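Written out in full, the multivariable model of this form would look something like the sketch below. It assumes the survival package is installed; fit_cox and hr are illustrative names.

```r
library(finalfit)  # only needed for the colon_s example data
library(survival)
data(colon_s)

# Cox proportional hazards model on time to event
fit_cox = survival::coxph(
  Surv(time, status) ~ age.factor + sex.factor + obstruct.factor + perfor.factor,
  data = colon_s
)

# Hazard ratios are the exponentiated coefficients
hr = exp(coef(fit_cox))
round(hr, 2)
```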
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  finalfit(dependent, explanatory) -> t6
knitr::kable(t6, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
| Dependent: Surv(time, status) | | all | HR (univariable) | HR (multivariable) |
|---|---|---|---|---|
| Age | <40 years | 70 (7.5) | - | - |
| | 40-59 years | 344 (37.0) | 0.76 (0.53-1.09, p=0.132) | 0.79 (0.55-1.13, p=0.196) |
| | 60+ years | 515 (55.4) | 0.93 (0.66-1.31, p=0.668) | 0.98 (0.69-1.40, p=0.926) |
| Sex | Female | 445 (47.9) | - | - |
| | Male | 484 (52.1) | 1.01 (0.84-1.22, p=0.888) | 1.02 (0.85-1.23, p=0.812) |
| Obstruction | No | 732 (80.6) | - | - |
| | Yes | 176 (19.4) | 1.29 (1.03-1.62, p=0.028) | 1.30 (1.03-1.64, p=0.026) |
| Perforation | No | 902 (97.1) | - | - |
| | Yes | 27 (2.9) | 1.17 (0.70-1.95, p=0.556) | 1.08 (0.64-1.81, p=0.785) |
Add common model metrics to output
metrics=TRUE provides common model metrics. The output is a list of two dataframes. Note chunk specification for output below.
explanatory = c("age.factor", "sex.factor",
  "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, metrics=TRUE) -> t7
knitr::kable(t7[[1]], row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
| Dependent: Mortality 5 year | | Alive | Died | OR (univariable) | OR (multivariable) |
|---|---|---|---|---|---|
| Age | <40 years | 31 (46.3) | 36 (53.7) | - | - |
| | 40-59 years | 208 (61.4) | 131 (38.6) | 0.54 (0.32-0.92, p=0.023) | 0.57 (0.34-0.98, p=0.041) |
| | 60+ years | 272 (53.4) | 237 (46.6) | 0.75 (0.45-1.25, p=0.270) | 0.81 (0.48-1.36, p=0.426) |
| Sex | Female | 243 (55.6) | 194 (44.4) | - | - |
| | Male | 268 (56.1) | 210 (43.9) | 0.98 (0.76-1.27, p=0.889) | 0.98 (0.75-1.28, p=0.902) |
| Obstruction | No | 408 (56.7) | 312 (43.3) | - | - |
| | Yes | 89 (51.1) | 85 (48.9) | 1.25 (0.90-1.74, p=0.189) | 1.25 (0.90-1.76, p=0.186) |
| Perforation | No | 497 (56.0) | 391 (44.0) | - | - |
| | Yes | 14 (51.9) | 13 (48.1) | 1.18 (0.54-2.55, p=0.672) | 1.12 (0.51-2.44, p=0.770) |
knitr::kable(t7[[2]], row.names=FALSE, col.names="")
| |
|---|
| Number in dataframe = 929, Number in model = 894, Missing = 35, AIC = 1230.7, C-statistic = 0.56, H&L = Chi-sq(8) 5.69 (p=0.682) |