bigPLScox

PLS models for Cox regression with big data in R

Frédéric Bertrand and Myriam Maumy-Bertrand

https://doi.org/10.32614/CRAN.package.bigPLScox


bigPLScox provides Partial Least Squares (PLS) methods for Cox proportional hazards models, with a particular focus on high-dimensional and big-memory settings. The package combines classical PLS Cox methods with accelerated C++ backends that operate directly on bigmemory::big.matrix objects.
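To make the idea of a PLS component for a Cox model concrete, here is a minimal sketch of one classical construction, the deviance-residual approach, in which residuals from a covariate-free Cox model act as a working response. It uses only the survival package and simulated data. The object names (X, time, status, w, t1) are illustrative, and this is an assumption about the general technique, not a description of how bigPLScox computes its components internally.

```r
library(survival)

set.seed(42)
n <- 120
p <- 30
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))

# Simulated survival times whose hazard depends on the first two predictors
time <- rexp(n, rate = exp(0.8 * X[, 1] - 0.5 * X[, 2]))
status <- rbinom(n, size = 1, prob = 0.7)  # 1 = event, 0 = censored

# Step 1: deviance residuals of a Cox model without covariates
null_fit <- coxph(Surv(time, status) ~ 1)
r <- residuals(null_fit, type = "deviance")

# Step 2: first PLS weight vector, proportional to X'r and scaled to unit norm
w <- crossprod(X, r)
w <- w / sqrt(sum(w^2))

# Step 3: first component (one score per observation)
t1 <- drop(X %*% w)

# Step 4: ordinary Cox regression on the single component
fit1 <- coxph(Surv(time, status) ~ t1)
coef(fit1)
```

Further components would be extracted from a deflated X in the same way; the point of the sketch is only that thirty predictors are reduced to one score before the Cox model is fitted.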

The main design goals are:

Standalone benchmarking scripts that complement the vignette live under inst/benchmarks/.

The documentation website and examples are maintained by Frédéric Bertrand and Myriam Maumy.

Conference highlight. Maumy, M. and Bertrand, F. (2023). “PLS models and their extension for big data”. Conference presentation at the Joint Statistical Meetings (JSM 2023), Toronto, Ontario, Canada, Aug 5–10, 2023.

Conference highlight. Maumy, M. and Bertrand, F. (2023). “bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data”. Poster at BioC2023: The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA, Aug 2–4, 2023. doi:10.7490/f1000research.1119546.1.

Core modelling functions

The following families of PLS Cox estimators are available.

All these functions come in both default and formula interfaces and have matching predict() methods with support for type = "link", "risk" and other standard Cox outputs.
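The relationship between the linear predictor and the risk score can be illustrated with survival::coxph(), used here as a stand-in. In survival the linear predictor is requested with type = "lp" rather than "link", and the risk score is its exponential. Whether the bigPLScox methods use the same centering convention is an assumption to check against the function reference.

```r
library(survival)

# Small Cox model on the lung data shipped with survival
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

lp <- predict(fit, type = "lp")      # linear predictor
risk <- predict(fit, type = "risk")  # relative risk score

# The risk score is the exponential of the linear predictor
isTRUE(all.equal(unname(risk), unname(exp(lp))))
```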

Cross-validation helpers are provided through:

These mirror the criteria used in plsRcox and include time-dependent survival metrics.

Big-memory PLS Cox backends

The package offers dedicated functions for Cox PLS fits on large matrices, including file-backed bigmemory::big.matrix objects.
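For readers new to bigmemory, the following sketch shows one way to create a file-backed big.matrix and re-attach it later from its descriptor file. It relies only on bigmemory. The simulated matrix, the file names X.bin and X.desc, and the temporary directory are illustrative choices.

```r
library(bigmemory)

set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

# Fresh directory for the backing and descriptor files
backing_dir <- tempfile("bigplscox-demo-")
dir.create(backing_dir)

# Store the matrix on disk instead of in RAM
X_fb <- as.big.matrix(
  X,
  type = "double",
  backingfile = "X.bin",
  backingpath = backing_dir,
  descriptorfile = "X.desc"
)

is.filebacked(X_fb)
dim(X_fb)

# Re-attach the same data later (or from another R session) via the descriptor
X_again <- attach.big.matrix(file.path(backing_dir, "X.desc"))
all.equal(X_again[1:5, ], X[1:5, ])
```

An object created this way could then be supplied wherever an in-memory big.matrix is accepted, for example as the X argument of big_pls_cox_fast() in the minimal example further down.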

Cross-validation for the big-memory backends is provided by:

These functions help select the number of components and compare the exact and gradient-based backends.

Prediction, plots and summaries

The following S3 methods are provided for PLS Cox fits.

Several internal PLS models from plsRcox (for example gPLS, sPLS, sgPLS, pls.cox) also have stats::predict() methods registered in the namespace so that standard predict() calls continue to work.

Diagnostics and model selection

bigPLScox provides a range of tools for residual diagnostics, component selection and inspection of gradient-based fits.

These tools are intended to complement classic survival model diagnostics such as survival::coxph() residual plots.
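As a point of comparison, the classic diagnostics referred to here can be produced with the survival package alone. The sketch below fits a small Cox model on the lung data shipped with survival, extracts martingale and deviance residuals, and runs the proportional hazards check. It does not call any bigPLScox function.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Residuals commonly inspected for Cox models
mart <- residuals(fit, type = "martingale")
dev_res <- residuals(fit, type = "deviance")

# Test of the proportional hazards assumption based on scaled Schoenfeld residuals
ph <- cox.zph(fit)
print(ph)

# Martingale residuals against the linear predictor: a flat trend is reassuring
plot(predict(fit, type = "lp"), mart,
     xlab = "Linear predictor", ylab = "Martingale residual")
abline(h = 0, lty = 2)
```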

Utilities, data and scaling

A small number of helper functions and data objects round out the package.

The package also re-exports the %*% and Arith methods used with some big matrix types.

Vignettes and learning material

Several vignettes ship with the package and are accessible once it is installed.
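Installed vignettes can be listed and opened with the standard utils helpers. The snippet below is a small sketch that does nothing if bigPLScox is not installed.

```r
# List the vignettes shipped with bigPLScox, if the package is available
if (requireNamespace("bigPLScox", quietly = TRUE)) {
  print(utils::vignette(package = "bigPLScox"))
  # Uncomment to open the vignette index in a web browser:
  # utils::browseVignettes("bigPLScox")
}
```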

Refer to the pkgdown site for rendered versions of these documents and a complete function reference:

https://fbertran.github.io/bigPLScox/

Installation

You can install the released version of bigPLScox from CRAN with:

install.packages("bigPLScox")

You can install the development version of bigPLScox from GitHub with:

# install.packages("devtools")
devtools::install_github("fbertran/bigPLScox")

Minimal example

The following minimal example uses the microsatellite data bundled with the package.

library(bigPLScox)
data(micro.censure)
data(Xmicro.censure_compl_imp)

# Follow-up times and event indicator
Y <- micro.censure$survyear
status <- micro.censure$DC
# Coerce the predictors to a plain numeric matrix: passing the raw object
# to coxgpls() stops with "'x' must be numeric".
X <- apply(as.matrix(Xmicro.censure_compl_imp), MARGIN = 2, FUN = as.numeric)

set.seed(123)
fit <- coxgpls(
  Xplan = X,
  time = Y,
  status = status,
  ncomp = 4,
  ind.block.x = c(3, 10, 20)
)

summary(fit)

A big-memory workflow uses bigmemory::big.matrix objects.

library(bigmemory)

X_big <- bigmemory::as.big.matrix(X)

fast_fit <- big_pls_cox_fast(
  X = X_big,
  time = Y,
  status = status,
  ncomp = 4
)

lp <- predict(fast_fit, newdata = X_big, type = "link")
head(lp)
#> [1] -0.4296294 -0.7809034  1.6411946 -1.3885315  1.2299486 -1.7144312

For more elaborate examples, including cross-validation and comparisons between the exact and gradient-based backends, see the vignettes and the scripts under inst/benchmarks/.

Citation

If you use bigPLScox in scientific work, please cite the package and the associated conference material.

Maumy, M. and Bertrand, F. (2023). PLS models and their extension for big data. Joint Statistical Meetings, Toronto, Ontario, Canada.

Maumy, M. and Bertrand, F. (2023). bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data. BioC2023, Dana-Farber Cancer Institute, Boston, MA, poster contribution. doi:10.7490/f1000research.1119546.1.