This article describes how to cut study SDTM data using a modular approach, which leaves room for further study- or project-specific customization.
To start, all SDTM data to be cut needs to be stored in a list.
library(datacutr)
library(admiraldev)
library(dplyr)
library(lubridate)
library(stringr)
library(purrr)
library(rlang)
source_data <- list(
  ds = datacutr_ds, dm = datacutr_dm, ae = datacutr_ae, sc = datacutr_sc,
  lb = datacutr_lb, fa = datacutr_fa, ts = datacutr_ts
)
The next step is to create the DCUT dataset containing the datacut date and description.
dcut <- create_dcut(
  dataset_ds = source_data$ds,
  ds_date_var = DSSTDTC,
  filter = DSDECOD == "RANDOMIZATION",
  cut_date = "2022-06-04",
  cut_description = "Clinical Cutoff Date"
)
USUBJID | DCUTDTC | DCUTDTM | DCUTDESC |
---|---|---|---|
AB12345-001 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
AB12345-002 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
AB12345-003 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
AB12345-004 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
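To make the shape of DCUT concrete, here is a minimal base R sketch of the idea: keep subjects whose filtered DS record falls on or before the cut date, then attach the cut date, an end-of-day cut datetime and the description. This is not the `create_dcut()` implementation. The toy DS records and the name `dcut_sketch` are made up for illustration, and complete dates are assumed.

```r
# Toy DS data (illustrative values only, not the datacutr example data)
ds <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-005"),
  DSDECOD = c("RANDOMIZATION", "RANDOMIZATION", "RANDOMIZATION"),
  DSSTDTC = c("2022-06-01", "2022-02-17", "2022-10-04"),
  stringsAsFactors = FALSE
)

# Cut date taken to the end of the day, as in the DCUTDTM column above
cut_dtm <- as.POSIXct("2022-06-04 23:59:59", tz = "UTC")
ds_dt <- as.POSIXct(ds$DSSTDTC, format = "%Y-%m-%d", tz = "UTC")

# Keep subjects meeting the filter whose date is on or before the cut
in_cut <- ds$DSDECOD == "RANDOMIZATION" & ds_dt <= cut_dtm

dcut_sketch <- data.frame(
  USUBJID = ds$USUBJID[in_cut],
  DCUTDTC = "2022-06-04",
  DCUTDTM = cut_dtm,
  DCUTDESC = "Clinical Cutoff Date",
  stringsAsFactors = FALSE
)
dcut_sketch
```

In this sketch the third subject is randomized after the cut date and so has no DCUT record, which is what later drives the patient cut.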
If any datasets need pre-processing, this should be done next. For example, FA has multiple date variables, so we derive a single temporary date variable to cut on.
source_data$fa <- source_data$fa %>%
  mutate(DCUT_TEMP_FAXDTC = case_when(
    FASTDTC != "" ~ FASTDTC,
    FADTC != "" ~ FADTC,
    TRUE ~ as.character(NA)
  ))
USUBJID | FASTDTC | FADTC | DCUT_TEMP_FAXDTC |
---|---|---|---|
AB12345-001 | 2022-06-01 | 2022-06-01 | |
AB12345-002 | 2022-06-30 | 2022-06-30 | |
AB12345-003 | 2022-07-01 | 2022-07-01 | |
AB12345-004 | 2022-05-04 | 2022-05-04 | |
AB12345-005 | 2022-12-01 | 2022-12-01 | |
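The fallback logic is easiest to see on a tiny example. The sketch below uses made-up FA rows (not the datacutr example data) to show that the start date is preferred, the collection date is used when the start date is empty, and the result is missing when both are empty.

```r
library(dplyr)

# Toy FA data (illustrative values only)
fa <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-003"),
  FASTDTC = c("2022-06-01", "", ""),
  FADTC = c("2022-06-02", "2022-06-30", ""),
  stringsAsFactors = FALSE
)

fa <- fa %>%
  mutate(DCUT_TEMP_FAXDTC = case_when(
    FASTDTC != "" ~ FASTDTC, # prefer the start date when present
    FADTC != "" ~ FADTC,     # otherwise fall back to the collection date
    TRUE ~ NA_character_     # neither date available
  ))
fa
```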
Next we’ll specify the cut type for each dataset (patient cut, date cut or no cut) and, for each date cut, which date variable should be used.
patient_cut_list <- c("sc", "ds")
date_cut_list <- rbind(
  c("ae", "AESTDTC"),
  c("lb", "LBDTC"),
  c("fa", "DCUT_TEMP_FAXDTC")
)
no_cut_list <- list(ts = source_data$ts)
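Since every dataset should end up in exactly one cut type, a quick sanity check can help before going further. The following is a hypothetical check, not part of datacutr: it uses empty placeholder data frames and assumes DM is handled separately by the special DM cut. It also shows how the two columns of `date_cut_list` are read later on.

```r
# Placeholder datasets standing in for the real SDTM data
source_data <- setNames(
  rep(list(data.frame()), 7),
  c("ds", "dm", "ae", "sc", "lb", "fa", "ts")
)

patient_cut_list <- c("sc", "ds")
date_cut_list <- rbind(
  c("ae", "AESTDTC"),
  c("lb", "LBDTC"),
  c("fa", "DCUT_TEMP_FAXDTC")
)
no_cut_list <- list(ts = source_data$ts)

# Column 1 holds dataset names, column 2 the matching date variables
date_cut_list[, 1]
date_cut_list[, 2]

# Hypothetical check: which datasets have not been given a cut type?
assigned <- c(patient_cut_list, date_cut_list[, 1], names(no_cut_list), "dm")
unassigned <- setdiff(names(source_data), assigned)
unassigned
```

An empty `unassigned` vector means every dataset in `source_data` is covered.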
Next we’ll apply the patient cut.
patient_cut_data <- lapply(
  source_data[patient_cut_list], pt_cut,
  dataset_cut = dcut
)
This adds on temporary flag variables indicating which observations will be removed, for example for SC:
USUBJID | SCORRES | DCUT_TEMP_REMOVE |
---|---|---|
AB12345-001 | A | NA |
AB12345-002 | B | NA |
AB12345-003 | C | NA |
AB12345-004 | D | NA |
AB12345-005 | E | Y |
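The flag in the table can be reproduced with a simple membership test. This is only a sketch of the idea under the assumption that a subject is flagged when they have no record in DCUT. It is not the `pt_cut()` implementation, and the toy data are made up.

```r
# Toy SC and DCUT data (illustrative values only)
sc <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-005"),
  SCORRES = c("A", "B", "E"),
  stringsAsFactors = FALSE
)
dcut <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002"),
  stringsAsFactors = FALSE
)

# Flag records for subjects that are not present in DCUT
sc$DCUT_TEMP_REMOVE <- ifelse(sc$USUBJID %in% dcut$USUBJID, NA_character_, "Y")
sc
```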
Next we’ll apply the date cut.
date_cut_data <- pmap(
  .l = list(
    dataset_sdtm = source_data[date_cut_list[, 1]],
    sdtm_date_var = syms(date_cut_list[, 2])
  ),
  .f = date_cut,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)
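If the `pmap()` pattern is unfamiliar, the sketch below shows the same mechanics on toy inputs: the elements of `.l` are walked in parallel and matched to the function's arguments by name, while any further arguments are passed unchanged to every call. The function `latest_date()` and its `prefix` argument are hypothetical stand-ins, not part of datacutr.

```r
library(purrr)
library(rlang)

# Toy datasets and their date variables (illustrative values only)
datasets <- list(
  ae = data.frame(AESTDTC = c("2022-06-01", "2022-06-30"), stringsAsFactors = FALSE),
  lb = data.frame(LBDTC = "2022-05-04", stringsAsFactors = FALSE)
)
date_vars <- syms(c("AESTDTC", "LBDTC")) # list of symbols, one per dataset

# Hypothetical helper: report the latest date found in the given variable
latest_date <- function(dataset_sdtm, sdtm_date_var, prefix) {
  paste0(prefix, max(dataset_sdtm[[as.character(sdtm_date_var)]]))
}

result <- pmap(
  .l = list(dataset_sdtm = datasets, sdtm_date_var = date_vars),
  .f = latest_date,
  prefix = "latest: " # constant argument shared by every call
)
result
```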
This again adds on temporary flag variables indicating which observations will be removed, for example for AE:
USUBJID | AETERM | AESTDTC | DCUT_TEMP_SDTM_DATE | DCUT_TEMP_DCUTDTM | DCUT_TEMP_REMOVE |
---|---|---|---|---|---|
AB12345-001 | AE1 | 2022-06-01 | 2022-06-01 | 2022-06-04 23:59:59 | NA |
AB12345-002 | AE2 | 2022-06-30 | 2022-06-30 | 2022-06-04 23:59:59 | Y |
AB12345-003 | AE3 | 2022-07-01 | 2022-07-01 | 2022-06-04 23:59:59 | Y |
AB12345-004 | AE4 | 2022-05-04 | 2022-05-04 | 2022-06-04 23:59:59 | NA |
AB12345-005 | AE5 | 2022-12-01 | 2022-12-01 | NA | Y |
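The flagging pattern in the AE table can be sketched in base R: look up each subject's cut datetime, then flag records dated after it, along with records for subjects that have no DCUT record at all. This is a simplified illustration that assumes complete dates, not the `date_cut()` implementation, and the toy data are made up.

```r
# Toy AE and DCUT data (illustrative values only)
ae <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-005"),
  AESTDTC = c("2022-06-01", "2022-06-30", "2022-12-01"),
  stringsAsFactors = FALSE
)
dcut <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002"),
  DCUTDTM = as.POSIXct("2022-06-04 23:59:59", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Cut datetime per AE record (NA when the subject is not in DCUT)
cut_dtm <- dcut$DCUTDTM[match(ae$USUBJID, dcut$USUBJID)]
sdtm_dt <- as.POSIXct(ae$AESTDTC, format = "%Y-%m-%d", tz = "UTC")

# Flag records after the cut, or belonging to subjects without a cut date
ae$DCUT_TEMP_REMOVE <- ifelse(is.na(cut_dtm) | sdtm_dt > cut_dtm, "Y", NA_character_)
ae
```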
Then we’ll apply the special DM cut, which also updates the death-related variables.
dm_cut <- special_dm_cut(
  dataset_dm = source_data$dm,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)
This adds on temporary variables indicating any death records that would change as a result of applying a datacut:
USUBJID | DTHFL | DTHDTC | DCUT_TEMP_REMOVE | DCUT_TEMP_DTHDT | DCUT_TEMP_DCUTDTM | DCUT_TEMP_DTHCHANGE |
---|---|---|---|---|---|---|
AB12345-001 | Y | 2022-06-01 | NA | 2022-06-01 | 2022-06-04 23:59:59 | NA |
AB12345-002 | | | NA | NA | 2022-06-04 23:59:59 | NA |
AB12345-003 | Y | 2022-07-01 | NA | 2022-07-01 | 2022-06-04 23:59:59 | Y |
AB12345-004 | | | NA | NA | 2022-06-04 23:59:59 | NA |
AB12345-005 | Y | 2022-12-01 | Y | 2022-12-01 | NA | NA |
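The death-change flag follows the same comparison as the date cut, but it only applies to subjects who remain in the data. The base R sketch below mirrors the pattern in the table under that assumption. It is not the `special_dm_cut()` implementation, it assumes complete dates, and the toy data are made up.

```r
# Toy DM and DCUT data (illustrative values only)
dm <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-003", "AB12345-005"),
  DTHFL = c("Y", "", "Y", "Y"),
  DTHDTC = c("2022-06-01", "", "2022-07-01", "2022-12-01"),
  stringsAsFactors = FALSE
)
dcut <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-003"),
  DCUTDTM = as.POSIXct("2022-06-04 23:59:59", tz = "UTC"),
  stringsAsFactors = FALSE
)

cut_dtm <- dcut$DCUTDTM[match(dm$USUBJID, dcut$USUBJID)]
dth_dt <- as.POSIXct(dm$DTHDTC, format = "%Y-%m-%d", tz = "UTC") # "" becomes NA

# Subjects without a cut date are flagged for removal
dm$DCUT_TEMP_REMOVE <- ifelse(is.na(cut_dtm), "Y", NA_character_)

# Deaths after the cut are flagged so the death variables can be reset
dm$DCUT_TEMP_DTHCHANGE <- ifelse(
  !is.na(dth_dt) & !is.na(cut_dtm) & dth_dt > cut_dtm, "Y", NA_character_
)
dm
```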
The next step is to create the RMD report, which summarizes which patients and observations will be cut, and then to apply the cut, removing all observations flagged for removal.
cut_data <- purrr::map(
  c(patient_cut_data, date_cut_data, list(dm = dm_cut)),
  apply_cut,
  dcutvar = DCUT_TEMP_REMOVE,
  dthchangevar = DCUT_TEMP_DTHCHANGE
)
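Conceptually, applying the cut means dropping the flagged rows, resetting the death variables where a death change is flagged, and discarding the temporary variables. The sketch below illustrates this on a toy DM dataset. It is not the `apply_cut()` implementation, and two details are assumptions: that reset death variables are set to an empty string, and that every `DCUT_TEMP_` column is dropped.

```r
# Toy flagged DM data (illustrative values only)
dm_flagged <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-003", "AB12345-005"),
  DTHFL = c("Y", "Y", "Y"),
  DTHDTC = c("2022-06-01", "2022-07-01", "2022-12-01"),
  DCUT_TEMP_REMOVE = c(NA, NA, "Y"),
  DCUT_TEMP_DTHCHANGE = c(NA, "Y", NA),
  stringsAsFactors = FALSE
)

# 1. Drop records flagged for removal
keep <- is.na(dm_flagged$DCUT_TEMP_REMOVE) | dm_flagged$DCUT_TEMP_REMOVE != "Y"
dm_final <- dm_flagged[keep, ]

# 2. Reset death variables where the death occurred after the cut (assumed "")
changed <- !is.na(dm_final$DCUT_TEMP_DTHCHANGE) & dm_final$DCUT_TEMP_DTHCHANGE == "Y"
dm_final$DTHFL[changed] <- ""
dm_final$DTHDTC[changed] <- ""

# 3. Drop the temporary flag variables
dm_final <- dm_final[, !startsWith(names(dm_final), "DCUT_TEMP_"), drop = FALSE]
dm_final
```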
Lastly, we create the final list of all the cut SDTM data, adding in the SDTM where no cut was needed.
final_data <- c(cut_data, no_cut_list, list(dcut = dcut))
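One detail worth noting is why `dcut` is wrapped in `list()` before combining. A data frame is itself a list of columns, so passing it to `c()` directly would splice its columns into the result instead of adding it as a single dataset. The toy sketch below, with placeholder data frames, shows the difference.

```r
# Placeholder data standing in for the real cut datasets
cut_data <- list(sc = data.frame(x = 1), ae = data.frame(x = 2))
no_cut_list <- list(ts = data.frame(x = 3))
dcut <- data.frame(USUBJID = "AB12345-001", stringsAsFactors = FALSE)

# Wrapped: dcut is added as one named dataset
final_data <- c(cut_data, no_cut_list, list(dcut = dcut))
names(final_data)

# Unwrapped: the columns of dcut are spliced in instead
spliced <- c(cut_data, no_cut_list, dcut)
names(spliced)
```

Each dataset can then be pulled out of the final list by name, for example `final_data$ae`.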