Updating cohort start and end dates

Introduction

Accurately defining cohort entry and exit dates is crucial in observational research to ensure the validity of study findings. The CohortConstructor package provides several functions to adjust these dates based on specific criteria, and this vignette demonstrates how to use them.

Functions to update cohort dates can be categorized into four groups:

Exit at Specific Date Functions: Adjust the cohort end date to predefined events (observation end and death date).
Cohort Entry or Exit Based on Other Date Columns: Modify cohort start or end dates to the earliest or latest date from a set of date columns.
Trim Dates Functions: Restrict cohort entries based on demographic criteria or specific date ranges.
Pad Dates Functions: Adjust cohort start or end dates by adding or subtracting a specified number of days.

We’ll explore each category in the following sections.

First, we’ll connect to the Eunomia synthetic data and create a mock cohort of women in the database to use as an example throughout the vignette.

library(dplyr, warn.conflicts = FALSE)
library(CohortConstructor)
library(CohortCharacteristics)
library(PatientProfiles)
library(omock)
library(clock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.

cdm$cohort <- demographicsCohort(cdm = cdm, name = "cohort", sex = "Female")
#> ℹ Building new trimmed cohort
#> Adding demographics information
#> Creating initial cohort
#> Trim sex
#> ✔ Cohort trimmed

Exit at Specific Date

`exitAtObservationEnd()`

The exitAtObservationEnd() function updates the cohort end date to the end of the observation period for each subject. This ensures that the cohort exit does not extend beyond the period during which data is available for the subject.

cdm$cohort_observation_end <- cdm$cohort |> 
  exitAtObservationEnd(name = "cohort_observation_end")

As cohort entries cannot overlap, updating the end date to the observation end may result in overlapping records. In such cases, overlapping records are collapsed into a single entry (starting at the earliest entry and ending at the end of observation).
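The collapsing step can be pictured with a small base R sketch. It uses a made-up toy data frame and hypothetical observation end dates rather than the CDM, and it is only an illustration of the described behaviour, not the package's implementation.

```r
# Toy cohort (made-up values): subject 1 has two records in one observation period
cohort <- data.frame(
  subject_id = c(1, 1, 2),
  cohort_start_date = as.Date(c("2010-01-01", "2012-06-01", "2011-03-15")),
  cohort_end_date = as.Date(c("2010-12-31", "2012-12-31", "2011-09-30"))
)

# Hypothetical observation end date for each subject
observation_end <- data.frame(
  subject_id = c(1, 2),
  observation_period_end_date = as.Date(c("2015-12-31", "2014-06-30"))
)

# Step 1: move every cohort end date to the end of observation
moved <- merge(cohort, observation_end, by = "subject_id")
moved$cohort_end_date <- moved$observation_period_end_date

# Step 2: the two records of subject 1 now overlap, so collapse them into
# one record starting at the earliest entry and ending at observation end
collapsed <- do.call(rbind, lapply(split(moved, moved$subject_id), function(x) {
  data.frame(
    subject_id = x$subject_id[1],
    cohort_start_date = min(x$cohort_start_date),
    cohort_end_date = max(x$cohort_end_date)
  )
}))

collapsed
```

Subject 1 ends up with a single record, while subject 2 keeps one record with its end date moved to the end of observation.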

This function has an argument persistAcrossObservationPeriods to consider cases when a subject may have more than one observation period. If persistAcrossObservationPeriods = FALSE then cohort end date will be set to the end of the observation period where the record occurs. If persistAcrossObservationPeriods = TRUE, in addition to updating the cohort end to the current observation end, cohort entries are created for each of the subsequent observation periods.

`exitAtDeath()`

The exitAtDeath() function sets the cohort end date to the recorded death date of the subject.

By default, it keeps the end date of subjects who do not have a death record unmodified; however, these can be dropped with the argument requireDeath.

cdm$cohort_death <- cdm$cohort |> 
  exitAtDeath(requireDeath = TRUE, name = "cohort_death")

Cohort Entry or Exit Based on Other Date Columns

`entryAtFirstDate()`

The entryAtFirstDate() function updates the cohort start date to the earliest date among specified columns.

Next we want to set the entry date to the first of: diclofenac or acetaminophen prescriptions after cohort start, or cohort end date.

# create a cohort of the drugs diclofenac and acetaminophen
cdm$medications <- conceptCohort(
  cdm = cdm, name = "medications",
  conceptSet = list("diclofenac" = 1124300, "acetaminophen" = 1127433)
)
#> Warning: ! `codelist` casted to integers.
#> ℹ Subsetting table drug_exposure using 2 concepts with domain: drug.
#> ℹ Combining tables.
#> ℹ Creating cohort attributes.
#> ℹ Applying cohort requirements.
#> ℹ Merging overlapping records.
#> ✔ Cohort medications created.

# add the date of the first occurrence of these drugs after the index date
cdm$cohort_dates <- cdm$cohort |> 
  addCohortIntersectDate(
    targetCohortTable = "medications", 
    nameStyle = "{cohort_name}",
    name = "cohort_dates"
    ) 

# set cohort start at the first occurrence of one of the drugs, or the end date
cdm$cohort_entry_first <- cdm$cohort_dates |>
  entryAtFirstDate(
    dateColumns = c("diclofenac", "acetaminophen", "cohort_end_date"), 
    name = "cohort_entry_first"
  )
#> Joining with `by = join_by(cohort_definition_id, subject_id, cohort_end_date)`
cdm$cohort_entry_first 
#> # Source:   table<results.test_cohort_entry_first> [?? x 6]
#> # Database: DuckDB v1.3.1 [root@Darwin 24.6.0:R 4.5.1//private/var/folders/sw/rd8zn92n2nz45cfcc5dcs_080000gr/T/Rtmpltb5fP/file18a662c65846.duckdb]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date diclofenac
#>                   <int>      <int> <date>            <date>          <date>    
#>  1                    1       4342 1964-01-02        2018-11-09      NA        
#>  2                    1       2950 1955-04-16        2018-04-09      NA        
#>  3                    1       3895 1933-09-14        2019-03-13      NA        
#>  4                    1       5256 1965-07-16        2018-10-26      2004-02-15
#>  5                    1       2333 1926-02-01        2010-04-20      NA        
#>  6                    1       4375 1933-06-09        2019-03-11      1969-07-19
#>  7                    1       2392 1942-09-17        2019-03-13      NA        
#>  8                    1       3175 1989-11-17        2018-09-04      1999-05-02
#>  9                    1       3093 1984-11-19        2019-04-07      NA        
#> 10                    1       4576 1955-10-23        2018-09-26      1993-03-31
#> # ℹ more rows
#> # ℹ 1 more variable: acetaminophen <date>

`entryAtLastDate()`

The entryAtLastDate() function works similarly to entryAtFirstDate(), but it sets the cohort start date to the latest date among the specified columns.

cdm$cohort_entry_last <- cdm$cohort_dates |>
  entryAtLastDate(
    dateColumns = c("diclofenac", "acetaminophen", "cohort_end_date"), 
    keepDateColumns = FALSE,
    name = "cohort_entry_last"
  )

cdm$cohort_entry_last
#> # Source:   table<results.test_cohort_entry_last> [?? x 4]
#> # Database: DuckDB v1.3.1 [root@Darwin 24.6.0:R 4.5.1//private/var/folders/sw/rd8zn92n2nz45cfcc5dcs_080000gr/T/Rtmpltb5fP/file18a662c65846.duckdb]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1        299 2019-01-19        2019-01-19     
#>  2                    1        430 2012-10-20        2012-10-20     
#>  3                    1        655 2019-04-27        2019-04-27     
#>  4                    1       1855 2018-11-23        2018-11-23     
#>  5                    1       1976 2017-03-06        2017-03-06     
#>  6                    1       2329 2018-01-23        2018-01-23     
#>  7                    1       2392 2019-03-13        2019-03-13     
#>  8                    1       2752 2018-10-07        2018-10-07     
#>  9                    1       2989 2018-09-12        2018-09-12     
#> 10                    1       4639 2018-03-16        2018-03-16     
#> # ℹ more rows

In this example, we set keepDateColumns to FALSE, which drops columns in dateColumns.
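To make the difference between picking the first and the last date concrete, here is a minimal base R sketch on a toy data frame. All values are made up, the sketch assumes that missing dates are ignored when choosing, and it is not the package's implementation.

```r
# Toy version of a cohort with added drug dates (made-up values)
cohort_dates <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2000-01-01", "2001-05-10", "1999-03-20")),
  cohort_end_date = as.Date(c("2010-12-31", "2012-06-30", "2005-08-15")),
  diclofenac = as.Date(c("2004-02-15", NA, NA)),
  acetaminophen = as.Date(c("2003-07-01", "2011-01-20", NA))
)

# Earliest non-missing date per row: the idea behind entryAtFirstDate()
first_date <- pmin(
  cohort_dates$diclofenac,
  cohort_dates$acetaminophen,
  cohort_dates$cohort_end_date,
  na.rm = TRUE
)

# Latest non-missing date per row: the idea behind entryAtLastDate()
last_date <- pmax(
  cohort_dates$diclofenac,
  cohort_dates$acetaminophen,
  cohort_dates$cohort_end_date,
  na.rm = TRUE
)

data.frame(subject_id = cohort_dates$subject_id, first_date, last_date)
```

Because the drug dates are searched after cohort start and cohort_end_date is one of the candidate columns, the latest date in this toy data is always the cohort end date. This matches the output above, where start and end dates coincide.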

`exitAtFirstDate()`

The exitAtFirstDate() function updates the cohort end date to the earliest date among specified columns.

For instance, next we want the exit to be observation end, except if there is a record of diclofenac or acetaminophen, in which case that would be the end:

cdm$cohort_exit_first <- cdm$cohort_dates |>
  addFutureObservation(futureObservationType = "date", name = "cohort_exit_first") |>
  exitAtFirstDate(
    dateColumns = c("future_observation", "acetaminophen", "diclofenac"),
    keepDateColumns = FALSE
  )

cdm$cohort_exit_first 
#> # Source:   table<results.test_cohort_exit_first> [?? x 4]
#> # Database: DuckDB v1.3.1 [root@Darwin 24.6.0:R 4.5.1//private/var/folders/sw/rd8zn92n2nz45cfcc5dcs_080000gr/T/Rtmpltb5fP/file18a662c65846.duckdb]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1        289 1959-03-17        2010-10-17     
#>  2                    1        484 1938-01-03        1953-08-12     
#>  3                    1       1306 1956-08-09        1970-03-29     
#>  4                    1       2326 1951-02-08        1963-01-07     
#>  5                    1       2647 1970-12-13        1974-08-25     
#>  6                    1       3200 1946-08-22        1981-03-19     
#>  7                    1       4372 1980-05-08        1987-07-11     
#>  8                    1       4563 1963-12-13        1970-04-12     
#>  9                    1        546 1943-11-14        1946-02-03     
#> 10                    1        915 1959-12-26        1969-08-06     
#> # ℹ more rows

`exitAtLastDate()`

Similarly, the exitAtLastDate() function sets the cohort end date to the latest date among specified columns.

cdm$cohort_exit_last <- cdm$cohort_dates |> 
  exitAtLastDate(
    dateColumns = c("cohort_end_date", "acetaminophen", "diclofenac"),
    returnReason = FALSE,
    keepDateColumns = FALSE,
    name = "cohort_exit_last"
  )
cdm$cohort_exit_last
#> # Source:   table<results.test_cohort_exit_last> [?? x 4]
#> # Database: DuckDB v1.3.1 [root@Darwin 24.6.0:R 4.5.1//private/var/folders/sw/rd8zn92n2nz45cfcc5dcs_080000gr/T/Rtmpltb5fP/file18a662c65846.duckdb]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1        421 1965-06-16        2018-07-04     
#>  2                    1       1235 1975-05-26        2019-06-10     
#>  3                    1       1236 1920-07-30        2018-09-14     
#>  4                    1       2307 1961-10-03        2018-11-13     
#>  5                    1       2798 1953-10-11        2018-10-07     
#>  6                    1       3170 1950-12-22        1990-04-27     
#>  7                    1       3573 1923-04-28        2018-08-04     
#>  8                    1       4320 1970-03-29        2019-03-03     
#>  9                    1       4676 1979-08-02        2017-11-09     
#> 10                    1       4855 1977-09-10        2019-04-06     
#> # ℹ more rows

In this last example, the returned cohort has neither the specified date columns nor the “reason” column indicating which date was used for entry or exit. This is controlled by the keepDateColumns and returnReason arguments, which are common to all functions in this category.

Trim Dates Functions

`trimDemographics()`

The trimDemographics() function restricts the cohort based on patient demographics. This means that cohort start and end dates are moved (within the original cohort entry dates) to ensure that individuals meet specific demographic criteria throughout their cohort participation. If individuals do not satisfy the criteria at any point during their cohort period, their records are excluded.

For instance, if we trim using an age range from 18 to 65, individuals only contribute to the cohort from the day they turn 18 up to the day before they turn 66 (or earlier if they leave the database).

cdm$cohort_trim <- cdm$cohort |>
  trimDemographics(ageRange = c(18, 65), name = "cohort_trim")
#> ℹ Building new trimmed cohort
#> Adding demographics information
#> Creating initial cohort
#> Trim age
#> ✔ Cohort trimmed
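The date arithmetic behind an age range of 18 to 65 can be sketched in base R for a single person. The date of birth, the cohort dates and the helper add_years() are hypothetical and only illustrate the rule described above; the package derives these dates from the CDM itself.

```r
# Hypothetical helper: add a whole number of years to a single date
add_years <- function(date, years) {
  seq(date, by = paste(years, "years"), length.out = 2)[2]
}

# Made-up person and cohort record
date_of_birth <- as.Date("1940-01-01")
cohort_start <- as.Date("1950-06-15")
cohort_end <- as.Date("2019-01-01")
age_range <- c(18, 65)

# First eligible day: the 18th birthday
first_eligible <- add_years(date_of_birth, age_range[1])

# Last eligible day: the day before the 66th birthday
last_eligible <- add_years(date_of_birth, age_range[2] + 1) - 1

# Trimmed dates stay within the original cohort record
trimmed_start <- max(cohort_start, first_eligible)
trimmed_end <- min(cohort_end, last_eligible)

c(start = trimmed_start, end = trimmed_end)
```

If trimmed_start were later than trimmed_end, the person would never satisfy the age criterion during the record, which corresponds to the record being excluded.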

`trimToDateRange()`

The trimToDateRange() function confines cohort entry and exit dates within a specified date range, ensuring that cohort periods align with the defined timeframe. If only the start or end of a range is required, the other can be set to NA.

For example, to restrict cohort dates to be on or after January 1st, 2015:

# Trim cohort dates to start on or after 2015-01-01 (no upper limit)
cdm$cohort_trim <- cdm$cohort_trim |> 
  trimToDateRange(dateRange = as.Date(c("2015-01-01", NA)))
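The effect of an open-ended date range can be sketched in base R on a toy data frame. The values are made up, and the sketch assumes that records falling entirely outside the range are dropped; it is an illustration rather than the package's implementation.

```r
# Toy cohort (made-up values)
cohort <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2010-03-01", "2016-07-15", "2001-01-01")),
  cohort_end_date = as.Date(c("2018-05-20", "2019-02-28", "2009-12-31"))
)

# Lower limit only: NA leaves the upper end of the range open
date_range <- as.Date(c("2015-01-01", NA))

trimmed <- cohort
if (!is.na(date_range[1])) {
  # Start dates before the range are moved up to the range start
  trimmed$cohort_start_date <- pmax(trimmed$cohort_start_date, date_range[1])
}
if (!is.na(date_range[2])) {
  # End dates after the range are moved back to the range end
  trimmed$cohort_end_date <- pmin(trimmed$cohort_end_date, date_range[2])
}

# Assumption: records with no days left inside the range are removed
trimmed <- trimmed[trimmed$cohort_start_date <= trimmed$cohort_end_date, ]

trimmed
```

Subject 1 is trimmed to start on 2015-01-01, subject 2 is unchanged, and subject 3 has no time inside the range.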

Pad Dates Functions

`padCohortStart()`

The padCohortStart() function adds (or subtracts) a specified number of days to the cohort start date.

For example, to subtract 50 days from the cohort start date:

# Subtract 50 days from cohort start
cdm$cohort <- cdm$cohort |> padCohortStart(days = -50, collapse = FALSE)

Subtracting days may move cohort start dates before the start of the observation period. By default, such entries are corrected to the observation period start. To drop these entries instead, set the requireFullContribution argument to TRUE.

Additionally, adjusting cohort start dates may lead to overlapping entries for the same subject. The collapse argument manages this: if TRUE (the default), overlapping entries are merged into a single record with the earliest start and latest end date; if FALSE, only the first of the overlapping entries is retained.
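The two ways of handling padded start dates that fall before observation start can be sketched in base R. The toy data frame and its observation start dates are made up, and the code only mirrors the behaviour described above rather than the package's implementation.

```r
# Toy cohort with a hypothetical observation period start per record
cohort <- data.frame(
  subject_id = c(1, 2),
  cohort_start_date = as.Date(c("2010-03-01", "2010-01-20")),
  observation_period_start_date = as.Date(c("2005-01-01", "2010-01-01"))
)

days <- -50
padded_start <- cohort$cohort_start_date + days

# Records whose padded start falls before observation start
out_of_observation <- padded_start < cohort$observation_period_start_date

# Default behaviour: correct the start to the observation period start
corrected <- cohort
corrected$cohort_start_date <- pmax(
  padded_start, cohort$observation_period_start_date
)

# requireFullContribution = TRUE: drop those records instead
dropped <- cohort[!out_of_observation, ]
dropped$cohort_start_date <- padded_start[!out_of_observation]

list(corrected = corrected, dropped = dropped)
```

In this toy data only subject 2 is affected: the record is kept with a start at observation start in the first case and removed in the second.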

`padCohortEnd()`

Similarly, the padCohortEnd() function adjusts the cohort end date by adding (or subtracting) a specified number of days.

The example below adds 1000 days to cohort end date, while dropping records that are outside of observation after adding days.

cdm$cohort_pad <- cdm$cohort |> 
  padCohortEnd(days = 1000, requireFullContribution = TRUE, name = "cohort_pad")

The number of days to add can also be taken from a numeric column in the cohort, which makes it possible to add a different number of days to each record:

cdm$cohort <- cdm$cohort |> 
  dplyr::mutate(days_to_add = date_count_between(start = cohort_start_date, end = cohort_end_date, precision = "day")) |>
  padCohortEnd(days = "days_to_add", requireFullContribution = TRUE)

`padCohortDate()`

The padCohortDate() function provides a more flexible approach by allowing adjustments to either the cohort start or end date based on specified parameters. You can define which date to adjust (cohortDate), the reference date for the adjustment (indexDate), and the number of days to add or subtract.

For example, to set the cohort end date to be 365 days after the cohort start date:

cdm$cohort <- cdm$cohort |> 
  padCohortDate(days = 365, cohortDate = "cohort_end_date", indexDate = "cohort_start_date")
