2 Introduction

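In this vignette, we will explore the OmopSketch functions designed to provide an overview of the clinical tables within a CDM object (observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, and death). Specifically, the key functions that facilitate this are summariseClinicalRecords() and tableClinicalRecords(), which summarise a clinical table and display that summary, and summariseRecordCount(), plotRecordCount() and tableRecordCount(), which summarise, plot and tabulate record counts over time.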

2.1 Create a mock cdm

Let’s see an example of its functionalities. To start with, we will load the essential packages and create a mock cdm using the mockOmopSketch() function.

library(dplyr)
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()

3 Summarise clinical tables

Let’s now use summariseClinicalRecords() from the OmopSketch package to get an overview of one of the clinical tables of the cdm (i.e., condition_occurrence).

summarisedResult <- summariseClinicalRecords(cdm, "condition_occurrence")
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |> print()
#> # A tibble: 20 × 13
#>    result_id cdm_name       group_name group_level      strata_name strata_level
#>        <int> <chr>          <chr>      <chr>            <chr>       <chr>       
#>  1         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  2         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  3         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  4         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  5         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  6         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  7         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  8         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  9         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 10         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 11         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 12         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 13         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 14         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 15         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 16         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 17         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 18         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 19         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 20         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Notice that the output is in the summarised result format.

We can use the arguments to specify which statistics we want to compute. For example, use the argument recordsPerPerson to indicate which estimates you are interested in for the number of records per person.

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95")
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  filter(variable_name == "records_per_person") |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 4 × 3
#>   variable_name      estimate_name estimate_value
#>   <chr>              <chr>         <chr>         
#> 1 records_per_person mean          84            
#> 2 records_per_person q05           70            
#> 3 records_per_person q95           98            
#> 4 records_per_person sd            9.1817
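
To make these estimates more concrete, here is a minimal base R sketch on a made-up table. The toy_records data frame and its values are hypothetical, and this is not meant to reproduce how OmopSketch computes the estimates internally (for instance, it only counts people with at least one record and relies on R's default quantile method). It only illustrates what mean, sd, q05 and q95 refer to.

```r
# Hypothetical toy table: one row per record, identified by person_id
toy_records <- data.frame(person_id = c(1, 1, 1, 2, 2, 3))

# Number of records for each person (3, 2 and 1 records respectively)
records_per_person <- as.integer(table(toy_records$person_id))

# The same four estimates requested through recordsPerPerson
estimates <- c(
  mean = mean(records_per_person),
  sd   = sd(records_per_person),
  q05  = unname(quantile(records_per_person, 0.05)),
  q95  = unname(quantile(records_per_person, 0.95))
)
estimates
```

With these toy values the mean is 2, the standard deviation is 1, and the 5th and 95th percentiles are 1.1 and 2.9.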

You can further specify whether to include the number of records in observation (inObservation = TRUE), the number of concepts mapped (standardConcept = TRUE), the source vocabularies that the table contains (sourceVocabulary = TRUE), the domains of the concepts (domainId = TRUE) or the concept’s type (typeConcept = TRUE).

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  select(variable_name, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 17
#> Columns: 3
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "q05", "q95", "…
#> $ estimate_value <chr> "100", "100", "8400", "84", "70", "98", "9.1817", "8400…
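
Note that estimate_value is stored as character (`<chr>`), so it needs converting before doing any arithmetic. The sketch below retypes by hand the first six estimates displayed by glimpse() above into plain vectors, instead of extracting them from the result object, and converts them with as.numeric().

```r
# First six estimates as displayed by glimpse(), retyped by hand
estimate_name  <- c("count", "percentage", "count", "mean", "q05", "q95")
estimate_value <- c("100", "100", "8400", "84", "70", "98")

# estimate_value is character, so arithmetic needs an explicit conversion
is.character(estimate_value)
numeric_value <- as.numeric(estimate_value)

# For example, the spread between the q05 and q95 estimates
numeric_value[estimate_name == "q95"] - numeric_value[estimate_name == "q05"]
```

Here the difference between the two percentiles is 28.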

You can also stratify the previous results by sex and age group:

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  select(variable_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 153
#> Columns: 4
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ strata_level   <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "q05", "q95", "…
#> $ estimate_value <chr> "100", "100", "8400", "84", "69.8000", "98", "9.1817", …

Notice that, by default, the “overall” group is also included, as well as the crossed strata (for example, sex == “Female” and ageGroup == “>=35”).
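
To see which groups this produces, here is a small base R sketch that counts records on a made-up table for the overall group, each single stratum and the crossed strata. The toy_records data, the count_by() helper and the " and " separator in the crossed labels are all hypothetical choices for illustration and do not reflect how OmopSketch labels or computes its strata.

```r
# Hypothetical toy records with the sex and age of the person at each record
toy_records <- data.frame(
  sex = c("Female", "Female", "Female", "Male", "Male", "Male"),
  age = c(20, 40, 60, 34, 35, 18)
)

# Age groups mirroring ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
toy_records$age_group <- ifelse(toy_records$age <= 34, "<35", ">=35")

# Hypothetical helper: number of records for each level of a grouping key
count_by <- function(key) {
  tab <- table(key)
  setNames(as.integer(tab), names(tab))
}

# Overall, each single stratum, and the crossed strata
counts <- c(
  overall = nrow(toy_records),
  count_by(toy_records$sex),
  count_by(toy_records$age_group),
  count_by(paste(toy_records$sex, toy_records$age_group, sep = " and "))
)
counts
```

Two stratification variables with two levels each give nine groups in total: one overall, two for sex, two for age group and four crossed.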

The analysis can also be conducted for multiple OMOP tables at the same time:

summarisedResult <- summariseClinicalRecords(cdm,
  c("observation_period", "drug_exposure"),
  recordsPerPerson = c("mean", "sd"),
  inObservation = FALSE,
  standardConcept = FALSE,
  sourceVocabulary = FALSE,
  domainId = FALSE,
  typeConcept = FALSE
)
#> ℹ Adding variables of interest to observation_period.
#> ℹ Summarising records per person in observation_period.
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.

summarisedResult |>
  select(group_level, variable_name, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 10
#> Columns: 4
#> $ group_level    <chr> "observation_period", "observation_period", "observatio…
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "sd", "count", …
#> $ estimate_value <chr> "100", "100", "100", "1", "0", "100", "100", "21600", "…

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summarisedResult <- summariseClinicalRecords(cdm, "drug_exposure",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))) 
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising drug_exposure: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  omopgenerics::settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_clinical_records"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "0.5.0"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> ""
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
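
As a rough picture of what restricting to a time window means, the base R sketch below filters a made-up table by start date. The toy_drug_exposure data are hypothetical, and the rule used here (keep a record when its start date lies within the two dates, bounds included) is an assumption for illustration; how OmopSketch treats records on or across the boundaries is not stated in this vignette.

```r
# Hypothetical toy drug exposures
toy_drug_exposure <- data.frame(
  drug_exposure_id = 1:4,
  drug_exposure_start_date = as.Date(
    c("1985-06-01", "1990-01-01", "2003-07-15", "2012-03-02")
  )
)

# Same window as in the call above
date_range <- as.Date(c("1990-01-01", "2010-01-01"))

# Keep records whose start date lies inside the window (inclusive bounds assumed)
start_date <- toy_drug_exposure$drug_exposure_start_date
in_range <- start_date >= date_range[1] & start_date <= date_range[2]
restricted <- toy_drug_exposure[in_range, ]
restricted$drug_exposure_id
```

Only the second and third toy records fall inside the window.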

3.1 Tidy the summarised object

tableClinicalRecords() will help you to tidy the previous results and create a gt table.

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  tableClinicalRecords()
                                                            Database name
Variable name       Variable level           Estimate name  mockOmopSketch
condition_occurrence; overall
Number records      -                        N              8,400.00
Number subjects     -                        N (%)          100 (100.00%)
Records per person  -                        Mean (SD)      84.00 (9.18)
                                             q05            69.80
                                             q95            98.00
In observation      Yes                      N (%)          8,400 (100.00%)
Domain              Condition                N (%)          8,400 (100.00%)
Source vocabulary   No matching concept      N (%)          8,400 (100.00%)
Standard concept    S                        N (%)          8,400 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          8,400 (100.00%)
condition_occurrence; Female
Number records      -                        N              4,410.00
Number subjects     -                        N (%)          52 (100.00%)
Records per person  -                        Mean (SD)      84.81 (9.55)
                                             q05            68.20
                                             q95            98.90
In observation      Yes                      N (%)          4,410 (100.00%)
Domain              Condition                N (%)          4,410 (100.00%)
Source vocabulary   No matching concept      N (%)          4,410 (100.00%)
Standard concept    S                        N (%)          4,410 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          4,410 (100.00%)
condition_occurrence; Male
Number records      -                        N              3,990.00
Number subjects     -                        N (%)          48 (100.00%)
Records per person  -                        Mean (SD)      83.12 (8.78)
                                             q05            70.35
                                             q95            96.00
In observation      Yes                      N (%)          3,990 (100.00%)
Domain              Condition                N (%)          3,990 (100.00%)
Source vocabulary   No matching concept      N (%)          3,990 (100.00%)
Standard concept    S                        N (%)          3,990 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          3,990 (100.00%)

4 Summarise record counts

OmopSketch can also help you summarise the trend in the records of an OMOP table. In the example below, we use summariseRecordCount() to count the number of records within each year, and plotRecordCount() can then be used to create a ggplot of the trend. We can also use tableRecordCount() to display the results in a table of type gt, reactable or datatable; by default, it creates a gt table.

summarisedResult <- summariseRecordCount(cdm, "drug_exposure", interval = "years")

summarisedResult |> tableRecordCount(type = "gt")
Time interval
mockOmopSketch
Number records
drug_exposure 1955-01-01 to 1955-12-31 12
1956-01-01 to 1956-12-31 16
1957-01-01 to 1957-12-31 33
1958-01-01 to 1958-12-31 44
1959-01-01 to 1959-12-31 40
1960-01-01 to 1960-12-31 40
1961-01-01 to 1961-12-31 60
1962-01-01 to 1962-12-31 48
1963-01-01 to 1963-12-31 82
1964-01-01 to 1964-12-31 185
1965-01-01 to 1965-12-31 107
1966-01-01 to 1966-12-31 61
1967-01-01 to 1967-12-31 51
1968-01-01 to 1968-12-31 61
1969-01-01 to 1969-12-31 66
1970-01-01 to 1970-12-31 82
1971-01-01 to 1971-12-31 105
1972-01-01 to 1972-12-31 200
1973-01-01 to 1973-12-31 227
1974-01-01 to 1974-12-31 78
1975-01-01 to 1975-12-31 77
1976-01-01 to 1976-12-31 118
1977-01-01 to 1977-12-31 127
1978-01-01 to 1978-12-31 130
1979-01-01 to 1979-12-31 361
1980-01-01 to 1980-12-31 189
1981-01-01 to 1981-12-31 226
1982-01-01 to 1982-12-31 196
1983-01-01 to 1983-12-31 160
1984-01-01 to 1984-12-31 153
1985-01-01 to 1985-12-31 180
1986-01-01 to 1986-12-31 205
1987-01-01 to 1987-12-31 194
1988-01-01 to 1988-12-31 239
1989-01-01 to 1989-12-31 263
1990-01-01 to 1990-12-31 238
1991-01-01 to 1991-12-31 372
1992-01-01 to 1992-12-31 629
1993-01-01 to 1993-12-31 732
1994-01-01 to 1994-12-31 422
1995-01-01 to 1995-12-31 369
1996-01-01 to 1996-12-31 456
1997-01-01 to 1997-12-31 249
1998-01-01 to 1998-12-31 388
1999-01-01 to 1999-12-31 464
2000-01-01 to 2000-12-31 511
2001-01-01 to 2001-12-31 739
2002-01-01 to 2002-12-31 590
2003-01-01 to 2003-12-31 815
2004-01-01 to 2004-12-31 1072
2005-01-01 to 2005-12-31 563
2006-01-01 to 2006-12-31 1183
2007-01-01 to 2007-12-31 615
2008-01-01 to 2008-12-31 355
2009-01-01 to 2009-12-31 66
2010-01-01 to 2010-12-31 316
2011-01-01 to 2011-12-31 651
2012-01-01 to 2012-12-31 603
2013-01-01 to 2013-12-31 644
2014-01-01 to 2014-12-31 701
2015-01-01 to 2015-12-31 840
2016-01-01 to 2016-12-31 256
2017-01-01 to 2017-12-31 218
2018-01-01 to 2018-12-31 633
2019-01-01 to 2019-12-31 1494
overall 21600

Note that you can adjust the time interval using the interval argument, which can be set to “years”, “months” or “quarters”. See the example below, which shows the number of records per quarter:

summariseRecordCount(cdm, "drug_exposure", interval = "quarters") |>
  plotRecordCount()
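
To illustrate how the choice of interval changes the grouping, the base R sketch below labels a few made-up dates with their calendar year, quarter and month, and counts the records in each. The start_dates vector and the label formats are hypothetical, and unlike a real summary this simple tabulation only lists intervals that contain at least one record.

```r
# Hypothetical record start dates
start_dates <- as.Date(c("2019-01-15", "2019-02-20", "2019-05-03", "2020-11-30"))

# Label each date with its calendar year, quarter and month
year_label    <- format(start_dates, "%Y")
quarter_label <- paste(year_label, quarters(start_dates))
month_label   <- format(start_dates, "%Y-%m")

# Number of records in each interval
table(year_label)
table(quarter_label)
table(month_label)
```

The same four records fall into two years, three quarters or four months, so finer intervals give more, smaller counts.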

We can further stratify our counts by sex (setting argument sex = TRUE) or by age (providing an age group). Notice that in both cases, the function will automatically create an overall group that combines all the sex groups and all the age groups.

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "<30" = c(0, 29),
    ">=30" = c(30, Inf)
  )
) |>
  plotRecordCount()

By default, plotRecordCount() does not facet or colour by any variable. This can be confusing when stratifying by several variables, as seen in the previous plot. We can use the visOmopResults package to find out which columns we can colour or facet by:

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  visOmopResults::tidyColumns()
#> [1] "cdm_name"       "omop_table"     "age_group"      "sex"           
#> [5] "variable_name"  "variable_level" "count"          "time_interval" 
#> [9] "interval"

Then, we can simply specify these columns in the facet and colour arguments of plotRecordCount():

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  plotRecordCount(facet = omop_table ~ age_group, colour = "sex")

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summariseRecordCount(cdm, "drug_exposure",
  interval = "years",
  sex = TRUE, 
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))) |>
  tableRecordCount(type = "gt")
Time interval Sex
mockOmopSketch
Number records
drug_exposure 1990-01-01 to 1990-12-31 overall 238
Female 122
Male 116
1991-01-01 to 1991-12-31 overall 372
Female 190
Male 182
1992-01-01 to 1992-12-31 overall 629
Female 421
Male 208
1993-01-01 to 1993-12-31 overall 732
Male 450
Female 282
1994-01-01 to 1994-12-31 overall 422
Male 163
Female 259
1995-01-01 to 1995-12-31 overall 369
Female 218
Male 151
1996-01-01 to 1996-12-31 overall 456
Female 152
Male 304
1997-01-01 to 1997-12-31 overall 249
Male 52
Female 197
1998-01-01 to 1998-12-31 overall 388
Female 359
Male 29
1999-01-01 to 1999-12-31 overall 464
Female 437
Male 27
2000-01-01 to 2000-12-31 overall 511
Female 362
Male 149
2001-01-01 to 2001-12-31 overall 739
Female 610
Male 129
2002-01-01 to 2002-12-31 overall 590
Female 509
Male 81
2003-01-01 to 2003-12-31 overall 815
Female 692
Male 123
2004-01-01 to 2004-12-31 overall 1072
Male 407
Female 665
2005-01-01 to 2005-12-31 overall 563
Male 260
Female 303
2006-01-01 to 2006-12-31 overall 1183
Female 649
Male 534
2007-01-01 to 2007-12-31 overall 615
Female 237
Male 378
2008-01-01 to 2008-12-31 overall 355
Female 79
Male 276
2009-01-01 to 2009-12-31 overall 66
Female 34
Male 32
overall overall 10828
Male 4051
Female 6777
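
As a quick sanity check on the table above, the stratified counts should add up to the overall counts. The sketch below retypes the displayed numbers by hand into plain R vectors (the vector names are only illustrative) rather than extracting them from the result object.

```r
# Yearly overall counts, 1990 to 2009, copied from the table above
yearly_overall <- c(
  238, 372, 629, 732, 422, 369, 456, 249, 388, 464,
  511, 739, 590, 815, 1072, 563, 1183, 615, 355, 66
)

# Totals by sex from the last rows of the table
total_by_sex <- c(Male = 4051, Female = 6777)

# Both ways of adding up the records should give the overall total
sum(yearly_overall)
sum(total_by_sex)
```

Both sums equal 10828, the value reported in the overall row.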

Finally, disconnect from the cdm:

PatientProfiles::mockDisconnect(cdm = cdm)