% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ds.mdPattern.R
\name{ds.mdPattern}
\alias{ds.mdPattern}
\title{Display missing data patterns with disclosure control}
\usage{
ds.mdPattern(x = NULL, type = "split", datasources = NULL)
}
\arguments{
\item{x}{a character string specifying the name of a data frame or matrix on the 
server-side containing the data to analyze.}

\item{type}{a character string specifying the output type. If 'split' (default), 
returns separate patterns for each study. If 'combine', attempts to pool patterns 
across studies.}

\item{datasources}{a list of \code{\link[DSI]{DSConnection-class}} objects obtained 
after login. If the \code{datasources} argument is not specified, the default set of 
connections will be used: see \code{\link[DSI]{datashield.connections_default}}.}
}
\value{
For type='split': A list with one element per study, each containing:
\describe{
  \item{pattern}{The missing data pattern matrix for that study}
  \item{valid}{Logical indicating if all patterns meet disclosure requirements}
  \item{message}{A message describing the validity status}
}

For type='combine': A list containing:
\describe{
  \item{pattern}{The pooled missing data pattern matrix across all studies}
  \item{valid}{Logical indicating if all pooled patterns meet disclosure requirements}
  \item{message}{A message describing the validity status}
}
}
\description{
This function is a client-side wrapper for the server-side \code{mdPatternDS}
function. It generates a missing data pattern matrix similar to that of
\code{mice::md.pattern}, with disclosure control applied so that small cell counts
are not revealed.
}
\details{
The function calls the server-side \code{mdPatternDS} function, which uses
\code{mice::md.pattern} to analyze missing data patterns. Patterns with counts below the 
disclosure threshold (default: \code{nfilter.tab = 3}) are suppressed to maintain privacy.

\strong{Output Format:}
\itemize{
  \item Each row represents a missing data pattern
  \item Pattern counts are shown in row names (e.g., "150", "25")
  \item Columns show 1 if the variable is observed, 0 if missing
  \item The last column shows the total number of missing values per pattern
  \item The last row shows the total number of missing values per variable
}

\strong{Disclosure Control:}

Suppressed patterns (count below threshold) are indicated by:
\itemize{
  \item Row name: "suppressed(<N>)" where N is the threshold
  \item All pattern values set to NA
  \item Summary row also suppressed to prevent back-calculation
}

\strong{Pooling Behavior (type='combine'):}

When pooling across studies, the function uses a \emph{conservative approach} 
for disclosure control:

\enumerate{
  \item Identifies identical missing data patterns across studies
  \item \strong{EXCLUDES suppressed patterns from pooling}: when a pattern is
    suppressed in a study, that study's count is not added to the pooled count
  \item Sums counts only for non-suppressed identical patterns
  \item Re-validates pooled counts against the disclosure threshold
}

\strong{Important:} This conservative approach means:
\itemize{
  \item Pooled counts may be \emph{underestimates} if some studies had suppressed patterns
  \item Disclosure through subtraction is prevented (e.g., if study A shows count=5
    and the pool shows count=7, one could deduce that study B has count=2, which is
    below the disclosure threshold)
  \item Patterns that differ across studies are kept as separate rows in the pooled result
}
}
\examples{
\dontrun{
 ## Version 6, for version 5 see the Wiki

  # Connecting to the Opal servers

  require('DSI')
  require('DSOpal')
  require('dsBaseClient')

  builder <- DSI::newDSLoginBuilder()
  builder$append(server = "study1",
                 url = "http://192.168.56.100:8080/",
                 user = "administrator", password = "datashield_test&",
                 table = "CNSIM.CNSIM1", driver = "OpalDriver")
  builder$append(server = "study2",
                 url = "http://192.168.56.100:8080/",
                 user = "administrator", password = "datashield_test&",
                 table = "CNSIM.CNSIM2", driver = "OpalDriver")
  logindata <- builder$build()

  connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

  # Get missing data patterns for each study separately
  patterns_split <- ds.mdPattern(x = "D", type = "split", datasources = connections)

  # View results for study1
  print(patterns_split$study1$pattern)
  #     var1 var2 var3
  # 150    1    1    1  0    <- 150 obs complete
  #  25    0    1    1  1    <- 25 obs missing var1
  #       25    0    0 25    <- Summary: missing values per variable (25 in total)
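
  # Illustrative sketch (runs locally, no server needed): rebuild the toy matrix
  # shown above to see how counts and totals are laid out. The object name
  # toy_pattern and its values are assumptions made for this example only.
  toy_pattern <- matrix(
    c( 1, 1, 1,  0,
       0, 1, 1,  1,
      25, 0, 0, 25),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("150", "25", ""), c("var1", "var2", "var3", ""))
  )
  # The last row is the summary row, so every row before it is a pattern
  n_patterns <- nrow(toy_pattern) - 1
  # Pattern counts are stored in the row names
  pattern_counts <- as.integer(rownames(toy_pattern)[seq_len(n_patterns)])
  # Missing values per variable: last row, without the final total column
  missing_per_variable <- toy_pattern[nrow(toy_pattern), -ncol(toy_pattern)]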

  # Get pooled missing data patterns across studies
  patterns_pooled <- ds.mdPattern(x = "D", type = "combine", datasources = connections)
  print(patterns_pooled$pattern)

  # Example with suppressed patterns:
  # If study1 has a pattern with count=2 (suppressed) and study2 has the same pattern
  # with count=5 (valid), the pooled result shows count=5 (conservative approach).
  # A warning will indicate: "Pooled counts may underestimate the true total"
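
  # Illustrative sketch of the conservative pooling rule described above, run on
  # local toy counts. The helper pool_counts and its threshold argument are
  # assumptions for this example; this is not the internal code of the package.
  pool_counts <- function(counts, threshold = 3) {
    # Counts below the threshold are suppressed, so they never enter the sum
    visible <- counts[counts >= threshold]
    pooled <- sum(visible)
    # Re-validate the pooled count against the same threshold
    if (pooled < threshold) NA_integer_ else pooled
  }
  pool_counts(c(study1 = 2L, study2 = 5L))  # 5: the suppressed count of 2 is left out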

  # Clear the DataSHIELD R sessions and log out
  DSI::datashield.logout(connections)
}

}
\author{
Xavier Escribà Montagut for DataSHIELD Development Team
}
