---
title: "Coefficients Covariance Matrix Adjustment"
package: mmrm
bibliography: '`r system.file("REFERENCES.bib", package = "mmrm")`'
output:
  rmarkdown::html_vignette:
          toc: true
vignette: |
  %\VignetteIndexEntry{Coefficients Covariance Matrix Adjustment}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Here we describe the variance-covariance matrix adjustment of coefficients.

# Introduction

To estimate the covariance matrix of coefficients, there are many ways.
In `mmrm` package, we implemented asymptotic, empirical, Jackknife and Kenward-Roger methods.
For simplicity, the following derivation are all for unweighted mmrm.
For weighted mmrm, we can follow the [details of weighted least square estimator](algorithm.html#weighted-least-squares-estimator).

## Asymptotic Covariance

Asymptotic covariance are derived based on the estimate of $\beta$.

Following the definition in [details in model fitting](algorithm.html#linear-model), we have

\[
  \hat\beta = (X^\top W X)^{-1} X^\top W Y
\]

\[
  cov(\hat\beta) = (X^\top W X)^{-1} X^\top W cov(\epsilon) W X (X^\top W X)^{-1} = (X^\top W X)^{-1}
\]

Where $W$ is the block diagonal matrix of inverse of covariance matrix of $\epsilon$.

## Empirical Covariance

Empirical covariance, also known as the robust sandwich estimator, or "CR0", is derived by replacing the covariance matrix of $\epsilon$ by observed
covariance matrix.

\[
  cov(\hat\beta) = (X^\top W X)^{-1}(\sum_{i}{X_i^\top W_i \hat\epsilon_i\hat\epsilon_i^\top W_i X_i})(X^\top W X)^{-1}
  = (X^\top W X)^{-1}(\sum_{i}{X_i^\top L_{i} L_{i}^\top \hat\epsilon_i\hat\epsilon_i^\top L_{i} L_{i}^\top X_i})(X^\top W X)^{-1}
\]

Where $W_i$ is the block diagonal part for subject $i$ of $W$ matrix, $\hat\epsilon_i$ is the observed residuals for subject i,
$L_i$ is the Cholesky factor of $\Sigma_i^{-1}$ ($W_i = L_i L_i^\top$). 
In the sandwich, the score $X_i^\top W_i \hat\epsilon_i$ computed for 
subject $i$ can be accessed by `component(mmrm_obj, name = "score_per_subject")`. 

See the detailed explanation of these formulas in the [Weighted Least Square Empirical Covariance](empirical_wls.html) vignette.

## Jackknife Covariance

Jackknife method in `mmrm` is the "leave-one-cluster-out" method. It is also known as "CR3".
Following @mccaffrey2003bias, we have

\[
  cov(\hat\beta) = (X^\top W X)^{-1}(\sum_{i}{X_i^\top L_{i} (I_{i} - H_{ii})^{-1} L_{i}^\top \hat\epsilon_i\hat\epsilon_i^\top L_{i} (I_{i} - H_{ii})^{-1} L_{i}^\top X_i})(X^\top W X)^{-1}
\]

where

\[H_{ii} = X_i(X^\top X)^{-1}X_i^\top\]

Please note that in the paper there is an additional scale parameter $\frac{n-1}{n}$ where $n$ is the number of subjects, here we do not include this parameter.

## Bias-Reduced Covariance

Bias-reduced method, also known as "CR2", provides unbiased under correct working model.
Following @mccaffrey2003bias, we have
\[
  cov(\hat\beta) = (X^\top W X)^{-1}(\sum_{i}{X_i^\top L_{i} (I_{i} - H_{ii})^{-1/2} L_{i}^\top \hat\epsilon_i\hat\epsilon_i^\top L_{i} (I_{i} - H_{ii})^{-1} L_{i}^\top X_i})(X^\top W X)^{-1}
\]

where

\[H_{ii} = X_i(X^\top X)^{-1}X_i^\top\]

## Kenward-Roger Covariance

Kenward-Roger covariance is an adjusted covariance matrix for small sample size.
Details can be found in [Kenward-Roger](kenward.html#mathematical-details-of-kenward-roger-method)