This vignette demonstrates the methods of splitting data that are
supported by the splithalfr package. Each splitting method is
illustrated by calling by_split with the appropriate arguments and
printing to the terminal which data end up in each of the two parts
produced by a split. For a comprehensive review of each splitting
method, see Pronk et al. (2021).
We’ll use this example dataset with eight trials of one participant, each trial having a condition and rt variable.
library(splithalfr)

# Eight trials of one participant: four per condition, rt from 100 to 800
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)
First-second splitting assigns trials of the first half of rows to
one part and trials of the second half of rows to the other (Green et
al., 2016; Webb, Shavelson, & Haertel, 1996; Williams & Kaufmann,
2012). For this splitting method, set method to first_second.
# Print the rows assigned to each part of a first-second split
dummy <- by_split(
  ds,
  ds$participant,
  method = "first_second",
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
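To make the assignment rule concrete, the sketch below mimics a first-second split in base R. It is an illustration only, not how splithalfr implements the split; the names part_1 and part_2 are hypothetical, and sending the extra row to the first part when the number of rows is odd is an assumption.

```r
# Illustration only: a base-R mimic of first-second splitting,
# not the splithalfr implementation.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

n <- nrow(ds)
# Assumption: with an odd number of rows, the extra row goes to part 1
n_first <- ceiling(n / 2)

part_1 <- ds[seq_len(n_first), ]                       # first half of rows
part_2 <- ds[setdiff(seq_len(n), seq_len(n_first)), ]  # remaining rows

print(part_1$rt)  # 100 200 300 400
print(part_2$rt)  # 500 600 700 800
```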
Odd-even splitting assigns trials with an odd row number to one part
and trials with an even row number to the other (Green et al., 2016;
Webb, Shavelson, & Haertel, 1996; Williams & Kaufmann, 2012). For this
splitting method, set method to odd_even.
# Print the rows assigned to each part of an odd-even split
dummy <- by_split(
  ds,
  ds$participant,
  method = "odd_even",
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
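The same rule can be sketched in base R by testing the parity of the row number. This is only an illustration of the idea; part_odd and part_even are hypothetical names and the code does not reflect splithalfr internals.

```r
# Illustration only: a base-R mimic of odd-even splitting,
# not the splithalfr implementation.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

row_number <- seq_len(nrow(ds))
part_odd  <- ds[row_number %% 2 == 1, ]  # rows 1, 3, 5, 7
part_even <- ds[row_number %% 2 == 0, ]  # rows 2, 4, 6, 8

print(part_odd$rt)   # 100 300 500 700
print(part_even$rt)  # 200 400 600 800
```

Note that, unlike a first-second split, each part here contains trials from both conditions even without stratification, because the conditions are blocked in the example dataset.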
Permutated splitting is also known as random splitting (Kopp, Lange, &
Steinke, 2021), bootstrapped splitting (Parsons, Kruijt, & Fox, 2019)
and random sample of split halves (Williams & Kaufmann, 2012). It
assigns trials to each part via random sampling without replacement.
This splitting method is the default, but you can make it explicit by
setting method to random.
In practice, random splits are averaged over many replications, but for
illustration we’re only printing one.
# Print the rows assigned to each part of a single random split
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  replications = 1,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
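As a rough base-R sketch of the idea, a permutated split draws half of the row indices without replacement and assigns the remaining rows to the other part. The second half of the sketch repeats the split many times and averages a statistic over replications. All names (idx_1, part_1, part_2, diffs) and the choice of statistic are hypothetical; this is not the splithalfr implementation.

```r
# Illustration only: a base-R mimic of permutated (random) splitting,
# not the splithalfr implementation.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

set.seed(1)  # only to make the illustration reproducible
n <- nrow(ds)

# Draw half of the row indices without replacement; the rest form part 2
idx_1 <- sample(n, size = n / 2)
part_1 <- ds[sort(idx_1), ]
part_2 <- ds[setdiff(seq_len(n), idx_1), ]
print(part_1)
print(part_2)

# Averaging over replications: difference in mean rt between the parts
diffs <- replicate(100, {
  idx <- sample(n, size = n / 2)
  mean(ds$rt[idx]) - mean(ds$rt[-idx])
})
print(mean(diffs))
```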
Monte Carlo splitting assigns trials to each part by sampling with
replacement (Williams & Kaufmann, 2012). To construct parts of any
length, use the split_p argument and set replace to TRUE. The example
below constructs two parts of the same length as the original dataset
by setting split_p to 1.
# Print the rows assigned to each part of a single Monte Carlo split
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  replace = TRUE,
  split_p = 1,
  replications = 1,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
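A base-R sketch of sampling with replacement is given below. It assumes that each part is drawn independently from the full set of rows and that the part size is split_p times the number of rows; both are assumptions made for illustration, and the code is not taken from splithalfr.

```r
# Illustration only: a base-R mimic of Monte Carlo splitting.
# Assumption: each part is drawn independently, with replacement,
# and has round(split_p * n) rows.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

set.seed(1)  # only to make the illustration reproducible
n <- nrow(ds)
split_p <- 1
part_size <- round(split_p * n)

part_1 <- ds[sample(n, size = part_size, replace = TRUE), ]
part_2 <- ds[sample(n, size = part_size, replace = TRUE), ]

# The same trial can occur more than once within a part and in both parts
print(part_1$rt)
print(part_2$rt)
```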
If a split is stratified by a variable, then trials are separately
assigned to each part for each level of that variable (Green et al.,
2016). For example, if splits are stratified by ds$condition, the
trials with condition a and b are split separately. Stratification can
be used in combination with any of the methods above. For illustration
we combine it with first-second splitting.
# Print the rows assigned to each part of a stratified first-second split
dummy <- by_split(
  ds,
  ds$participant,
  method = "first_second",
  stratification = ds$condition,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
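Stratification can be pictured as splitting within each level and then pooling the results. The base-R sketch below applies a first-second split separately to each condition; the names strata, halves, part_1 and part_2 are hypothetical, and the code is an illustration rather than the splithalfr implementation.

```r
# Illustration only: a base-R mimic of a stratified first-second split,
# not the splithalfr implementation.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

# One data frame per level of the stratification variable
strata <- split(ds, ds$condition)

# First-second split within each stratum
halves <- lapply(strata, function(s) {
  n_first <- ceiling(nrow(s) / 2)
  list(
    part_1 = s[seq_len(n_first), ],
    part_2 = s[-seq_len(n_first), ]
  )
})

# Pool the per-stratum halves into the two parts
part_1 <- do.call(rbind, lapply(halves, `[[`, "part_1"))
part_2 <- do.call(rbind, lapply(halves, `[[`, "part_2"))

print(part_1$rt)  # 100 200 500 600
print(part_2$rt)  # 300 400 700 800
```

Each part now holds two trials of each condition, whereas an unstratified first-second split of this dataset would put all trials of condition a in one part and all trials of condition b in the other.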
In a subsampled split, a subset of the trials is randomly sampled without replacement and then split (see the supplementary materials of Hedge, Powell, & Sumner, 2018). Sub-sampling only works well with splitting methods that use random sampling (permutated and Monte Carlo). Since the sub-sampling procedure already randomizes which trials are selected for splitting, splitting methods that assign trials to parts based on their row number, such as first-second and odd-even, should give results that are similar to permutated splitting. Any stratification is applied to both the sub-sampling and the splitting.
# Print the rows assigned to each part of a stratified, subsampled random split
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  stratification = ds$condition,
  subsample_p = 0.5,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
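The two-step procedure can be sketched in base R as follows: within each condition, first keep a random half of the trials, then split the kept trials at random. The variable names and the rounding of the subsample size are assumptions for illustration; this is not the splithalfr implementation.

```r
# Illustration only: a base-R mimic of a stratified, subsampled random
# split, not the splithalfr implementation.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

set.seed(1)  # only to make the illustration reproducible
subsample_p <- 0.5

halves <- lapply(split(ds, ds$condition), function(s) {
  # Step 1: subsample the stratum without replacement
  # (assumption: the subsample size is rounded to a whole number of rows)
  keep <- sample(nrow(s), size = round(subsample_p * nrow(s)))
  sub <- s[keep, ]
  # Step 2: random split of the subsample, without replacement
  idx_1 <- sample(nrow(sub), size = floor(nrow(sub) / 2))
  list(part_1 = sub[idx_1, ], part_2 = sub[-idx_1, ])
})

part_1 <- do.call(rbind, lapply(halves, `[[`, "part_1"))
part_2 <- do.call(rbind, lapply(halves, `[[`, "part_2"))

# With 4 trials per condition and subsample_p = 0.5, each part
# receives one trial per condition
print(part_1)
print(part_2)
```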