Version 1.1.0 makes two changes. First, it enables estimation of the
conditional misclassification rate of predictions by classification
random forests as proposed by Lu and Hardin (2021). Second, it
compartmentalizes a costly step in the quantForestError algorithm: the
identification of each training observation’s out-of-bag terminal nodes.
The conditional misclassification rate of predictions by
classification random forests can now be estimated. To estimate it, set
the what argument of the quantForestError function to "mcr". The what
argument defaults to "mcr" if the provided forest is a classification
random forest. See the example code in the README for a toy
demonstration of this estimator’s performance.
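As a rough illustration of the call described above, the sketch below fits a classification forest on the built-in airquality data and requests "mcr" explicitly. It assumes the forestError (1.1.0 or later) and randomForest packages are installed. The binary threshold, the test-set size, and the forest settings are arbitrary choices made for this example, not recommendations from the package authors.

```r
# Sketch only: assumes forestError (>= 1.1.0) and randomForest are installed.
library(forestError)
library(randomForest)

set.seed(1)

# Toy data: drop incomplete rows, then build a binary response
# (the 31.5 threshold is an arbitrary choice for illustration).
aq <- airquality[complete.cases(airquality), ]
y <- as.factor(aq$Ozone > 31.5)
X <- aq[, setdiff(names(aq), "Ozone")]

# Hold out a handful of rows as a test set (size chosen arbitrarily).
test_idx <- sample(nrow(aq), size = 11)
Xtrain <- X[-test_idx, ]
Ytrain <- y[-test_idx]
Xtest <- X[test_idx, ]

# keep.inbag = TRUE retains each tree's in-bag/out-of-bag membership.
rf <- randomForest(Xtrain, Ytrain, nodesize = 3, ntree = 500,
                   keep.inbag = TRUE)

# Request the conditional misclassification rate explicitly.
output <- quantForestError(rf, Xtrain, Xtest, Y.train = Ytrain,
                           what = "mcr")
head(output)
```

Because the forest here is a classification forest, omitting the what argument should give the same kind of estimate, per the default described above.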
The identification of each training observation’s out-of-bag terminal
nodes is now compartmentalized from the main quantForestError function.
By isolating this step from the main quantForestError function, Version
1.1.0 allows users to iterate the algorithm more efficiently. Users may
wish to feed quantForestError batches of test observations iteratively
if they have streaming data or a large test set that cannot be processed
in one go due to memory constraints. In previous versions of this
package, doing so would require the algorithm to recompute each training
observation’s out-of-bag terminal nodes in each iteration. This was
redundant and costly. By separating this computation from the rest of
the quantForestError algorithm, Version 1.1.0 allows the user to perform
this computation only once.
As part of this modularization, the quantForestError function now has
two additional arguments. If set to TRUE, the return_train_nodes
argument makes the function also return a data.table identifying each
training observation’s out-of-bag terminal nodes. This data.table can
then be fed back into quantForestError via the train_nodes argument to
avoid the redundant recomputation.
Version 1.1.0 also exports the function that produces the data.table
identifying each training observation’s out-of-bag terminal nodes. It is
called findOOBErrors. Assuming the same inputs, findOOBErrors will
produce the same output that is returned by setting return_train_nodes
to TRUE in the quantForestError function. See the documentation on
quantForestError and findOOBErrors for examples.
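The batch workflow described above might look roughly like the following sketch: the out-of-bag terminal nodes are computed once with findOOBErrors and then reused for each batch through the train_nodes argument. It assumes forestError (1.1.0 or later) and randomForest are installed. The batch size of 5, the data split, and the names batches and results are hypothetical choices made for this example.

```r
# Sketch only: assumes forestError (>= 1.1.0) and randomForest are installed.
library(forestError)
library(randomForest)

set.seed(1)

# Toy regression data; Ozone (column 1) is the response.
aq <- airquality[complete.cases(airquality), ]
test_idx <- sample(nrow(aq), size = 20)
Xtrain <- aq[-test_idx, -1]
Ytrain <- aq[-test_idx, 1]
Xtest <- aq[test_idx, -1]

rf <- randomForest(Xtrain, Ytrain, nodesize = 5, ntree = 500,
                   keep.inbag = TRUE)

# One-time step: identify each training observation's
# out-of-bag terminal nodes.
train_nodes <- findOOBErrors(rf, Xtrain, Y.train = Ytrain)

# Split the test rows into batches of 5 (arbitrary batch size).
row_ids <- seq_len(nrow(Xtest))
batches <- split(row_ids, ceiling(row_ids / 5))

# Reuse train_nodes for every batch instead of recomputing it.
results <- lapply(batches, function(rows) {
  quantForestError(rf, Xtrain, Xtest[rows, , drop = FALSE],
                   Y.train = Ytrain, what = c("mspe", "bias"),
                   train_nodes = train_nodes)
})
```

With streaming data, the lapply call would be replaced by whatever loop receives new observations; the point is only that train_nodes is computed outside that loop.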
Neither of these changes affects code that relied on Version 1.0.0 of
this package, as the changes consist solely of a newly exported
function, two optional arguments to quantForestError that by default do
nothing new, and a new possible input for the what argument.
This package has been updated to reflect the conventional sign of bias (mean prediction minus mean response). Previous versions of the package returned negative bias (mean response minus mean prediction). Any arithmetic that uses the bias returned by this package must therefore reverse the sign of the bias term to preserve its intended effect.
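To make the sign change concrete, the relation can be written out as follows. The symbols are notation introduced here for illustration and do not come from the package documentation: \hat{y} denotes a prediction, y a response, and bars denote the means referred to above. The bias-corrected prediction on the second line is one example of an operation whose sign must be flipped.

```latex
\begin{align*}
\widehat{\mathrm{bias}}_{\text{new}}
  &= \bar{\hat{y}} - \bar{y}
   = -\left(\bar{y} - \bar{\hat{y}}\right)
   = -\,\widehat{\mathrm{bias}}_{\text{old}}, \\
\hat{y}_{\text{corrected}}
  &= \hat{y} + \widehat{\mathrm{bias}}_{\text{old}}
   = \hat{y} - \widehat{\mathrm{bias}}_{\text{new}}.
\end{align*}
```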
In the future, we hope to implement a stochastic version of the
quantForestError function, in which the parameters are estimated by
random subsets of the training sample and/or the trees of the random
forest.
Thanks to John Sheffield (GitHub Profile) for his helpful improvements
to the computational performance of this package. (See the Issue Tracker
for details.) These changes, which substantially reduce the runtime and
memory load of this package’s quantForestError, perror, and qerror
functions, have been implemented in Version 0.2.0.
Version 0.2.0 also allows the user to generate conditional prediction
intervals with different type-I error rates in a single call of the
quantForestError function.
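A minimal sketch of such a call is shown below, passing a vector of type-I error rates through the alpha argument. It assumes the forestError and randomForest packages are installed and that the argument names match the current package documentation. The data split, the forest settings, and the two error rates are arbitrary choices for illustration.

```r
# Sketch only: assumes forestError and randomForest are installed.
library(forestError)
library(randomForest)

set.seed(1)

# Toy regression data; Ozone (column 1) is the response.
aq <- airquality[complete.cases(airquality), ]
test_idx <- sample(nrow(aq), size = 11)
Xtrain <- aq[-test_idx, -1]
Ytrain <- aq[-test_idx, 1]
Xtest <- aq[test_idx, -1]

rf <- randomForest(Xtrain, Ytrain, nodesize = 5, ntree = 500,
                   keep.inbag = TRUE)

# Request 99% and 95% conditional prediction intervals in one call.
output <- quantForestError(rf, Xtrain, Xtest, Y.train = Ytrain,
                           what = "interval", alpha = c(0.01, 0.05))

# Inspect how the interval bounds are labelled in the result.
names(output)
```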