The prcbench package is a testing workbench for evaluating precision-recall curves. It requires a simple three-step process to evaluate libraries that create precision-recall plots (a minimal sketch combining the steps follows the list below).

1. Tool selection by using the tool interface
2. Test data selection/creation by using the test data interface
    - Select predefined test data for the accuracy evaluation
    - Define randomly generated test data for the running-time evaluation
3. Run an evaluation function with the selected tools and test data sets
    - Accuracy evaluation of precision-recall curves
    - Running-time evaluation of precision-recall curves
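For example, the three steps can be combined into a short script. The following minimal sketch uses only the predefined interfaces described later in this document.

```r
library(prcbench)

## Step 1: select tools via the tool interface
toolset <- create_toolset(c("ROCR", "precrec"))

## Step 2: select predefined test data for the accuracy evaluation
testset <- create_testset("curve", "c1")

## Step 3: run the accuracy evaluation and inspect the summarized scores
scores <- run_evalcurve(testset, toolset)
scores
```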
In addition to predefined tools and test data sets, the prcbench package provides helper functions for users to define their own tools and datasets.

- User-defined test data interface
    - User-defined test data for the accuracy evaluation
    - User-defined test data for the running-time evaluation
The prcbench
package provides predefined interfaces for
the following five tools that calculate precision-recall curves.
Tool | Language | Link |
---|---|---|
precrec | R | Tool web site, CRAN |
ROCR | R | Tool web site, CRAN |
PRROC | R | CRAN |
AUCCalculator | Java | Tool web site |
PerfMeas | R | CRAN |
The create_toolset
function generates a tool set with a
combination of the five tools.
```r
library(prcbench)

## A single tool
toolsetA <- create_toolset("ROCR")

## Multiple tools
toolsetB <- create_toolset(c("PerfMeas", "PRROC"))

## Tool sets can be manually combined to a single set
toolsetAB <- c(toolsetA, toolsetB)
```
The create_toolset function takes two additional arguments: calc_auc and store_res (a short sketch of their use follows this list).

- calc_auc decides whether tools calculate AUC scores or not (AUC calculation is optional for the running-time evaluation and is not necessary for the evaluation of accurate precision-recall curves)
- store_res decides whether tools store the calculated curves or not (actual curves are required for the evaluation of accurate precision-recall curves)
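For instance, a tool set intended only for running-time measurement might calculate AUCs but skip storing curves, whereas a tool set for the accuracy evaluation must store the curves. The following is a minimal sketch of these two cases; the variable names are illustrative only.

```r
## AUC scores are calculated but curves are not stored,
## which is sufficient for running-time measurement
toolsetBench <- create_toolset("ROCR", calc_auc = TRUE, store_res = FALSE)

## Curves are stored for the accuracy evaluation; AUC calculation is skipped
toolsetCurve <- create_toolset("precrec", calc_auc = FALSE, store_res = TRUE)
```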
The following six tool sets are predefined, each with a different combination of tools and default argument values.
Set name | Tools | calc_auc | store_res |
---|---|---|---|
def5 | ROCR, AUCCalculator, PerfMeas, PRROC, precrec | TRUE | TRUE |
auc5 | ROCR, AUCCalculator, PerfMeas, PRROC, precrec | TRUE | FALSE |
crv5 | ROCR, AUCCalculator, PerfMeas, PRROC, precrec | FALSE | TRUE |
def4 | ROCR, AUCCalculator, PerfMeas, precrec | TRUE | TRUE |
auc4 | ROCR, AUCCalculator, PerfMeas, precrec | TRUE | FALSE |
crv4 | ROCR, AUCCalculator, PerfMeas, precrec | FALSE | TRUE |
```r
## Use 'set_names'
toolsetC <- create_toolset(set_names = "auc5")

## Multiple sets are automatically combined to a single set
toolsetD <- create_toolset(set_names = c("auc5", "crv4"))
```
The prcbench package provides two different types of test data sets.

- curve: evaluates the accuracy of precision-recall curves
- bench: measures running times of creating precision-recall curves

The create_testset function offers both types of test data by setting the first argument to either "curve" or "bench".

The create_testset function takes predefined set names for curve evaluation. These data sets contain pre-calculated precision and recall values. The pre-calculated values must be correct so that they can be compared with the results of the specified tools.
The following four test sets are currently available.
name | #scores&labels | #pos labels | #neg labels | expected #points | expected start | expected end |
---|---|---|---|---|---|---|
c1 | 4 | 2 | 2 | 6 | (0, 1) | (1, 0.5) |
c2 | 4 | 2 | 2 | 6 | (0, 0.5) | (1, 0.5) |
c3 | 4 | 2 | 2 | 6 | (0, 0) | (1, 0.5) |
c4 | 8 | 4 | 4 | 9 | (0, 1) | (1, 0.5) |
```r
## C1 test set
testset2A <- create_testset("curve", "c1")

## C2 test set
testset2B <- create_testset("curve", "c2")

## Test data sets can be manually combined to a single set
testset2AB <- c(testset2A, testset2B)

## Multiple sets are automatically combined to a single set
testset2C <- create_testset("curve", c("c1", "c2"))
```
The create_testset function uses a naming convention for randomly generated data for benchmarking. The format is a prefix ('b' or 'i') followed by a number that indicates the size of the dataset. The prefix 'b' indicates a balanced dataset, whereas 'i' indicates an imbalanced dataset. The number can be used with a suffix 'k' or 'm', indicating 1000 or 1 million, respectively.
```r
## A balanced data set with 50 positives and 50 negatives
testset1A <- create_testset("bench", "b100")

## An imbalanced data set with 2500 positives and 7500 negatives
testset1B <- create_testset("bench", "i10k")

## Test data sets can be manually combined to a single set
testset1AB <- c(testset1A, testset1B)

## Multiple sets are automatically combined to a single set
testset1C <- create_testset("bench", c("i10", "b10"))
```
The prcbench package currently provides two different types of performance evaluation.

- Accuracy evaluation of precision-recall curves
- Running-time evaluation of precision-recall curves
The run_evalcurve function evaluates precision-recall curves with the following five test cases. The basic idea is that the function returns the full score as long as the points generated by a library match the manually calculated recall and precision values.
Test case | Description |
---|---|
fpoint | Check the first point |
int_pts | Check the intermediate points |
epoint | Check the end point |
x_range | Evaluate a range between two recall values |
y_range | Evaluate a range between two precision values |
The run_evalcurve function calculates the scores of the test cases and summarizes them in a data frame.
```r
## Evaluate precision-recall curves for ROCR and precrec with the c1 test set
testset <- create_testset("curve", "c1")
toolset <- create_toolset(c("ROCR", "precrec"))
scores <- run_evalcurve(testset, toolset)
scores
##   testset toolset toolname score
## 1      c1    ROCR     ROCR   5/8
## 2      c1 precrec  precrec   8/8
```
The result of each test case can be displayed by specifying data_type = "all" in the print function.
```r
## Print all results
print(scores, data_type = "all")
##    testset toolset toolname testitem testcat success total
## 1       c1    ROCR     ROCR  x_range      Rg       1     1
## 2       c1    ROCR     ROCR  y_range      Rg       1     1
## 3       c1    ROCR     ROCR   fpoint      SE       0     1
## 4       c1    ROCR     ROCR   intpts      Ip       2     4
## 5       c1    ROCR     ROCR   epoint      SE       1     1
## 6       c1 precrec  precrec  x_range      Rg       1     1
## 7       c1 precrec  precrec  y_range      Rg       1     1
## 8       c1 precrec  precrec   fpoint      SE       1     1
## 9       c1 precrec  precrec   intpts      Ip       4     4
## 10      c1 precrec  precrec   epoint      SE       1     1
```
The autoplot function shows a plot with the result of the run_evalcurve function.
```r
## ggplot2 is necessary to use autoplot
library(ggplot2)

## Plot base points and the result of precrec on c1, c2, and c3 test sets
testset <- create_testset("curve", c("c1", "c2", "c3"))
toolset <- create_toolset("precrec")
scores1 <- run_evalcurve(testset, toolset)
autoplot(scores1)

## Plot the results of PerfMeas and PRROC on c1, c2, and c3 test sets
toolset <- create_toolset(c("PerfMeas", "PRROC"))
scores2 <- run_evalcurve(testset, toolset)
autoplot(scores2, base_plot = FALSE)
```
The run_benchmark
function internally calls the
microbenchmark
function provided by the microbenchmark
package. It takes a test set and a tool set and returns the result of
microbenchmark
.
```r
## Run microbenchmark for auc5 on b10
testset <- create_testset("bench", "b10")
toolset <- create_toolset(set_names = "auc5")
res <- run_benchmark(testset, toolset)
res
##   testset toolset      toolname   min    lq mean median   uq   max neval
## 1     b10    auc5 AUCCalculator 1.883 2.238 4.78  2.733 3.41 13.62     5
## 2     b10    auc5         PRROC 0.137 0.140 0.17  0.145 0.15  0.26     5
## 3     b10    auc5      PerfMeas 0.057 0.059 0.10  0.077 0.12  0.21     5
## 4     b10    auc5          ROCR 1.537 1.543 1.62  1.547 1.58  1.90     5
## 5     b10    auc5       precrec 3.558 3.627 3.73  3.630 3.79  4.03     5
```
In addition to the five predefined tools, users can add new tool interfaces for their own tools to run benchmarking and curve evaluation. The create_usrtool function takes the name of the tool and a function that calculates a precision-recall curve.
```r
## Create a new tool set for 'xyz'
toolname <- "xyz"
calcfunc <- create_example_func()
toolsetU <- create_usrtool(toolname, calcfunc)

## User-defined tools can be combined with predefined tools
toolsetA <- create_toolset("ROCR")
toolsetU2 <- c(toolsetA, toolsetU)
```
Like the predefined tool sets, user-defined tool sets can be used for
both run_benchmark
and run_evalcurve
.
```r
## Curve evaluation
testset3 <- create_testset("curve", "c2")
scores3 <- run_evalcurve(testset3, toolsetU2)
autoplot(scores3, base_plot = FALSE)
```
The create_example_func function creates an example for the second argument of the create_usrtool function. The actual function should also take a testset generated by the create_testset function and return a list with three elements: x, y, and auc.
```r
## Show an example of the second argument
calcfunc <- create_example_func()
print(calcfunc)
## function (single_testset)
## {
##     scores <- single_testset$get_scores()
##     list(x = seq(0, 1, 1/length(scores)), y = seq(0, 1, 1/length(scores)),
##         auc = 0.5)
## }
## <bytecode: 0x5612b4b88920>
## <environment: 0x5612b5cea558>
```
The create_testset function produces a testset as either a TestDataB or a TestDataC object. See the help files of the R6 classes, help(TestDataB) and help(TestDataC), for the methods that can be used with the precision-recall calculation.
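The example function above simply returns a diagonal line with a fixed AUC. For illustration only, a user-defined calculation function could compute an actual precision-recall curve along the following lines. This sketch assumes the test set object exposes get_scores() and get_labels() methods (check help(TestDataB) and help(TestDataC)), that labels are coded 1 for positives and 0 for negatives, and it ignores details such as the first curve point at recall 0 and score ties, so it is not guaranteed to pass all accuracy test cases.

```r
## A sketch of a user-defined calculation function (not the package's API;
## the method names and label coding are assumptions, see the R6 class help)
calc_prcurve <- function(single_testset) {
  scores <- single_testset$get_scores()
  labels <- single_testset$get_labels()

  ## Order by decreasing score and accumulate true and false positives
  ord <- order(scores, decreasing = TRUE)
  tp <- cumsum(labels[ord] == 1)
  fp <- cumsum(labels[ord] != 1)

  recall <- tp / sum(labels == 1)
  precision <- tp / (tp + fp)

  ## Approximate the AUC with the trapezoidal rule
  auc <- sum(diff(recall) * (head(precision, -1) + tail(precision, -1)) / 2)

  list(x = recall, y = precision, auc = auc)
}

## The function can be registered in the same way as the example function
toolsetV <- create_usrtool("myprc", calc_prcurve)
```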
The prcbench package also supports user-defined test data interfaces. The create_usrdata function creates two types of test datasets.

- User-defined test data for the accuracy evaluation
- User-defined test data for the running-time evaluation
The first argument of the create_usrdata function should be "curve" to create a test dataset for the accuracy evaluation. Scores and labels, as well as pre-calculated recall and precision values, are required. These pre-calculated values are compared with the corresponding values produced by the specified tools.
```r
## Create a test dataset 'c5' for the accuracy evaluation
testsetC <- create_usrdata("curve",
  scores = c(0.1, 0.2), labels = c(1, 0),
  tsname = "c5", base_x = c(0.0, 1.0),
  base_y = c(0.0, 0.5)
)
```
It can be used in the same way as the predefined test datasets
selected by create_testset
.
```r
## Run curve evaluation for ROCR and precrec on the user-defined test dataset
toolset2 <- create_toolset(c("ROCR", "precrec"))
scores2 <- run_evalcurve(testsetC, toolset2)
autoplot(scores2, base_plot = FALSE)
```
The first argument of the create_usrdata
function should
be “bench” to create a test dataset for the running-time evaluation.
Scores and labels are also required.
```r
## Create a test dataset 'b5' for benchmarking
testsetB <- create_usrdata("bench",
  scores = c(0.1, 0.2), labels = c(1, 0),
  tsname = "b5"
)
```
It can be used in the same way as the test datasets generated by
create_testset
.
```r
## Run microbenchmark for ROCR and precrec on the user-defined test dataset
toolset <- create_toolset(c("ROCR", "precrec"))
res <- run_benchmark(testsetB, toolset)
res
##   testset toolset toolname min  lq mean median  uq max neval
## 1      b5    ROCR     ROCR 1.5 1.6  1.6    1.6 1.7 1.8     5
## 2      b5 precrec  precrec 3.6 3.7  3.8    3.8 3.9 4.3     5
```
See our website, Classifier evaluation with imbalanced datasets, for useful tips on performance evaluation of binary classifiers. In addition, we have summarized potential pitfalls of ROC plots with imbalanced datasets. See our paper, "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets", for more details.