Tune the Threshold of a Classification Prediction

Tunes optimal probability thresholds over different PredictionClassifs.

The mlr3::Learner must have predict_type "prob". Thresholds for each learner are optimized using the Optimizer supplied via the param_set (default: GenSA). Returns a single PredictionClassif.

Use this PipeOp together with PipeOpLearnerCV to optimize thresholds of cross-validated predictions. To optimize thresholds without cross-validation, use PipeOpLearnerCV with ResamplingInsample.
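
The variant without cross-validation might look like the following sketch. It assumes that PipeOpLearnerCV exposes a resampling.method hyperparameter that accepts "insample"; the name graph_insample is only illustrative. Thresholds tuned on in-sample predictions are likely to be optimistic.

```r
library("mlr3")
library("mlr3pipelines")

# Assumption: resampling.method = "insample" makes PipeOpLearnerCV pass on
# predictions for the training data itself instead of cross-validating.
learner = lrn("classif.rpart", predict_type = "prob")
graph_insample = po("learner_cv", learner, resampling.method = "insample") %>>%
  po("tunethreshold")

task = tsk("iris")
graph_insample$train(task)

# One tuned threshold per class, stored in the state of the tuning PipeOp.
graph_insample$state$tunethreshold$threshold
```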

Format

R6Class object inheriting from PipeOp.

Construction

PipeOpTuneThreshold$new(id = "tunethreshold", param_vals = list())

id :: character(1)
Identifier of resulting object. Default: "tunethreshold".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOp.

State

The $state is a named list with the element:

threshold :: numeric
Learned thresholds, named by class.

Parameters

The parameters are the parameters inherited from PipeOp, as well as:

measure :: Measure | character
Measure to optimize for. Will be converted to a Measure in case it is character. Initialized to "classif.ce", i.e. misclassification error.
optimizer :: Optimizer|character(1)
Optimizer used to find optimal thresholds. If character, converts to Optimizer via opt. Initialized to OptimizerGenSA.
log_level :: character(1) | integer(1)
Set a temporary log-level for lgr::get_logger("mlr3/bbotk"). Initialized to: "warn".
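
As a sketch of how these parameters can be set at construction, the following passes them through param_vals. The choice of "classif.acc" and of the "error" log level is only an example, and po_thresh is an illustrative name.

```r
library("mlr3")
library("mlr3pipelines")

# Optimize accuracy instead of the default misclassification error, and
# raise the log threshold so that only errors are reported during tuning.
po_thresh = po("tunethreshold", param_vals = list(
  measure = msr("classif.acc"),
  log_level = "error"
))

# Inspect the values that were set.
po_thresh$param_set$values$measure$id
po_thresh$param_set$values$log_level
```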

Internals

Uses the optimizer supplied via param_vals to find optimal thresholds. See the optimizer parameter for more info.
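
To illustrate the idea behind the optimization, here is a minimal base R sketch on toy data. It is not the package's implementation: apply_threshold() and ce() are hypothetical helpers, and the rule of dividing each class probability by its threshold and predicting the class with the largest ratio is an assumption modeled on how PredictionClassif$set_threshold() handles multiclass thresholds.

```r
# Toy class probabilities for three observations (rows) and two classes.
prob = matrix(
  c(0.60, 0.40,
    0.45, 0.55,
    0.30, 0.70),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("neg", "pos"))
)
truth = c("neg", "neg", "pos")

# Hypothetical helper: rescale each class column by its threshold and
# predict the class with the largest ratio.
apply_threshold = function(prob, threshold) {
  ratio = sweep(prob, 2, threshold[colnames(prob)], "/")
  colnames(prob)[max.col(ratio, ties.method = "first")]
}

# Hypothetical objective: misclassification error for a threshold vector.
# An optimizer would search over threshold vectors to minimize this value.
ce = function(threshold) mean(apply_threshold(prob, threshold) != truth)

ce(c(neg = 0.5, pos = 0.5))
ce(c(neg = 0.4, pos = 0.6))
```

With equal thresholds the second observation is misclassified as "pos"; lowering the threshold for "neg" to 0.4 classifies all three observations correctly.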

Fields

Fields inherited from PipeOp, as well as:

predict_type :: character(1)
Type of prediction to return. Either "prob" (default) or "response". Setting this to "response" is rarely useful; it may save some memory but has no other benefit.

Methods

Only methods inherited from PipeOp.

Examples

library("mlr3")
library("mlr3pipelines")  # provides po() and %>>%

task = tsk("iris")
pop = po("learner_cv", lrn("classif.rpart", predict_type = "prob")) %>>%
  po("tunethreshold")

task$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---                                                            
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
pop$train(task)
#> OptimInstanceSingleCrit is deprecated. Use OptimInstanceBatchSingleCrit instead.
#> $tunethreshold.output
#> NULL
#> 

pop$state
#> $classif.rpart
#> $model
#> n= 150 
#> 
#> node), split, n, loss, yval, (yprob)
#>       * denotes terminal node
#> 
#> 1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)  
#>   2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
#>   3) Petal.Length>=2.45 100  50 versicolor (0.00000000 0.50000000 0.50000000)  
#>     6) Petal.Width< 1.75 54   5 versicolor (0.00000000 0.90740741 0.09259259) *
#>     7) Petal.Width>=1.75 46   1 virginica (0.00000000 0.02173913 0.97826087) *
#> 
#> $param_vals
#> $param_vals$xval
#> [1] 0
#> 
#> 
#> $log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#> 
#> $train_time
#> [1] 0.003
#> 
#> $task_hash
#> [1] "abc694dd29a7a8ce"
#> 
#> $feature_names
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $validate
#> NULL
#> 
#> $mlr3_version
#> [1] ‘1.2.0’
#> 
#> $data_prototype
#> Empty data.table (0 rows and 5 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
#> 
#> $task_prototype
#> Empty data.table (0 rows and 5 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
#> 
#> $train_task
#> 
#> ── <TaskClassif> (150x5): Iris Flowers ─────────────────────────────────────────
#> • Target: Species
#> • Target classes: setosa, versicolor, virginica
#> • Properties: multiclass
#> • Features (4):
#>   • dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
#> 
#> $predict_method
#> [1] "full"
#> 
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $intasklayout
#> Key: <id>
#>              id    type
#>          <char>  <char>
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric
#> 
#> $outtasklayout
#> Key: <id>
#>                               id    type
#>                           <char>  <char>
#> 1:     classif.rpart.prob.setosa numeric
#> 2: classif.rpart.prob.versicolor numeric
#> 3:  classif.rpart.prob.virginica numeric
#> 
#> $outtaskshell
#> Empty data.table (0 rows and 4 cols): Species,classif.rpart.prob.setosa,classif.rpart.prob.versicolor,classif.rpart.prob.virginica
#> 
#> attr(,"class")
#> [1] "pipeop_learner_cv_state" "learner_state"          
#> [3] "list"                   
#> 
#> $tunethreshold
#> $tunethreshold$threshold
#>     setosa versicolor  virginica 
#>  0.5852014  0.3398626  0.6166942 
#> 
#>
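
Once trained, the Graph can be used for prediction. The sketch below rebuilds the pipeline from the example so that it runs on its own; predicting on the training task is done only for illustration, and the name prediction is an assumption of this sketch.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("learner_cv", lrn("classif.rpart", predict_type = "prob")) %>>%
  po("tunethreshold")
pop$train(task)

# Graph$predict() returns a named list with one element per output channel;
# the single element here is the thresholded PredictionClassif.
prediction = pop$predict(task)[[1]]

# Score the response with the measure that was tuned.
prediction$score(msr("classif.ce"))
```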