
Adds a class-dependent sample weights column to a Task, allowing Learners and Measures to weight observations differently during training and evaluation.

Weights are assigned per observation based on the target class and can be written to the "weights_learner" column, the "weights_measure" column, both, or neither.

Only binary classification tasks (TaskClassif) are supported.

Note: By default, all weights are set to 1, so the PipeOp has no effect on training. To obtain a meaningful effect, the minor_weight parameter must be set to a value other than 1.

See PipeOpClassWeightsEx for an extended version of this PipeOp which can handle multiclass classification tasks and offers several methods for automatically determining weights.

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpClassWeights$new(id = "classweights", param_vals = list())

  • id :: character(1)
    Identifier of the resulting object, default "classweights"

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskClassif is used as input and output during training and prediction.

The output during training is the input Task with an added weights column according to the target class. The output during prediction is the unchanged input.
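
The difference between the two phases can be checked directly. The following is a minimal sketch, assuming mlr3 and mlr3pipelines are installed; get_weights() is a hypothetical helper introduced here for illustration (it is not part of either package) that mirrors the version check used in the Examples section.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical helper (not part of mlr3): read the learner weights
# through whichever accessor the installed mlr3 version provides.
get_weights = function(task) {
  if ("weights_learner" %in% names(task)) task$weights_learner else task$weights
}

pop = po("classweights", minor_weight = 2)

# Training returns the input task with a weights column added.
trained = pop$train(list(tsk("spam")))[[1L]]
is.null(get_weights(trained))

# Prediction passes the task through without adding weights.
predicted = pop$predict(list(tsk("spam")))[[1L]]
is.null(get_weights(predicted))
```

A fresh tsk("spam") is passed to each call so that the comparison does not depend on whether the input task is cloned internally.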

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:

  • minor_weight :: numeric(1)
    Weight given to samples of the minor class. Major class samples have weight 1. Initialized to 1.

  • weights_learner :: logical(1)
    Whether the created weights should be stored as a weights_learner column or not. Initialized to TRUE.

  • weights_measure :: logical(1)
    Whether the created weights should be stored as a weights_measure column or not. Initialized to FALSE.
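
As an illustration of how these parameters might be set, the sketch below derives minor_weight from the class frequencies, so that both classes carry the same total weight. This is one common heuristic and an assumption of this example, not a recommendation from the package; the variable names counts and ratio are chosen here for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("spam")

# Ratio of the larger class size to the smaller class size.
counts = table(task$truth())
ratio = as.numeric(max(counts) / min(counts))

# Pass hyperparameters at construction through param_vals.
pop = PipeOpClassWeights$new(param_vals = list(minor_weight = ratio))

# Inspect the value that will be used during training.
pop$param_set$values$minor_weight
```

The same value could instead be assigned after construction through pop$param_set$values$minor_weight, as done in the Examples section.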

Internals

Adds a .WEIGHTS column to the Task, removes it from the feature role, and assigns it the requested weight roles. A naming conflict occurs if a column of this name already exists and is not already a weight column. Any pre-existing weight columns lose their weight role but remain in the DataBackend of the Task. The Learner must support weights for this PipeOp to have an effect.
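
Because the weights only matter if the downstream Learner uses them, it can help to check the Learner's properties before building a pipeline. The following is a minimal sketch, assuming the rpart package is installed; the choice of classif.rpart and classif.bacc is an assumption made for illustration, and scoring on the training data is done only to keep the example short.

```r
library("mlr3")
library("mlr3pipelines")

learner = lrn("classif.rpart")

# Stop early if the learner does not advertise weight support.
stopifnot("weights" %in% learner$properties)

# Chain the weighting step and the learner, then wrap the graph as a learner.
graph = po("classweights", minor_weight = 2) %>>% learner
glrn = as_learner(graph)

# Weights are added during training; prediction leaves the task unchanged.
glrn$train(tsk("spam"))
prediction = glrn$predict(tsk("spam"))
prediction$score(msr("classif.bacc"))
```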

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

See also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweightsex, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_info, mlr_pipeops_isomap, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_splines, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Examples

library("mlr3")
library("mlr3pipelines")

task = tsk("spam")
opb = po("classweights")

# task weights
if ("weights_learner" %in% names(task)) {
  task$weights_learner  # recent mlr3-versions
} else {
  task$weights  # old mlr3-versions
}
#> NULL

# give observations of the minority class (spam) twice the weight
opb$param_set$values$minor_weight = 2
result = opb$train(list(task))[[1L]]
if ("weights_learner" %in% names(result)) {
  result$weights_learner  # recent mlr3-versions
} else {
  result$weights  # old mlr3-versions
}
#> Key: <row_id>
#>       row_id weight
#>        <int>  <num>
#>    1:      1      2
#>    2:      2      2
#>    3:      3      2
#>    4:      4      2
#>    5:      5      2
#>   ---              
#> 4597:   4597      1
#> 4598:   4598      1
#> 4599:   4599      1
#> 4600:   4600      1
#> 4601:   4601      1