Adds a class-dependent sample weights column to a Task, allowing Learners and Measures to weight observations differently during training and evaluation.

Weights are assigned per observation based on the target class and can be written to the "weights_learner" column, the "weights_measure" column, both, or neither.

Binary as well as multiclass classification tasks (TaskClassif) are supported.

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpClassWeightsEx$new(id = "classweightsex", param_vals = list())

  • id :: character(1)
    Identifier of the resulting object, default "classweightsex"

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskClassif is used as input and output during training and prediction.

The output during training is the input Task with an added weights column according to the target class. The output during prediction is the unchanged input.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:

  • weights_learner :: logical(1)
    Whether the created weights should be stored as a weights_learner column or not. Initialized to TRUE.

  • weights_measure :: logical(1)
    Whether the created weights should be stored as a weights_measure column or not. Initialized to FALSE.

  • weight_method :: character(1)
    The method used to determine the sample weights. Available methods are "inverse_class_frequency", "inverse_square_root_of_frequency", "median_frequency_balancing" and "explicit". With "explicit", the mapping hyperparameter must be set. Initialized to "explicit".

  • mapping :: named numeric
    A named numeric vector that specifies a finite weight for each target class in the task. This only has an effect if weight_method is "explicit".
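
The frequency-based methods can be illustrated with a short base R sketch. It is not the PipeOp's implementation: only the "inverse_class_frequency" formula (total observations divided by class count) can be checked against the weights printed in the Examples section. The formulas for the other two methods are assumptions inferred from their names, and the class counts (1813 spam, 2788 nonspam) are those of the "spam" task.

```r
# Toy target with the class counts of the "spam" task.
truth = factor(rep(c("spam", "nonspam"), times = c(1813, 2788)))

# Relative class frequencies as a plain named numeric vector.
tab = table(truth)
freq = setNames(as.numeric(tab), names(tab)) / length(truth)

# Inverse class frequency: n / n_class, consistent with the Examples output.
w_icf = 1 / freq

# ASSUMED formulas for the remaining methods (not confirmed by this page).
w_isrf = 1 / sqrt(freq)
w_mfb = median(freq) / freq

# Per-observation weights: look up the weight of each observation's class.
obs_weights = unname(w_icf[as.character(truth)])
head(obs_weights)
```

Under this reading, rarer classes always receive larger weights; the methods differ only in how strongly they compensate for the imbalance.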

Internals

Adds a .WEIGHTS column to the Task, which is removed from the feature role and mapped to the requested weight roles. A naming conflict occurs if a column of this name already exists and is not already a weight column. Pre-existing weight columns lose their weight column role but remain in the DataBackend of the Task.
When weight_method = "explicit", the mapping must cover every class present in the training data and may not contain additional classes.
The Learner must support weights for this PipeOp to have an effect.
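
A minimal sketch of the "explicit" method, assuming an mlr3pipelines version that provides this PipeOp. The weights 2 and 1 are arbitrary illustration values; the mapping names must match the classes of the "spam" task exactly. The check on the learner's properties assumes that weight support is advertised through the "weights" property, as it is for classif.rpart.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("spam")

# Illustrative weights only: one entry per target class, no extra classes.
po_explicit = po("classweightsex", param_vals = list(
  weight_method = "explicit",
  mapping = c(spam = 2, nonspam = 1)
))
result = po_explicit$train(list(task))[[1L]]

# The weights only matter if the downstream Learner can use them.
learner = lrn("classif.rpart")
"weights" %in% learner$properties
```

If the mapping omitted a class or named one that does not occur in the task, training would be expected to fail, following the constraint stated above.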

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

See also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_info, mlr_pipeops_isomap, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Examples

library("mlr3")
library("mlr3pipelines")  # provides po()

task = tsk("spam")

poicf = po("classweightsex", param_vals = list(weights_learner = TRUE, weights_measure = TRUE, 
  weight_method = "inverse_class_frequency"))
result = poicf$train(list(task))[[1L]]

if ("weights_learner" %in% names(result)) {
  result$weights_learner  # recent mlr3-versions
} else {
  result$weights  # old mlr3-versions
}
#> Key: <row_id>
#>       row_id   weight
#>        <int>    <num>
#>    1:      1 2.537783
#>    2:      2 2.537783
#>    3:      3 2.537783
#>    4:      4 2.537783
#>    5:      5 2.537783
#>   ---                
#> 4597:   4597 1.650287
#> 4598:   4598 1.650287
#> 4599:   4599 1.650287
#> 4600:   4600 1.650287
#> 4601:   4601 1.650287

if ("weights_measure" %in% names(result)) {
  result$weights_measure  # recent mlr3-versions
} else {
  result$weights  # old mlr3-versions
}
#> Key: <row_id>
#>       row_id   weight
#>        <int>    <num>
#>    1:      1 2.537783
#>    2:      2 2.537783
#>    3:      3 2.537783
#>    4:      4 2.537783
#>    5:      5 2.537783
#>   ---                
#> 4597:   4597 1.650287
#> 4598:   4598 1.650287
#> 4599:   4599 1.650287
#> 4600:   4600 1.650287
#> 4601:   4601 1.650287