Impute categorical features (character, factor and ordered) by replacing missing values with the new level ".MISSING".

Impute numerical features by a constant value shifted below the minimum or above the maximum, computed as \(min(x) - offset - multiplier * diff(range(x))\) or \(max(x) + offset + multiplier * diff(range(x))\), respectively. Here \(diff(range(x))\) is the spread \(max(x) - min(x)\) of the observed values.
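As a rough illustration of the formula above, the following base R sketch computes both constants for a small made-up vector. The helper oor_constant() and its argument below are hypothetical names chosen for this sketch and are not part of mlr3pipelines; the package's own implementation may differ in details.

```r
# Hypothetical helper (not part of mlr3pipelines): computes the out-of-range
# constant from the formula above, ignoring missing values.
oor_constant = function(x, below = TRUE, offset = 1, multiplier = 1) {
  spread = diff(range(x, na.rm = TRUE))  # max(x) - min(x)
  if (below) {
    min(x, na.rm = TRUE) - offset - multiplier * spread
  } else {
    max(x, na.rm = TRUE) + offset + multiplier * spread
  }
}

x = c(2, 5, NA, 10)            # made-up data: min 2, max 10, spread 8
oor_constant(x)                # 2 - 1 - 1 * 8 = -7
oor_constant(x, below = FALSE) # 10 + 1 + 1 * 8 = 19
```

With offset = 1 and multiplier = 1, the imputed value sits a full data spread plus one unit outside the observed range, so it cannot collide with a real observation.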

This type of imputation is especially sensible for tree-based methods, which can separate the out-of-range value from the observed values with a single split; see also Ding & Simonoff (2010).

Format

R6Class object inheriting from PipeOpImpute/PipeOp.

Construction

PipeOpImputeOOR$new(id = "imputeoor", param_vals = list())
  • id :: character(1)
    Identifier of resulting object, default "imputeoor".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
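A minimal construction sketch, assuming the mlr3pipelines package is installed. The id "oor_above" and the chosen hyperparameter values are arbitrary illustrations.

```r
library("mlr3pipelines")

# Construct the PipeOp with a custom id and non-default hyperparameters
# passed through param_vals.
op = PipeOpImputeOOR$new(
  id = "oor_above",
  param_vals = list(min = FALSE, offset = 0, multiplier = 2)
)

op$id
# Inspect only the three hyperparameters specific to this PipeOp.
op$param_set$values[c("min", "offset", "multiplier")]
```

Hyperparameters can also be changed after construction by assigning to op$param_set$values.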

Input and Output Channels

Input and output channels are inherited from PipeOpImpute.

The output is the input Task with all affected features having missing values imputed as described above.

State

The $state is a named list with the $state elements inherited from PipeOpImpute.

The $state$model contains, for each feature, either ".MISSING" (for character, factor and ordered features) or a numeric(1) giving the constant value used to impute integer and numeric features.
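The following sketch shows how the state could be inspected after training. The toy data and the names target, num and fct are invented for illustration, and the sketch assumes that $state$model is a list named by feature.

```r
library("mlr3")
library("mlr3pipelines")

# Toy data, invented for illustration: one numeric and one factor feature,
# each with a single missing value.
data = data.frame(
  target = factor(c("a", "b", "a", "b")),
  num = c(1, NA, 3, 5),
  fct = factor(c("u", NA, "v", "u"))
)
task = TaskClassif$new("toy", backend = data, target = "target")

op = po("imputeoor")
out = op$train(list(task))[[1]]

# Learned imputation values; with the default settings the formula gives
# 1 - 1 - 1 * (5 - 1) = -4 for num, and ".MISSING" is used for fct.
op$state$model$num
op$state$model$fct
```

The same stored values are reused when the trained PipeOp is applied to new data with $predict().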

Parameters

The parameters are the parameters inherited from PipeOpImpute, as well as:

  • min :: logical(1)
    Should integer and numeric features be shifted below the minimum? Initialized to TRUE. If FALSE they are shifted above the maximum. See also the description above.

  • offset :: numeric(1)
    Numerical non-negative offset as used in the description above for integer and numeric features. Initialized to 1.

  • multiplier :: numeric(1)
    Numerical non-negative multiplier as used in the description above for integer and numeric features. Initialized to 1.

Internals

Adds ".MISSING" as an explicit new level to factor and ordered features, but not to character features, where it is inserted as a plain string. For integer and numeric features, the constant is computed with the min, max, diff and range functions. Integer and numeric features that are entirely NA are imputed as 0.
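The behaviour described above can be approximated in base R as follows. This is a sketch under assumptions, not the package's code: in particular, appending the new level after the existing ones is an assumption, since the position of ".MISSING" among the levels is not stated here.

```r
# Factor: register ".MISSING" as an explicit level, then fill the NAs.
f = factor(c("a", NA, "b"))
levels(f) = c(levels(f), ".MISSING")  # assumption: appended as last level
f[is.na(f)] = ".MISSING"
levels(f)

# Character: there are no levels to extend, so NAs are simply replaced.
ch = c("a", NA, "b")
ch[is.na(ch)] = ".MISSING"

# Entirely NA numeric feature: min() and max() have no observed values to
# work with, so the documented fallback value 0 is used instead.
x = c(NA_real_, NA_real_)
fill = if (all(is.na(x))) {
  0
} else {
  min(x, na.rm = TRUE) - 1 - 1 * diff(range(x, na.rm = TRUE))
}
x[is.na(x)] = fill
```

Extending the levels before assignment matters for factors: assigning a value that is not an existing level would produce NA with a warning instead of the intended label.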

Methods

Only methods inherited from PipeOpImpute/PipeOp.

References

Ding Y, Simonoff JS (2010). “An Investigation of Missing Data Methods for Classification Trees Applied to Binary Response Data.” Journal of Machine Learning Research, 11(6), 131-170. https://jmlr.org/papers/v11/ding10a.html.

See also

https://mlr3book.mlr-org.com/list-pipeops.html

Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops

Other Imputation PipeOps: PipeOpImpute, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputesample

Examples

library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpImputeOOR
set.seed(2409)
data = tsk("pima")$data()
data$y = factor(c(NA, sample(letters, size = 766, replace = TRUE), NA))
data$z = ordered(c(NA, sample(1:10, size = 767, replace = TRUE)))
task = TaskClassif$new("task", backend = data, target = "diabetes")
task$missings()
#> diabetes      age  glucose  insulin     mass pedigree pregnant pressure 
#>        0        0        5      374       11        0        0       35 
#>  triceps        y        z 
#>      227        2        1 
# Use a name other than po so the po() constructor is not masked
op = po("imputeoor")
new_task = op$train(list(task = task))[[1]]
new_task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure 
#>        0        0        0        0        0        0        0        0 
#>  triceps        y        z 
#>        0        0        0 
new_task$data()
#>      diabetes age pedigree pregnant glucose insulin mass pressure triceps
#>   1:      pos  50    0.627        6     148    -819 33.6       72      35
#>   2:      neg  31    0.351        1      85    -819 26.6       66      29
#>   3:      pos  32    0.672        8     183    -819 23.3       64     -86
#>   4:      neg  21    0.167        1      89      94 28.1       66      23
#>   5:      pos  33    2.288        0     137     168 43.1       40      35
#>  ---                                                                     
#> 764:      neg  63    0.171       10     101     180 32.9       76      48
#> 765:      neg  27    0.340        2     122    -819 36.8       70      27
#> 766:      neg  30    0.245        5     121     112 26.2       72      23
#> 767:      pos  47    0.349        1     126    -819 30.1       60     -86
#> 768:      neg  23    0.315        1      93    -819 30.4       70      31
#>             y        z
#>   1: .MISSING .MISSING
#>   2:        l        9
#>   3:        q        6
#>   4:        f        3
#>   5:        l        3
#>  ---                  
#> 764:        o        7
#> 765:        n        5
#> 766:        e        6
#> 767:        c        8
#> 768: .MISSING        9