Impute factor features by adding a new level `".MISSING"`.

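Impute numeric features by a constant value shifted below the minimum or above the maximum of the observed (non-missing) values \(x\), using \(\min(x) - \mathrm{offset} - \mathrm{multiplier} \cdot \operatorname{diff}(\operatorname{range}(x))\) or \(\max(x) + \mathrm{offset} + \mathrm{multiplier} \cdot \operatorname{diff}(\operatorname{range}(x))\), respectively.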
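A minimal base-R sketch of this formula, for illustration only. The helpers `oor_constant()` and `impute_oor_numeric()` are hypothetical names, not part of mlr3pipelines; they only mirror the description on this page, including the rule that an entirely missing feature is imputed as `0`.

```r
# Hypothetical helpers (not part of mlr3pipelines) that mirror the formula above.
oor_constant = function(x, min = TRUE, offset = 1, multiplier = 1) {
  obs = x[!is.na(x)]                # only the observed values define the range
  if (length(obs) == 0) return(0)   # entirely missing features are imputed as 0
  width = diff(range(obs))          # max(obs) - min(obs)
  if (min) {
    base::min(obs) - offset - multiplier * width
  } else {
    base::max(obs) + offset + multiplier * width
  }
}

impute_oor_numeric = function(x, ...) {
  x[is.na(x)] = oor_constant(x, ...)  # every NA receives the same constant
  x
}

x = c(2, 5, NA, 10)
impute_oor_numeric(x)               # NA becomes 2 - 1 - 1 * 8 = -7
impute_oor_numeric(x, min = FALSE)  # NA becomes 10 + 1 + 1 * 8 = 19
```

Because every missing entry receives the same out-of-range value, a single threshold separates the imputed rows from the observed ones.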

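This type of imputation is especially sensible for tree-based methods: because the imputed values lie outside the observed range, a single split can separate the observations that had missing values from the rest. See also Ding & Simonoff (2010).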

`R6Class` object inheriting from `PipeOpImpute`/`PipeOp`.

```r
PipeOpImputeOOR$new(id = "imputeoor", param_vals = list())
```

* `id` :: `character(1)`
  Identifier of the resulting object, default `"imputeoor"`.
* `param_vals` :: named `list`
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default `list()`.
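For illustration, a sketch of constructing this PipeOp with a non-default `id` and a hyperparameter override, once through the constructor above and once through the `po()` sugar function of mlr3pipelines. The id `"oor_max"` is an arbitrary example value.

```r
library("mlr3pipelines")

# Constructor form: a custom id plus a hyperparameter override via param_vals.
op1 = PipeOpImputeOOR$new(id = "oor_max", param_vals = list(min = FALSE))

# Sugar form: po() forwards `id` to the constructor and sets the remaining
# named arguments as hyperparameter values.
op2 = po("imputeoor", id = "oor_max", min = FALSE)

op1$param_set$values$min     # FALSE: numeric features are shifted above the maximum
op2$param_set$values$offset  # still the initial value 1
```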

Input and output channels are inherited from `PipeOpImpute`.

The output is the input `Task` with all affected features having missing values imputed as described above.

The `$state` is a named `list` with the `$state` elements inherited from `PipeOpImpute`.

For each affected feature, `$state$model` contains either `".MISSING"`, used for `character`, `factor`, and `ordered` features, or a `numeric(1)` giving the constant used to impute `integer` and `numeric` features.

The parameters are the parameters inherited from `PipeOpImpute`, as well as:

* `min` :: `logical(1)`
  Should `integer` and `numeric` features be shifted below the minimum? Initialized to `TRUE`. If `FALSE`, they are shifted above the maximum. See also the description above.
* `offset` :: `numeric(1)`
  Non-negative numeric offset, as used in the description above, for `integer` and `numeric` features. Initialized to `1`.
* `multiplier` :: `numeric(1)`
  Non-negative numeric multiplier, as used in the description above, for `integer` and `numeric` features. Initialized to `1`.
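As a worked instance of the formula with the initial values `offset = 1` and `multiplier = 1`, consider a feature whose observed values range from 14 to 846, as `insulin` does in the pima data (consistent with the imputed value -819 in the example below). The third line additionally sets `multiplier = 0`, purely for illustration, so that only the offset remains.

\[
\begin{aligned}
\texttt{min = TRUE}: &\quad 14 - 1 - 1 \cdot (846 - 14) = -819 \\
\texttt{min = FALSE}: &\quad 846 + 1 + 1 \cdot (846 - 14) = 1679 \\
\texttt{min = TRUE},\ \texttt{multiplier = 0}: &\quad 14 - 1 - 0 \cdot (846 - 14) = 13
\end{aligned}
\]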

Adds an explicit new `level()` to `factor` and `ordered` features, but not to `character` features.
For `integer` and `numeric` features, the `min`, `max`, `diff`, and `range` functions are used.
`integer` and `numeric` features that are entirely `NA` are imputed as `0`.
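A minimal base-R sketch of the factor handling described above, assuming a hypothetical helper `impute_oor_factor()` (not the mlr3pipelines implementation). Appending the new level after the existing ones, which for `ordered` features makes it the highest level, is a choice of this sketch and not necessarily what the package does.

```r
# Hypothetical helper (not part of mlr3pipelines).
impute_oor_factor = function(x, level = ".MISSING") {
  if (is.factor(x)) {
    # factor / ordered: register the new level explicitly (appended last in
    # this sketch); union() avoids a duplicated level if it already exists.
    levels(x) = union(levels(x), level)
  }
  # character: there are no levels, so the NAs are simply overwritten.
  x[is.na(x)] = level
  x
}

f = impute_oor_factor(factor(c("a", NA, "b")))
levels(f)                               # "a" "b" ".MISSING"
impute_oor_factor(c("u", NA))           # "u" ".MISSING", still a character vector
```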

Only methods inherited from `PipeOpImpute`/`PipeOp`.

Ding Y, Simonoff JS (2010).
“An Investigation of Missing Data Methods for Classification Trees Applied to Binary Response Data.”
*Journal of Machine Learning Research*, **11**(6), 131-170.
https://jmlr.org/papers/v11/ding10a.html.

Other PipeOps: `PipeOpEnsemble`, `PipeOpImpute`, `PipeOpTargetTrafo`, `PipeOpTaskPreprocSimple`, `PipeOpTaskPreproc`, `PipeOp`, `mlr_pipeops_boxcox`, `mlr_pipeops_branch`, `mlr_pipeops_chunk`, `mlr_pipeops_classbalancing`, `mlr_pipeops_classifavg`, `mlr_pipeops_classweights`, `mlr_pipeops_colapply`, `mlr_pipeops_collapsefactors`, `mlr_pipeops_colroles`, `mlr_pipeops_copy`, `mlr_pipeops_datefeatures`, `mlr_pipeops_encodeimpact`, `mlr_pipeops_encodelmer`, `mlr_pipeops_encode`, `mlr_pipeops_featureunion`, `mlr_pipeops_filter`, `mlr_pipeops_fixfactors`, `mlr_pipeops_histbin`, `mlr_pipeops_ica`, `mlr_pipeops_imputeconstant`, `mlr_pipeops_imputehist`, `mlr_pipeops_imputelearner`, `mlr_pipeops_imputemean`, `mlr_pipeops_imputemedian`, `mlr_pipeops_imputemode`, `mlr_pipeops_imputesample`, `mlr_pipeops_kernelpca`, `mlr_pipeops_learner`, `mlr_pipeops_missind`, `mlr_pipeops_modelmatrix`, `mlr_pipeops_multiplicityexply`, `mlr_pipeops_multiplicityimply`, `mlr_pipeops_mutate`, `mlr_pipeops_nmf`, `mlr_pipeops_nop`, `mlr_pipeops_ovrsplit`, `mlr_pipeops_ovrunite`, `mlr_pipeops_pca`, `mlr_pipeops_proxy`, `mlr_pipeops_quantilebin`, `mlr_pipeops_randomprojection`, `mlr_pipeops_randomresponse`, `mlr_pipeops_regravg`, `mlr_pipeops_removeconstants`, `mlr_pipeops_renamecolumns`, `mlr_pipeops_replicate`, `mlr_pipeops_scalemaxabs`, `mlr_pipeops_scalerange`, `mlr_pipeops_scale`, `mlr_pipeops_select`, `mlr_pipeops_smote`, `mlr_pipeops_spatialsign`, `mlr_pipeops_subsample`, `mlr_pipeops_targetinvert`, `mlr_pipeops_targetmutate`, `mlr_pipeops_targettrafoscalerange`, `mlr_pipeops_textvectorizer`, `mlr_pipeops_threshold`, `mlr_pipeops_tunethreshold`, `mlr_pipeops_unbranch`, `mlr_pipeops_updatetarget`, `mlr_pipeops_vtreat`, `mlr_pipeops_yeojohnson`, `mlr_pipeops`

Other Imputation PipeOps: `PipeOpImpute`, `mlr_pipeops_imputeconstant`, `mlr_pipeops_imputehist`, `mlr_pipeops_imputelearner`, `mlr_pipeops_imputemean`, `mlr_pipeops_imputemedian`, `mlr_pipeops_imputemode`, `mlr_pipeops_imputesample`

```r
library("mlr3")
library("mlr3pipelines")

set.seed(2409)
data = tsk("pima")$data()
# Add a factor feature y and an ordered feature z, each with missing values
data$y = factor(c(NA, sample(letters, size = 766, replace = TRUE), NA))
data$z = ordered(c(NA, sample(1:10, size = 767, replace = TRUE)))
task = TaskClassif$new("task", backend = data, target = "diabetes")
task$missings()
#> diabetes      age  glucose  insulin     mass pedigree pregnant pressure
#>        0        0        5      374       11        0        0       35
#>  triceps        y        z
#>      227        2        1

# Train the imputation PipeOp; the result is a list holding the imputed Task
op = po("imputeoor")
new_task = op$train(list(task))[[1]]
new_task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure
#>        0        0        0        0        0        0        0        0
#>  triceps        y        z
#>        0        0        0

new_task$data()
#>      diabetes age pedigree pregnant glucose insulin mass pressure triceps
#>   1:      pos  50    0.627        6     148    -819 33.6       72      35
#>   2:      neg  31    0.351        1      85    -819 26.6       66      29
#>   3:      pos  32    0.672        8     183    -819 23.3       64     -86
#>   4:      neg  21    0.167        1      89      94 28.1       66      23
#>   5:      pos  33    2.288        0     137     168 43.1       40      35
#>  ---
#> 764:      neg  63    0.171       10     101     180 32.9       76      48
#> 765:      neg  27    0.340        2     122    -819 36.8       70      27
#> 766:      neg  30    0.245        5     121     112 26.2       72      23
#> 767:      pos  47    0.349        1     126    -819 30.1       60     -86
#> 768:      neg  23    0.315        1      93    -819 30.4       70      31
#>             y        z
#>   1: .MISSING .MISSING
#>   2:        l        9
#>   3:        q        6
#>   4:        f        3
#>   5:        l        3
#>  ---
#> 764:        o        7
#> 765:        n        5
#> 766:        e        6
#> 767:        c        8
#> 768: .MISSING        9
```