Add missing indicator columns ("dummy columns") to the Task. The original features are dropped, so this PipeOp should usually be combined with PipeOpFeatureUnion and imputation PipeOps (see examples).

Format

R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Construction

PipeOpMissInd$new(id = "missind", param_vals = list())
  • id :: character(1)
    Identifier of the resulting object, defaulting to "missind".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
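A minimal construction sketch, assuming the mlr3 and mlr3pipelines packages are installed. The id "missind_all" and the chosen hyperparameter values are illustrative only; the po() shorthand is the usual mlr3pipelines accessor and is shown for comparison.

```r
library("mlr3")
library("mlr3pipelines")

# Construct through the R6 generator, overriding two hyperparameters
# via param_vals ("missind_all" is an arbitrary, illustrative id).
op = PipeOpMissInd$new(
  id = "missind_all",
  param_vals = list(which = "all", type = "logical")
)

# The same object built through the po() shorthand.
op2 = po("missind", id = "missind_all", which = "all", type = "logical")

op$id
op$param_set$values$which
op2$param_set$values$type
```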

State

$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:

  • indicand_cols :: character
    Names of columns for which indicator columns are added. If the which parameter is "all", this is just the names of all features, otherwise it is the names of all features that had missing values during training.
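The state can be inspected after training. The following sketch assumes the full "pima" task shipped with mlr3, in which only some features contain missing values; no output is shown because the order of the names is not guaranteed here.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")

# With the default which = "missing_train", only features that have
# missing values in the training data should be recorded in the state.
op = po("missind")
op$train(list(task))

op$state$indicand_cols
```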

Parameters

The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as:

  • which :: character(1)
    Determines for which features the indicator columns are added. Can either be "missing_train" (default), adding indicator columns for each feature that has missing values in the training data, or "all", adding indicator columns for all features.

  • type :: character(1)
    Determines the type of the newly created columns. Can be one of "numeric", "factor" (default), "logical".
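A short sketch of how the two parameters interact, assuming the "pima" task from mlr3, where "age" is complete and "insulin" has missing values. The column selection is illustrative.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")$select(c("age", "insulin"))

# which = "all" requests an indicator for every feature, including "age",
# which has no missing values; type = "logical" requests logical columns
# instead of the default factor columns.
op = po("missind", which = "all", type = "logical")
out = op$train(list(task))[[1]]

out$feature_names
out$feature_types
```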

Internals

This PipeOp should cover most cases where "dummy columns" or "missing indicators" are desired. Some edge cases:

  • If imputation is already performed for factor features and only numeric features should gain missing indicators, the affect_columns parameter can be set to selector_type("numeric").

  • If missing indicators should only be added for features in which more than a fraction x of the values are missing, PipeOpRemoveConstants can be applied after this PipeOp with affect_columns = selector_grep("^missing_") and ratio = x.
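Both edge cases can be combined in one graph. This is a sketch under stated assumptions rather than a recommended recipe: it uses the full "pima" task, an arbitrary threshold of 0.1, and imputation by sampling as a placeholder for whatever imputation is actually wanted.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")

# Indicator branch: numeric features only, then drop indicators whose
# share of "missing" entries does not exceed the (illustrative) ratio 0.1.
indicators = po("missind", affect_columns = selector_type("numeric")) %>>%
  po("removeconstants",
    affect_columns = selector_grep("^missing_"),
    ratio = 0.1)

# Imputation branch and indicator branch are joined by featureunion.
graph = list(po("imputesample"), indicators) %>>% po("featureunion")

result = graph$train(task)[[1]]
result$feature_names
```

In this task "insulin" is missing in roughly half of the rows and "glucose" in well under 1%, so only the former should keep its indicator column.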

Fields

Fields inherited from PipeOpTaskPreproc/PipeOp.

Methods

Methods inherited from PipeOpTaskPreproc/PipeOp.

Examples

library("mlr3")
library("mlr3pipelines")

task = tsk("pima")$select(c("insulin", "triceps"))
sum(complete.cases(task$data()))
#> [1] 394
task$missings()
#> diabetes  insulin  triceps
#>        0      374      227
tail(task$data())
#>    diabetes insulin triceps
#> 1:      neg      NA      NA
#> 2:      neg     180      48
#> 3:      neg      NA      27
#> 4:      neg     112      23
#> 5:      pos      NA      NA
#> 6:      neg      NA      31
po = po("missind")
new_task = po$train(list(task))[[1]]
tail(new_task$data())
#>    diabetes missing_insulin missing_triceps
#> 1:      neg         missing         missing
#> 2:      neg         present         present
#> 3:      neg         missing         present
#> 4:      neg         present         present
#> 5:      pos         missing         missing
#> 6:      neg         missing         present
# proper imputation + missing indicators
impgraph = list(
  po("imputesample"),
  po("missind")
) %>>% po("featureunion")
tail(impgraph$train(task)[[1]]$data())
#>    diabetes insulin triceps missing_insulin missing_triceps
#> 1:      neg     100      40         missing         missing
#> 2:      neg     180      48         present         present
#> 3:      neg     280      27         missing         present
#> 4:      neg     112      23         present         present
#> 5:      pos      86      30         missing         missing
#> 6:      neg     115      31         missing         present