Adds missing indicator columns ("dummy columns") to the Task.
Drops the original features, so it should usually be combined with PipeOpFeatureUnion and imputation PipeOps (see Examples).
Note that the affect_columns parameter is initialized with selector_invert(selector_type(c("factor", "ordered", "character"))), since missing
values in factorial columns are often indicated by out-of-range imputation (PipeOpImputeOOR).
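The default column selection can be illustrated with a minimal sketch. The toy data set and its column names (`dat`, `num`, `fct`) are hypothetical, and the sketch assumes that features not matched by affect_columns are passed through unchanged, as is usual for PipeOps built on PipeOpTaskPreproc.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical toy data: one numeric and one factor feature, each with an NA.
dat = data.frame(
  y = factor(c("a", "b", "a", "b")),
  num = c(1, NA, 3, 4),
  fct = factor(c("u", NA, "v", "u"))
)
task = as_task_classif(dat, target = "y")

# With the default affect_columns, only the non-factor feature is processed.
out = po("missind")$train(list(task))[[1]]
out$feature_names
```

Under these assumptions, the factor feature stays as it is and would typically be handled by PipeOpImputeOOR instead.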
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
- id :: character(1)
 Identifier of the resulting object, defaulting to "missind".
- param_vals :: named list
 List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
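As a rough sketch, the two construction arguments can be used as follows. The id "missind_all" and the object names are made up for illustration. The po() shorthand is shown as an alternative to calling the R6 generator directly.

```r
library("mlr3pipelines")

# Direct R6 construction with the default id and no hyperparameter overrides.
op_default = PipeOpMissInd$new()

# Custom id (hypothetical) and hyperparameters passed through param_vals.
op_custom = PipeOpMissInd$new(
  id = "missind_all",
  param_vals = list(which = "all", type = "logical")
)

# Shorthand: po() forwards additional named arguments as hyperparameters.
op_short = po("missind", which = "all", type = "logical")

op_custom$id
op_short$param_set$values$which
```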
State
$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- indicand_cols :: character
 Names of columns for which indicator columns are added. If the which parameter is "all", this is just the names of all features, otherwise it is the names of all features that had missing values during training.
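A short sketch of how the state might be inspected after training, using the "pima" task that also appears in the Examples. The object names are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")

# Default which = "missing_train": only features with NAs in the training data.
op_train = po("missind")
op_train$train(list(task))
op_train$state$indicand_cols

# which = "all": every affected feature is recorded, with or without NAs.
op_all = po("missind", which = "all")
op_all$train(list(task))
op_all$state$indicand_cols
```

The stored column names are what the PipeOp uses at prediction time, so new data receives the same set of indicator columns as the training data.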
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- which :: character(1)
 Determines for which features the indicator columns are added. Can either be "missing_train" (default), adding indicator columns for each feature that has missing values in the training data, or "all", adding indicator columns for all features.
- type :: character(1)
 Determines the type of the newly created columns. Can be one of "factor" (default), "integer", "logical", "numeric".
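The effect of both parameters can be sketched as follows, again on the "pima" task. The variable names are illustrative. The expectation that TRUE marks a missing value for type = "logical" is an assumption based on the naming of the indicator columns.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")

# Indicators for all features, encoded as logical columns.
op = po("missind", which = "all", type = "logical")
ind = op$train(list(task))[[1]]$data(cols = paste0("missing_", task$feature_names))

# One indicator column per original feature.
ncol(ind) == length(task$feature_names)

# Assumed encoding: TRUE where the original value was missing.
table(ind$missing_insulin)
```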
Internals
This PipeOp should cover most cases where "dummy columns" or "missing indicators" are desired. Some edge cases:
- If imputation for factorial features is performed and only numeric features should gain missing indicators, the affect_columns parameter can be set to selector_type("numeric").
- If missing indicators should only be added for features in which more than a fraction x of the values are missing, PipeOpRemoveConstants can be used with affect_columns = selector_grep("^missing_") and ratio = x.
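The two edge cases above could be set up roughly as follows. The threshold 0.1 is an arbitrary illustrative value, and the object names are made up for this sketch.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")

# Edge case 1: restrict missing indicators to numeric features.
op_numeric = po("missind", affect_columns = selector_type("numeric"))

# Edge case 2: create indicators for all features, then drop the indicators
# of features whose share of missing values does not exceed the threshold.
graph = po("missind", which = "all") %>>%
  po("removeconstants",
    affect_columns = selector_grep("^missing_"),
    ratio = 0.1  # illustrative threshold
  )

filtered = graph$train(task)[[1]]
filtered$feature_names
```

Because po("missind") drops the original features, this graph would normally run in parallel with an imputation PipeOp and be joined by PipeOpFeatureUnion, as in the Examples.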
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("mlr3pipelines")  # provides po() and %>>%
task = tsk("pima")$select(c("insulin", "triceps"))
sum(complete.cases(task$data()))
#> [1] 394
task$missings()
#> diabetes  insulin  triceps 
#>        0      374      227 
tail(task$data())
#>    diabetes insulin triceps
#>      <fctr>   <num>   <num>
#> 1:      neg      NA      NA
#> 2:      neg     180      48
#> 3:      neg      NA      27
#> 4:      neg     112      23
#> 5:      pos      NA      NA
#> 6:      neg      NA      31
# train the PipeOp on its own: the original features are replaced by indicators
pom = po("missind")
new_task = pom$train(list(task))[[1]]
tail(new_task$data())
#>    diabetes missing_insulin missing_triceps
#>      <fctr>          <fctr>          <fctr>
#> 1:      neg         missing         missing
#> 2:      neg         present         present
#> 3:      neg         missing         present
#> 4:      neg         present         present
#> 5:      pos         missing         missing
#> 6:      neg         missing         present
# proper imputation + missing indicators
impgraph = list(
  po("imputesample"),
  po("missind")
) %>>% po("featureunion")
tail(impgraph$train(task)[[1]]$data())
#>    diabetes insulin triceps missing_insulin missing_triceps
#>      <fctr>   <num>   <num>          <fctr>          <fctr>
#> 1:      neg     465      39         missing         missing
#> 2:      neg     180      48         present         present
#> 3:      neg      49      27         missing         present
#> 4:      neg     112      23         present         present
#> 5:      pos      75      39         missing         missing
#> 6:      neg      44      31         missing         present
