Impute Features by Fitting a Learner

Impute features by fitting a Learner for each feature. The features selected by the context_columns parameter serve as predictors when training the imputation Learner. This parameter is part of the PipeOpImpute base class and is documented there.

Only features supported by the Learner can be imputed: learners of type regr can only impute features of type integer and numeric, while learners of type classif can impute features of type factor, ordered and logical.
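The type restriction can be illustrated with a small sketch. The data, the column names x_num and x_fct, and the PipeOp ids below are made up for illustration; the sketch assumes the rpart package is installed. A classif Learner fills only the factor column, and chaining a regr imputer handles the numeric column as well.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# Hypothetical toy data with one numeric and one factor feature.
dat = data.frame(
  y = rnorm(60),
  x_num = rnorm(60),
  x_fct = factor(sample(c("a", "b", "c"), 60, replace = TRUE))
)
dat$x_num[c(5, 20)] = NA
dat$x_fct[c(3, 17, 42)] = NA
toy_task = as_task_regr(dat, target = "y")

# A classification Learner only imputes the factor column.
imp_fct = po("imputelearner", lrn("classif.rpart"), id = "impute_fct")
after_fct = imp_fct$train(list(toy_task))[[1]]
after_fct$missings()

# Chaining a regression imputer also fills the numeric column.
graph = po("imputelearner", lrn("regr.rpart"), id = "impute_num") %>>%
  po("imputelearner", lrn("classif.rpart"), id = "impute_fct")
after_both = graph$train(toy_task)[[1]]
after_both$missings()
```

Both imputers use rpart here because it can handle missing values in its own predictors, so each imputer tolerates the gaps left in the other column.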

The Learner used for imputation is trained on all context_columns; if these contain missing values, the Learner typically either needs to be able to handle missing values itself, or needs to do its own imputation (see examples).

Format

R6Class object inheriting from PipeOpImpute/PipeOp.

Construction

PipeOpImputeLearner$new(learner, id = NULL, param_vals = list())

id :: character(1)
Identifier of resulting object, default "impute.", followed by the id of the Learner.
learner :: Learner | character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary. The Learner usually needs to be able to handle missing values, i.e. have the missings property, unless care is taken that context_columns do not contain missings; see examples.
This argument is always cloned; to access the Learner inside PipeOpImputeLearner by-reference, use $learner.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
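A brief sketch of construction and of the cloning behaviour described above. The id "impute_tree" is an arbitrary choice for illustration, and the rpart package is assumed to be installed.

```r
library("mlr3")
library("mlr3pipelines")

base_learner = lrn("regr.rpart")
imp = PipeOpImputeLearner$new(base_learner, id = "impute_tree")
imp$id

# The Learner was cloned on construction, so later changes to the
# original object do not reach the copy held by the PipeOp.
base_learner$param_set$values$maxdepth = 2
imp$learner$param_set$values$maxdepth
```

To change the wrapped Learner after construction, go through imp$learner, which gives by-reference access.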

Input and Output Channels

Input and output channels are inherited from PipeOpImpute.

The output is the input Task with missing values from all affected features imputed by the trained model.
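The following sketch shows the models fitted during training being reused to impute a different Task at prediction time. The split of the pima task into rows 1 to 600 and 601 to 768 is arbitrary and only serves the illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
train_task = task$clone()$filter(1:600)
test_task = task$clone()$filter(601:768)

imp = po("imputelearner", lrn("regr.rpart"))

# Training fits one model per imputed feature and returns the imputed Task.
imputed_train = imp$train(list(train_task))[[1]]

# Prediction applies the stored models to unseen rows.
imputed_test = imp$predict(list(test_task))[[1]]
sum(imputed_test$missings())
```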

State

The $state is a named list with the $state elements inherited from PipeOpImpute.

The $state$model is a named list of models created by the Learner's $.train() function for each column. If a column consists of missing values only during training, the model is 0 or the levels of the feature; these are used for sampling during prediction.

This state is given the class "pipeop_impute_learner_state".

Parameters

The parameters are the parameters inherited from PipeOpImpute, in addition to the parameters of the Learner used for imputation.

Internals

Uses the $train and $predict functions of the provided learner. Features that are entirely NA are imputed as 0 or randomly sampled from available (factor / logical) levels.

The Learner does not necessarily need to handle missing values in cases where context_columns is chosen well (or there is only one column with missing values present).
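A minimal sketch of the first case, assuming the mlr3learners package is installed. It restricts context_columns to the age, pedigree and pregnant features of the pima task, which contain no missing values according to the missings() output in the Examples section, so a Learner such as "regr.lm" can be used without a preceding imputation step. The variable name complete_cols is chosen here for illustration.

```r
library("mlr3")
library("mlr3pipelines")
library("mlr3learners")

task = tsk("pima")

# Only use features without missing values as predictors.
complete_cols = selector_name(c("age", "pedigree", "pregnant"))

imp_lm = po("imputelearner", lrn("regr.lm"),
  param_vals = list(context_columns = complete_cols)
)

imputed = imp_lm$train(list(task))[[1]]
imputed$missings()
```

The trade-off is that each imputation model sees fewer predictors than it would with the default of using all other features.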

Fields

Fields inherited from PipeOpImpute/PipeOp, as well as:

learner :: Learner
Learner that is being wrapped. Read-only.
learner_models :: list of Learner | NULL
Copies of the wrapped Learner, one per imputed feature. This list is named by the features for which a Learner was fitted; each element is the same Learner, but holds the model fitted for the respective feature. If this PipeOp is not trained, this is an empty list. For features that were entirely NA during training, the list contains NULL elements.
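As a sketch of how learner_models might be inspected, again using the pima task and "regr.rpart" (rpart assumed to be installed):

```r
library("mlr3")
library("mlr3pipelines")

imp = po("imputelearner", lrn("regr.rpart"))

# Before training there are no fitted models.
n_before = length(imp$learner_models)

imp$train(list(tsk("pima")))

# After training, each element is a Learner holding the model for one feature.
mass_learner = imp$learner_models$mass
class(mass_learner$model)
```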

Methods

Only methods inherited from PipeOpImpute/PipeOp.

See also

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Other Imputation PipeOps: PipeOpImpute, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample

Examples

library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()
#> diabetes      age  glucose  insulin     mass pedigree pregnant pressure 
#>        0        0        5      374       11        0        0       35 
#>  triceps 
#>      227 

po = po("imputelearner", lrn("regr.rpart"))
new_task = po$train(list(task = task))[[1]]
new_task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure 
#>        0        0        0        0        0        0        0        0 
#>  triceps 
#>        0 

# '$state' of the "regr.rpart" Learner, trained to predict the 'mass' column:
po$state$model$mass
#> $model
#> n= 757 
#> 
#> node), split, n, deviance, yval
#>       * denotes terminal node
#> 
#>  1) root 757 36254.3300 32.45746  
#>    2) triceps< 25.5 219  5537.6560 27.93196  
#>      4) triceps< 20.5 144  3140.7800 26.68333 *
#>      5) triceps>=20.5 75  1741.3150 30.32933  
#>       10) pressure< 83 64  1081.6090 29.37813 *
#>       11) pressure>=83 11   264.8855 35.86364 *
#>    3) triceps>=25.5 538 24405.7800 34.29963  
#>      6) triceps< 35.5 380 14414.2500 32.50474  
#>       12) pressure< 74.5 223  6772.1180 31.49013  
#>         24) glucose< 73.5 8    44.1000 24.20000 *
#>         25) glucose>=73.5 215  6287.0300 31.76140  
#>           50) pregnant>=0.5 190  4822.6790 31.28947 *
#>           51) pregnant< 0.5 25  1100.4420 35.34800 *
#>       13) pressure>=74.5 157  7086.5100 33.94586  
#>         26) insulin< 187 122  4736.5000 33.05656 *
#>         27) insulin>=187 35  1917.2070 37.04571 *
#>      7) triceps>=35.5 158  5822.9770 38.61646  
#>       14) pregnant>=1.5 92  2351.3170 37.02174 *
#>       15) pregnant< 1.5 66  2911.5580 40.83939 *
#> 
#> $param_vals
#> $param_vals$xval
#> [1] 0
#> 
#> 
#> $log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#> 
#> $train_time
#> [1] 0.004
#> 
#> $task_hash
#> [1] "a666d2778d446faf"
#> 
#> $feature_names
#> [1] "age"      "glucose"  "insulin"  "pedigree" "pregnant" "pressure" "triceps" 
#> 
#> $validate
#> NULL
#> 
#> $mlr3_version
#> [1] ‘0.23.0’
#> 
#> $data_prototype
#> Empty data.table (0 rows and 8 cols): .impute_col,age,glucose,insulin,pedigree,pregnant...
#> 
#> $task_prototype
#> Empty data.table (0 rows and 8 cols): .impute_col,age,glucose,insulin,pedigree,pregnant...
#> 
#> $train_task
#> <TaskRegr:imputing> (768 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, pedigree, pregnant, pressure,
#>     triceps
#> 
#> attr(,"class")
#> [1] "learner_state" "list"         

library("mlr3learners")
# To use the "regr.lm" Learner, prefix it with its own imputation method!
# The "imputehist" PipeOp is used to train "regr.lm"; predictions of this
# trained Learner are then used to impute the missing values in the Task.
po = po("imputelearner",
  po("imputehist") %>>% lrn("regr.lm")
)

new_task = po$train(list(task = task))[[1]]
new_task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure 
#>        0        0        0        0        0        0        0        0 
#>  triceps 
#>        0