Impute features by fitting a Learner for each feature. The features indicated by the context_columns parameter are used as features to train the imputation Learner. Note that this parameter is part of the PipeOpImpute base class and is explained there.
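The per-feature idea can be illustrated with a minimal base-R sketch. It is not the mlr3pipelines implementation. `impute_numeric_by_model()` is a hypothetical helper, `stats::lm()` stands in for the wrapped regr Learner, and the context columns are assumed to be complete.

```r
# Hypothetical sketch: impute one numeric column by regressing it on the
# context columns. lm() stands in for the wrapped regr Learner.
impute_numeric_by_model <- function(df, target, context) {
  observed <- !is.na(df[[target]])
  # Train only on rows where the column to impute is observed
  fit <- lm(reformulate(context, response = target),
            data = df[observed, , drop = FALSE])
  # Predict the missing entries from the context columns
  df[[target]][!observed] <- unname(predict(fit, newdata = df[!observed, , drop = FALSE]))
  df
}

df <- data.frame(
  x = c(1, 2, 3, 4, 5),     # complete context column
  y = c(2, 4, NA, 8, NA)    # column to impute
)
imputed <- impute_numeric_by_model(df, target = "y", context = "x")
imputed$y
```

The observed rows satisfy y = 2x, so the fitted model fills the gaps with values close to 6 and 10.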

Additionally, only features supported by the learner can be imputed: learners of type regr can only impute features of type integer and numeric, while learners of type classif can impute features of type factor, ordered, and logical.
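The type rule above amounts to a dispatch on the column class. `imputer_type()` is a hypothetical helper written for illustration. It mirrors the rule as stated here and is not how mlr3pipelines checks feature types internally.

```r
# Hypothetical helper: which learner type could impute a column of this class?
# integer/numeric -> "regr"; factor/ordered/logical -> "classif"; else NA.
imputer_type <- function(x) {
  if (is.factor(x) || is.logical(x)) {
    "classif"          # ordered factors are factors too
  } else if (is.numeric(x)) {
    "regr"             # covers both integer and double columns
  } else {
    NA_character_      # e.g. character columns cannot be imputed this way
  }
}

cols <- list(
  int = 1:3,
  num = c(1.5, NA, 2),
  fct = factor(c("a", NA, "b")),
  ord = factor(c("lo", "hi", NA), levels = c("lo", "hi"), ordered = TRUE),
  lgl = c(TRUE, NA, FALSE),
  chr = c("a", NA, "b")
)
types <- vapply(cols, imputer_type, character(1))
types
```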

Format

R6Class object inheriting from PipeOpImpute/PipeOp.

Construction

PipeOpImputeLearner$new(learner, id = NULL, param_vals = list())
  • learner :: Learner | character(1)
    Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary. The Learner should generally be able to handle missing values, i.e. have the missings property (see Internals for when this is not required).

  • id :: character(1)
    Identifier of the resulting object. Default is "impute." followed by the id of the Learner.

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpImpute.

The output is the input Task with the missing values of all affected features imputed by the respective trained models.

State

The $state is a named list with the $state elements inherited from PipeOpImpute.

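$state$models is a named list with one element per affected column. Each element holds the state of the Learner trained (via its $train() method) to impute that column. If a column consists only of missing values during training, the element is instead 0 (numeric features) or the levels of the feature (factor and logical features). These values are used to impute the column during prediction, with levels being sampled at random.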
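A per-column list of this kind could be assembled as shown below. This is a sketch under assumptions. `train_imputers()` is hypothetical, `lm()` stands in for the Learner on numeric columns, and the function stores plain `lm` fits rather than mlr3 Learner states. Columns that are entirely NA get the fallback values described above.

```r
# Hypothetical sketch: one entry per column, either a fitted model or,
# for columns that are entirely NA during training, a fallback value.
train_imputers <- function(df, cols, context) {
  models <- list()
  for (col in cols) {
    x <- df[[col]]
    if (all(is.na(x))) {
      # Nothing to learn from: store 0 (numeric) or the factor levels
      models[[col]] <- if (is.factor(x)) levels(x) else 0
    } else {
      # lm() stands in for the wrapped regr Learner (numeric columns only)
      models[[col]] <- lm(reformulate(context, response = col),
                          data = df[!is.na(x), , drop = FALSE])
    }
  }
  models
}

df <- data.frame(
  x   = c(1, 2, 3, 4),
  y   = c(2, NA, 6, 8),
  z   = c(NA_real_, NA, NA, NA),
  grp = factor(c(NA, NA, NA, NA), levels = c("a", "b"))
)
models <- train_imputers(df, cols = c("y", "z", "grp"), context = "x")
str(models[c("z", "grp")])
```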

Parameters

The parameters are the parameters inherited from PipeOpImpute, in addition to the parameters of the Learner used for imputation.

Internals

Uses the $train() and $predict() methods of the provided Learner. Features that are entirely NA during training are imputed as 0 (numeric features) or with values randomly sampled from the available levels (factor and logical features).
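The fallback for entirely-NA features might look like this at prediction time. `impute_from_fallback()` is a hypothetical helper. Uniform sampling with replacement is an assumption, because the text above only states that the levels are sampled randomly.

```r
# Hypothetical sketch: fill an entirely-NA column from its stored fallback.
impute_from_fallback <- function(x, fallback) {
  miss <- is.na(x)
  if (is.factor(x)) {
    # Draw levels uniformly at random, with replacement (assumed distribution)
    x[miss] <- sample(fallback, sum(miss), replace = TRUE)
  } else {
    # Numeric columns: every missing entry becomes the stored constant
    x[miss] <- fallback
  }
  x
}

set.seed(1)
f <- factor(c(NA, NA, NA), levels = c("a", "b"))
imputed_f <- impute_from_fallback(f, levels(f))
imputed_n <- impute_from_fallback(c(NA_real_, NA_real_), 0)
imputed_f
imputed_n
```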

The Learner does not necessarily need to handle missing values if context_columns is chosen well, i.e. so that the context columns themselves contain no missing values (or if only one column has missing values).
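The effect of the context_columns choice shows up with a learner that has no missing-value support. In this illustration `stats::lm()` plays that role on made-up data. When a context column is incomplete, its prediction for an affected row is NA. When the context is restricted to a complete column, every row gets a value.

```r
# Made-up data for illustration
df <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),      # complete context column
  b = c(1, 3, 2, 5, NA, 4),     # context column with a missing value
  y = c(2, 4, 6, NA, NA, 12)    # column to impute
)
train   <- df[!is.na(df$y), ]
to_fill <- df[is.na(df$y), ]

# Context includes b: predict.lm() passes NA inputs through, so the row
# whose b is missing gets an NA prediction.
pred_ab <- predict(lm(y ~ a + b, data = train), newdata = to_fill)
# Context restricted to the complete column a: every prediction is defined.
pred_a  <- predict(lm(y ~ a, data = train), newdata = to_fill)
pred_ab
pred_a
```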

Fields

Fields inherited from PipeOpImpute/PipeOp, as well as:

  • learner :: Learner
    Learner that is being wrapped. Read-only.

  • learner_models :: list of Learner | NULL
    List of copies of the wrapped Learner, one for each feature for which a Learner was fitted. The list is named by these features, and each element holds the model trained for its respective feature. If this PipeOp is not trained, this is an empty list. For features that were entirely NA during training, the list contains NULL elements.

Methods

Only methods inherited from PipeOpImpute/PipeOp.

See also

Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops

Other Imputation PipeOps: PipeOpImpute, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample

Examples

library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
#> diabetes      age  glucose  insulin     mass pedigree pregnant pressure
#>        0        0        5      374       11        0        0       35
#>  triceps
#>      227
po = po("imputelearner", lrn("regr.rpart"))
new_task = po$train(list(task = task))[[1]]
new_task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure
#>        0        0        0        0        0        0        0        0
#>  triceps
#>        0
po$state$models
#> $age
#> $age$model
#> n= 768
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 768 106078.400 33.24089
#>    2) pregnant< 4.5 492 43078.630 28.38618
#>      4) pressure< 71 279 16080.330 26.33692
#>        8) glucose< 180.5 268 12548.100 25.82836 *
#>        9) glucose>=180.5 11 1774.182 38.72727 *
#>      5) pressure>=71 213 24291.940 31.07042
#>       10) pregnant< 3.5 179 16978.990 29.99441 *
#>       11) pregnant>=3.5 34 6014.618 36.73529 *
#>    3) pregnant>=4.5 276 30733.950 41.89493
#>      6) glucose< 124.5 143 14584.150 39.07692
#>       12) pregnant< 7.5 81 6781.136 35.61728 *
#>       13) pregnant>=7.5 62 5566.919 43.59677 *
#>      7) glucose>=124.5 133 13793.250 44.92481
#>       14) mass>=26.9 115 9558.730 43.48696 *
#>       15) mass< 26.9 18 2477.778 54.11111 *
#>
#> $age$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $age$train_time
#> [1] 0.006
#>
#> $age$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): glucose, insulin, mass, pedigree, pregnant, pressure,
#>     triceps
#>
#>
#> $glucose
#> $glucose$model
#> n= 763
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 763 710508.10 121.6868
#>    2) insulin< 119.5 341 216223.70 108.0293
#>      4) age< 34.5 250 103114.90 102.8200 *
#>      5) age>=34.5 91 87686.44 122.3407 *
#>    3) insulin>=119.5 422 379282.60 132.7227
#>      6) insulin< 222.5 338 277176.00 127.9882
#>       12) pedigree< 0.296 122 91512.30 120.4590 *
#>       13) pedigree>=0.296 216 174841.50 132.2407
#>         26) pressure< 81 161 128716.40 128.7205 *
#>         27) pressure>=81 55 38289.64 142.5455 *
#>      7) insulin>=222.5 84 64042.70 151.7738 *
#>
#> $glucose$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $glucose$train_time
#> [1] 0.006
#>
#> $glucose$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, insulin, mass, pedigree, pregnant, pressure, triceps
#>
#>
#> $insulin
#> $insulin$model
#> n= 394
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 394 5544328.00 155.54820
#>    2) glucose< 121.5 209 646628.00 99.46411
#>      4) glucose< 99.5 103 168042.10 73.80583 *
#>      5) glucose>=99.5 106 344885.40 124.39620 *
#>    3) glucose>=121.5 185 3497627.00 218.90810
#>      6) glucose< 180.5 163 2415965.00 203.23310
#>       12) mass< 38.05 123 1296233.00 183.49590 *
#>       13) mass>=38.05 40 924476.80 263.92500
#>         26) glucose< 138 13 58892.31 194.76920 *
#>         27) glucose>=138 27 773476.70 297.22220
#>           54) mass>=42.5 11 297613.60 212.18180 *
#>           55) mass< 42.5 16 341621.40 355.68750 *
#>      7) glucose>=180.5 22 744879.00 335.04550
#>       14) triceps< 35.5 14 426141.40 282.57140 *
#>       15) triceps>=35.5 8 212726.90 426.87500 *
#>
#> $insulin$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $insulin$train_time
#> [1] 0.005
#>
#> $insulin$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, mass, pedigree, pregnant, pressure, triceps
#>
#>
#> $mass
#> $mass$model
#> n= 757
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 757 36254.3300 32.45746
#>    2) triceps< 25.5 219 5537.6560 27.93196
#>      4) triceps< 20.5 144 3140.7800 26.68333 *
#>      5) triceps>=20.5 75 1741.3150 30.32933
#>       10) pressure< 83 64 1081.6090 29.37813 *
#>       11) pressure>=83 11 264.8855 35.86364 *
#>    3) triceps>=25.5 538 24405.7800 34.29963
#>      6) triceps< 35.5 380 14414.2500 32.50474
#>       12) pressure< 74.5 223 6772.1180 31.49013
#>         24) glucose< 73.5 8 44.1000 24.20000 *
#>         25) glucose>=73.5 215 6287.0300 31.76140
#>           50) pregnant>=0.5 190 4822.6790 31.28947 *
#>           51) pregnant< 0.5 25 1100.4420 35.34800 *
#>       13) pressure>=74.5 157 7086.5100 33.94586
#>         26) insulin< 187 122 4736.5000 33.05656 *
#>         27) insulin>=187 35 1917.2070 37.04571 *
#>      7) triceps>=35.5 158 5822.9770 38.61646
#>       14) pregnant>=1.5 92 2351.3170 37.02174 *
#>       15) pregnant< 1.5 66 2911.5580 40.83939 *
#>
#> $mass$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $mass$train_time
#> [1] 0.005
#>
#> $mass$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, pedigree, pregnant, pressure,
#>     triceps
#>
#>
#> $pedigree
#> $pedigree$model
#> n= 768
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 768 84.2002200 0.4718763
#>    2) mass< 39.95 670 62.5082100 0.4498313
#>      4) insulin< 149 440 33.4978700 0.4156341 *
#>      5) insulin>=149 230 27.5114100 0.5152522
#>       10) triceps< 46.5 222 23.0603100 0.4986712
#>         20) glucose< 194.5 215 18.9026700 0.4872930
#>           40) insulin>=157 206 16.8518900 0.4721699 *
#>           41) insulin< 157 9 0.9252762 0.8334444 *
#>         21) glucose>=194.5 7 3.2748970 0.8481429 *
#>       11) triceps>=46.5 8 2.6963540 0.9753750 *
#>    3) mass>=39.95 98 19.1403100 0.6225918
#>      6) glucose< 172.5 85 11.3114300 0.5637647 *
#>      7) glucose>=172.5 13 5.6114200 1.0072310 *
#>
#> $pedigree$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $pedigree$train_time
#> [1] 0.006
#>
#> $pedigree$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pregnant, pressure, triceps
#>
#>
#> $pregnant
#> $pregnant$model
#> n= 768
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 768 8708.5610 3.845052
#>    2) age< 31.5 441 1602.7760 2.108844
#>      4) age< 26.5 300 626.5700 1.590000 *
#>      5) age>=26.5 141 723.6170 3.212766 *
#>    3) age>=31.5 327 3983.6210 6.186544
#>      6) age< 37.5 92 758.7283 4.945652 *
#>      7) age>=37.5 235 3027.7700 6.672340
#>       14) age>=58.5 35 305.6000 4.800000 *
#>       15) age< 58.5 200 2578.0000 7.000000 *
#>
#> $pregnant$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $pregnant$train_time
#> [1] 0.007
#>
#> $pregnant$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pedigree, pressure, triceps
#>
#>
#> $pressure
#> $pressure$model
#> n= 733
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 733 112228.700 72.40518
#>    2) age< 26.5 285 40349.940 67.21404
#>      4) mass< 42.35 258 31003.860 66.02326
#>        8) pregnant>=0.5 201 23452.640 64.87065 *
#>        9) pregnant< 0.5 57 6342.561 70.08772 *
#>      5) mass>=42.35 27 5484.519 78.59259
#>       10) insulin< 117.5 8 1928.000 67.00000 *
#>       11) insulin>=117.5 19 2028.737 83.47368 *
#>    3) age>=26.5 448 59312.690 75.70759
#>      6) mass< 34.05 265 31517.600 73.40000
#>       12) age< 42.5 174 20436.530 71.42529 *
#>       13) age>=42.5 91 9105.187 77.17582 *
#>      7) mass>=34.05 183 24340.560 79.04918
#>       14) age< 34.5 71 9296.310 75.09859
#>         28) triceps< 38.5 33 4378.182 70.54545 *
#>         29) triceps>=38.5 38 3639.895 79.05263 *
#>       15) age>=34.5 112 13233.680 81.55357 *
#>
#> $pressure$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $pressure$train_time
#> [1] 0.006
#>
#> $pressure$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pedigree, pregnant, triceps
#>
#>
#> $triceps
#> $triceps$model
#> n= 541
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 541 59274.2700 29.15342
#>    2) mass< 32.75 267 15721.6700 22.87266
#>      4) mass< 27.55 123 5480.2600 19.64228
#>        8) mass< 23.85 39 878.9744 15.97436 *
#>        9) mass>=23.85 84 3832.9880 21.34524
#>         18) age< 36.5 70 2536.0000 20.00000 *
#>         19) age>=36.5 14 536.9286 28.07143 *
#>      5) mass>=27.55 144 7861.4930 25.63194
#>       10) pregnant< 4.5 101 5228.2380 24.27723 *
#>       11) pregnant>=4.5 43 2012.5120 28.81395 *
#>    3) mass>=32.75 274 22756.4700 35.27372
#>      6) mass< 36.95 140 12899.2200 32.33571
#>       12) age< 53 133 8114.1050 31.78947 *
#>       13) age>=53 7 3991.4290 42.71429 *
#>      7) mass>=36.95 134 7386.2090 38.34328
#>       14) mass< 46.75 121 5587.3390 37.58678 *
#>       15) mass>=46.75 13 1085.0770 45.38462 *
#>
#> $triceps$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $triceps$train_time
#> [1] 0.007
#>
#> $triceps$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pedigree, pregnant, pressure
#>
#>