Impute features by fitting a Learner for each feature.
Uses the features indicated by the context_columns parameter as features to train the imputation Learner. Note that this parameter is part of the PipeOpImpute base class and is explained there.
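The following sketch illustrates context_columns. It assumes that the mlr3 and mlr3pipelines packages are installed and uses the built-in "pima" task. It also assumes that affect_columns and context_columns are the selector-valued parameters of PipeOpImpute. Only insulin and triceps are imputed, and each imputation model sees only three features that have no missing values in this task.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")

# Impute only insulin and triceps (assumed affect_columns parameter), and train
# each imputation model only on age, pedigree and pregnant, which have no
# missing values in "pima".
op = PipeOpImputeLearner$new(lrn("regr.rpart"), param_vals = list(
  affect_columns = selector_name(c("insulin", "triceps")),
  context_columns = selector_name(c("age", "pedigree", "pregnant"))
))
new_task = op$train(list(task))[[1]]

# insulin and triceps should now be complete; glucose is not affected.
new_task$missings(c("glucose", "insulin", "triceps"))
```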
Additionally, only features supported by the learner can be imputed, i.e. learners of type regr can only impute features of type integer and numeric, while learners of type classif can impute features of type factor, ordered and logical.
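The type rule above can be illustrated with a small base-R helper. The names imputable_types(), feature_type() and imputable_columns() are hypothetical and not part of mlr3pipelines. They only mirror the documented mapping from learner task type to feature types and do not call any mlr3 code.

```r
# Hypothetical helpers mirroring the documented rule; not mlr3pipelines API.
imputable_types = function(task_type) {
  switch(task_type,
    regr = c("integer", "numeric"),
    classif = c("factor", "ordered", "logical"),
    stop("only 'regr' and 'classif' learners are covered by this rule")
  )
}

# Feature type in the mlr3 sense: ordered factors count as "ordered".
feature_type = function(x) {
  if (is.ordered(x)) "ordered" else if (is.factor(x)) "factor" else class(x)[1]
}

# Names of the columns that a learner of the given task type could impute.
imputable_columns = function(df, task_type) {
  types = vapply(df, feature_type, character(1))
  names(df)[types %in% imputable_types(task_type)]
}

df = data.frame(
  num = c(1.5, NA), int = c(1L, NA), fct = factor(c("a", NA)),
  ord = factor(c("lo", NA), levels = c("lo", "hi"), ordered = TRUE),
  lgl = c(TRUE, NA)
)
imputable_columns(df, "regr")     # "num" "int"
imputable_columns(df, "classif")  # "fct" "ord" "lgl"
```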
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeLearner$new(learner, id = NULL, param_vals = list())
id :: character(1)
Identifier of the resulting object, default "impute." followed by the id of the Learner.
learner :: Learner | character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary.
The Learner needs to be able to handle missing values, i.e. have the missings property.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
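The following example shows the two documented ways of passing the learner argument. It assumes that mlr3 and mlr3pipelines are installed, and the id "impute_factors" is an arbitrary example value.

```r
library("mlr3")
library("mlr3pipelines")

# Wrap a Learner object; the default id is derived from the Learner's id.
op_regr = PipeOpImputeLearner$new(lrn("regr.rpart"))

# Or refer to a Learner by its key in mlr_learners, and set a custom id.
op_classif = PipeOpImputeLearner$new("classif.rpart", id = "impute_factors")

op_regr$learner$id  # id of the wrapped Learner
op_classif$id
```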
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values from all affected features imputed by the trained models.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$models is a named list of models created by the Learner's $.train() function for each column. If a column consists only of missing values during training, the model is 0 or the levels of the feature; the levels are used for sampling during prediction.
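The all-NA fallback described above can be sketched in base R. The functions fallback_model() and fallback_predict() are hypothetical, and this is not the mlr3pipelines implementation. They only mirror the documented behaviour: a numeric feature gets the constant 0, while factor and logical features get values sampled from their levels.

```r
# Hypothetical helpers, not mlr3pipelines API. "Training" on a column that is
# entirely NA: numeric features get the constant 0, while factor and logical
# features keep the values to sample from during prediction.
fallback_model = function(x) {
  if (is.factor(x)) levels(x) else if (is.logical(x)) c(TRUE, FALSE) else 0
}

# Fill only the NAs at prediction time; observed values are left unchanged.
fallback_predict = function(model, x) {
  miss = is.na(x)
  if (is.factor(x) || is.logical(x)) {
    x[miss] = sample(model, sum(miss), replace = TRUE)
  } else {
    x[miss] = model
  }
  x
}

set.seed(1)
train_col = factor(c(NA, NA, NA), levels = c("low", "high"))
model = fallback_model(train_col)  # c("low", "high")
fallback_predict(model, factor(c("high", NA, NA), levels = c("low", "high")))
fallback_predict(fallback_model(c(NA_real_, NA_real_)), c(2.5, NA))  # 2.5 0.0
```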
The parameters are the parameters inherited from PipeOpImpute, in addition to the parameters of the Learner used for imputation.
Uses the $train and $predict functions of the provided learner. Features that are entirely NA are imputed as 0 or randomly sampled from available (factor / logical) levels.
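The base-R sketch below makes the per-feature train/predict loop concrete. It uses rpart::rpart() in place of an mlr3 Learner. rpart is a recommended package that handles missing predictor values through surrogate splits. The helper names are hypothetical, and mlr3pipelines itself calls the wrapped Learner's train and predict methods instead.

```r
library("rpart")  # recommended package shipped with most R installations

# Hypothetical helpers, not the mlr3pipelines implementation. One model is
# fitted per numeric column. Each model is trained on the rows where that
# column is observed and uses all other columns as context.
train_imputers = function(df) {
  models = list()
  for (col in names(df)) {
    if (!is.numeric(df[[col]])) next
    observed = !is.na(df[[col]])
    if (!any(observed)) {
      models[[col]] = 0  # documented fallback for an all-NA numeric column
      next
    }
    form = reformulate(setdiff(names(df), col), response = col)
    models[[col]] = rpart(form, data = df[observed, , drop = FALSE], method = "anova")
  }
  models
}

# Fill the NAs of each column with its model's predictions. NAs in the context
# columns are handled by rpart's surrogate splits.
predict_imputers = function(models, df) {
  out = df
  for (col in names(models)) {
    miss = is.na(df[[col]])
    if (!any(miss)) next
    m = models[[col]]
    out[[col]][miss] = if (is.numeric(m)) m else predict(m, newdata = df[miss, , drop = FALSE])
  }
  out
}

# airquality (datasets package): Ozone and Solar.R contain NAs.
imputers = train_imputers(airquality)
imputed = predict_imputers(imputers, airquality)
colSums(is.na(imputed))
```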
The Learner does not necessarily need to handle missing values in cases where context_columns is chosen well, i.e. so that the context columns contain no missing values (or when there is only one column with missing values present).
Fields inherited from PipeOpImpute/PipeOp, as well as:
learner :: Learner
Learner that is being wrapped. Read-only.
learner_models :: list of Learner | NULL
List of trained Learners, one per feature for which a Learner was fitted. The list is named by these features and contains copies of the same wrapped Learner, each with the model fitted for its respective feature. If this PipeOp is not trained, this is an empty list. For features that were entirely NA during training, the list contains NULL elements.
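The following example shows how learner_models might be inspected. It assumes that mlr3 and mlr3pipelines are installed and uses the built-in "pima" task, in which insulin has missing values.

```r
library("mlr3")
library("mlr3pipelines")

op = po("imputelearner", lrn("regr.rpart"))
n_before = length(op$learner_models)  # documented as an empty list before training

op$train(list(tsk("pima")))

# One trained copy of the wrapped Learner per imputed feature.
names(op$learner_models)
op$learner_models[["insulin"]]$model
```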
Only methods inherited from PipeOpImpute/PipeOp.
Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops
Other Imputation PipeOps: PipeOpImpute, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample
```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()
#> diabetes      age  glucose  insulin     mass pedigree pregnant pressure
#>        0        0        5      374       11        0        0       35
#>  triceps
#>      227

po = po("imputelearner", lrn("regr.rpart"))
new_task = po$train(list(task = task))[[1]]
new_task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure
#>        0        0        0        0        0        0        0        0
#>  triceps
#>        0

po$state$model
#> $age
#> $age$model
#> n= 768
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 768 106078.400 33.24089
#>    2) pregnant< 4.5 492 43078.630 28.38618
#>      4) pressure< 71 279 16080.330 26.33692
#>        8) glucose< 180.5 268 12548.100 25.82836 *
#>        9) glucose>=180.5 11 1774.182 38.72727 *
#>      5) pressure>=71 213 24291.940 31.07042
#>       10) pregnant< 3.5 179 16978.990 29.99441 *
#>       11) pregnant>=3.5 34 6014.618 36.73529 *
#>    3) pregnant>=4.5 276 30733.950 41.89493
#>      6) glucose< 124.5 143 14584.150 39.07692
#>       12) pregnant< 7.5 81 6781.136 35.61728 *
#>       13) pregnant>=7.5 62 5566.919 43.59677 *
#>      7) glucose>=124.5 133 13793.250 44.92481
#>       14) mass>=26.9 115 9558.730 43.48696 *
#>       15) mass< 26.9 18 2477.778 54.11111 *
#>
#> $age$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $age$train_time
#> [1] 0.007
#>
#> $age$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): glucose, insulin, mass, pedigree, pregnant, pressure,
#>     triceps
#>
#>
#> $glucose
#> $glucose$model
#> n= 763
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 763 710508.10 121.6868
#>    2) insulin< 119.5 341 216223.70 108.0293
#>      4) age< 34.5 250 103114.90 102.8200 *
#>      5) age>=34.5 91 87686.44 122.3407 *
#>    3) insulin>=119.5 422 379282.60 132.7227
#>      6) insulin< 222.5 338 277176.00 127.9882
#>       12) pedigree< 0.296 122 91512.30 120.4590 *
#>       13) pedigree>=0.296 216 174841.50 132.2407
#>         26) pressure< 81 161 128716.40 128.7205 *
#>         27) pressure>=81 55 38289.64 142.5455 *
#>      7) insulin>=222.5 84 64042.70 151.7738 *
#>
#> $glucose$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $glucose$train_time
#> [1] 0.006
#>
#> $glucose$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, insulin, mass, pedigree, pregnant, pressure, triceps
#>
#>
#> $insulin
#> $insulin$model
#> n= 394
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 394 5544328.00 155.54820
#>    2) glucose< 121.5 209 646628.00 99.46411
#>      4) glucose< 99.5 103 168042.10 73.80583 *
#>      5) glucose>=99.5 106 344885.40 124.39620 *
#>    3) glucose>=121.5 185 3497627.00 218.90810
#>      6) glucose< 180.5 163 2415965.00 203.23310
#>       12) mass< 38.05 123 1296233.00 183.49590 *
#>       13) mass>=38.05 40 924476.80 263.92500
#>         26) glucose< 138 13 58892.31 194.76920 *
#>         27) glucose>=138 27 773476.70 297.22220
#>           54) mass>=42.5 11 297613.60 212.18180 *
#>           55) mass< 42.5 16 341621.40 355.68750 *
#>      7) glucose>=180.5 22 744879.00 335.04550
#>       14) triceps< 35.5 14 426141.40 282.57140 *
#>       15) triceps>=35.5 8 212726.90 426.87500 *
#>
#> $insulin$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $insulin$train_time
#> [1] 0.005
#>
#> $insulin$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, mass, pedigree, pregnant, pressure, triceps
#>
#>
#> $mass
#> $mass$model
#> n= 757
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 757 36254.3300 32.45746
#>    2) triceps< 25.5 219 5537.6560 27.93196
#>      4) triceps< 20.5 144 3140.7800 26.68333 *
#>      5) triceps>=20.5 75 1741.3150 30.32933
#>       10) pressure< 83 64 1081.6090 29.37813 *
#>       11) pressure>=83 11 264.8855 35.86364 *
#>    3) triceps>=25.5 538 24405.7800 34.29963
#>      6) triceps< 35.5 380 14414.2500 32.50474
#>       12) pressure< 74.5 223 6772.1180 31.49013
#>         24) glucose< 73.5 8 44.1000 24.20000 *
#>         25) glucose>=73.5 215 6287.0300 31.76140
#>           50) pregnant>=0.5 190 4822.6790 31.28947 *
#>           51) pregnant< 0.5 25 1100.4420 35.34800 *
#>       13) pressure>=74.5 157 7086.5100 33.94586
#>         26) insulin< 187 122 4736.5000 33.05656 *
#>         27) insulin>=187 35 1917.2070 37.04571 *
#>      7) triceps>=35.5 158 5822.9770 38.61646
#>       14) pregnant>=1.5 92 2351.3170 37.02174 *
#>       15) pregnant< 1.5 66 2911.5580 40.83939 *
#>
#> $mass$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $mass$train_time
#> [1] 0.007
#>
#> $mass$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, pedigree, pregnant, pressure,
#>     triceps
#>
#>
#> $pedigree
#> $pedigree$model
#> n= 768
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 768 84.2002200 0.4718763
#>    2) mass< 39.95 670 62.5082100 0.4498313
#>      4) insulin< 149 440 33.4978700 0.4156341 *
#>      5) insulin>=149 230 27.5114100 0.5152522
#>       10) triceps< 46.5 222 23.0603100 0.4986712
#>         20) glucose< 194.5 215 18.9026700 0.4872930
#>           40) insulin>=157 206 16.8518900 0.4721699 *
#>           41) insulin< 157 9 0.9252762 0.8334444 *
#>         21) glucose>=194.5 7 3.2748970 0.8481429 *
#>       11) triceps>=46.5 8 2.6963540 0.9753750 *
#>    3) mass>=39.95 98 19.1403100 0.6225918
#>      6) glucose< 172.5 85 11.3114300 0.5637647 *
#>      7) glucose>=172.5 13 5.6114200 1.0072310 *
#>
#> $pedigree$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $pedigree$train_time
#> [1] 0.007
#>
#> $pedigree$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pregnant, pressure, triceps
#>
#>
#> $pregnant
#> $pregnant$model
#> n= 768
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 768 8708.5610 3.845052
#>    2) age< 31.5 441 1602.7760 2.108844
#>      4) age< 26.5 300 626.5700 1.590000 *
#>      5) age>=26.5 141 723.6170 3.212766 *
#>    3) age>=31.5 327 3983.6210 6.186544
#>      6) age< 37.5 92 758.7283 4.945652 *
#>      7) age>=37.5 235 3027.7700 6.672340
#>       14) age>=58.5 35 305.6000 4.800000 *
#>       15) age< 58.5 200 2578.0000 7.000000 *
#>
#> $pregnant$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $pregnant$train_time
#> [1] 0.006
#>
#> $pregnant$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pedigree, pressure, triceps
#>
#>
#> $pressure
#> $pressure$model
#> n= 733
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 733 112228.700 72.40518
#>    2) age< 26.5 285 40349.940 67.21404
#>      4) mass< 42.35 258 31003.860 66.02326
#>        8) pregnant>=0.5 201 23452.640 64.87065 *
#>        9) pregnant< 0.5 57 6342.561 70.08772 *
#>      5) mass>=42.35 27 5484.519 78.59259
#>       10) insulin< 117.5 8 1928.000 67.00000 *
#>       11) insulin>=117.5 19 2028.737 83.47368 *
#>    3) age>=26.5 448 59312.690 75.70759
#>      6) mass< 34.05 265 31517.600 73.40000
#>       12) age< 42.5 174 20436.530 71.42529 *
#>       13) age>=42.5 91 9105.187 77.17582 *
#>      7) mass>=34.05 183 24340.560 79.04918
#>       14) age< 34.5 71 9296.310 75.09859
#>         28) triceps< 38.5 33 4378.182 70.54545 *
#>         29) triceps>=38.5 38 3639.895 79.05263 *
#>       15) age>=34.5 112 13233.680 81.55357 *
#>
#> $pressure$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $pressure$train_time
#> [1] 0.007
#>
#> $pressure$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pedigree, pregnant, triceps
#>
#>
#> $triceps
#> $triceps$model
#> n= 541
#>
#> node), split, n, deviance, yval
#>       * denotes terminal node
#>
#>  1) root 541 59274.2700 29.15342
#>    2) mass< 32.75 267 15721.6700 22.87266
#>      4) mass< 27.55 123 5480.2600 19.64228
#>        8) mass< 23.85 39 878.9744 15.97436 *
#>        9) mass>=23.85 84 3832.9880 21.34524
#>         18) age< 36.5 70 2536.0000 20.00000 *
#>         19) age>=36.5 14 536.9286 28.07143 *
#>      5) mass>=27.55 144 7861.4930 25.63194
#>       10) pregnant< 4.5 101 5228.2380 24.27723 *
#>       11) pregnant>=4.5 43 2012.5120 28.81395 *
#>    3) mass>=32.75 274 22756.4700 35.27372
#>      6) mass< 36.95 140 12899.2200 32.33571
#>       12) age< 53 133 8114.1050 31.78947 *
#>       13) age>=53 7 3991.4290 42.71429 *
#>      7) mass>=36.95 134 7386.2090 38.34328
#>       14) mass< 46.75 121 5587.3390 37.58678 *
#>       15) mass>=46.75 13 1085.0770 45.38462 *
#>
#> $triceps$log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $triceps$train_time
#> [1] 0.006
#>
#> $triceps$train_task
#> <TaskRegr:imputing> (0 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#>   - dbl (7): age, glucose, insulin, mass, pedigree, pregnant, pressure
#>
```