Impute features by fitting a Learner for each feature.
Uses the features indicated by the context_columns parameter as features to train the imputation Learner.
Note that this parameter is part of the PipeOpImpute base class and is explained there.
Additionally, only features supported by the learner can be imputed: learners of type regr can only impute features of type integer and numeric, while learners of type classif can impute features of type factor, ordered and logical.
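Here is a minimal sketch of this type restriction. It assumes the penguins task that ships with mlr3, where the factor feature sex and several numeric measurements contain missing values. A classif learner wrapped this way should impute only sex and leave the numeric NAs untouched. Those would need a second, regr-based imputer.

```r
library("mlr3")
library("mlr3pipelines")

# 'penguins' has NAs in the factor 'sex' and in numeric measurements
task = tsk("penguins")
task$missings()

# A classif learner can only impute factor, ordered and logical features,
# so the numeric columns are not affected by this PipeOp
imp_factor = po("imputelearner", lrn("classif.rpart"))
out = imp_factor$train(list(task))[[1]]

out$missings("sex")          # imputed
out$missings("bill_length")  # left untouched, still missing
```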
The Learner used for imputation is trained on all context_columns. If these contain missing values, the Learner typically needs either to handle missing values itself or to perform its own imputation (see examples).
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
id :: character(1)
Identifier of the resulting object, default "impute.", followed by the id of the Learner.
learner :: Learner | character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary. The Learner usually needs to be able to handle missing values, i.e. have the missings property, unless care is taken that context_columns do not contain missings; see examples.
This argument is always cloned; to access the Learner inside PipeOpImputeLearner by-reference, use $learner.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
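The following sketch illustrates the cloning behaviour described above and assumes only mlr3 and mlr3pipelines. The Learner passed to the constructor is copied, so it is a different object from the one reachable through $learner. The code also uses param_vals to set the inherited context_columns parameter at construction time. The chosen column names are only illustrative.

```r
library("mlr3")
library("mlr3pipelines")

base_learner = lrn("regr.rpart")
imp = po("imputelearner", base_learner,
  param_vals = list(context_columns = selector_name(c("age", "pedigree", "pregnant"))))

# The learner argument was cloned: `imp$learner` is a distinct R6 object
identical(imp$learner, base_learner)  # FALSE
imp$learner$id                        # "regr.rpart"

# Hyperparameter value set through param_vals during construction
imp$param_set$values$context_columns
```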
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values in all affected features imputed by the trained models.
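A minimal sketch of the two channels, assuming the pima task from mlr3. The imputation models are fitted during $train() on one subset of rows and then applied during $predict() to a different subset. The 500/268 row split is arbitrary and used only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
# Split rows: fit the imputation models on one part, impute another
train_task = task$clone()$filter(1:500)
test_task = task$clone()$filter(501:768)

imp = po("imputelearner", lrn("regr.rpart"))
imp$train(list(train_task))

# $predict() imputes the new Task using the models fitted during $train()
out = imp$predict(list(test_task))[[1]]
out$missings()
```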
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of models created by the Learner's $.train() function for each column. If a column consists only of missing values during training, the model is 0 (for numeric features) or the levels of the feature (for factor and logical features); the levels are used for sampling during prediction.
This state is given the class "pipeop_impute_learner_state".
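To inspect this state after training, the sketch below uses the same pima task and regr.rpart Learner as the Examples section. It assumes that the class name given above is attached to $state. It also assumes that each per-feature entry of $state$model is an mlr3 learner state, as in the `po$state$model$mass` output shown in the Examples.

```r
library("mlr3")
library("mlr3pipelines")

imp = po("imputelearner", lrn("regr.rpart"))
imp$train(list(tsk("pima")))

class(imp$state)              # should include "pipeop_impute_learner_state"
names(imp$state$model)        # one entry per affected feature
class(imp$state$model$mass)   # a Learner state for the 'mass' model
```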
Parameters
The parameters are the parameters inherited from PipeOpImpute, in addition to the parameters of the Learner used for imputation.
Internals
Uses the $train and $predict functions of the provided learner. Features that are entirely NA are imputed as 0 or randomly sampled from the available (factor / logical) levels.
The Learner does not necessarily need to handle missing values in cases where context_columns is chosen well (or where only one column with missing values is present).
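Here is a sketch of choosing context_columns "well", assuming the pima task and the regr.lm Learner from mlr3learners. In pima, age, pedigree and pregnant have no missing values. If only these are used as context, the linear model never sees an NA, even though regr.lm is not expected to handle missing features. The split into affected and context columns is an illustrative choice, not a recommendation.

```r
library("mlr3")
library("mlr3learners")
library("mlr3pipelines")

task = tsk("pima")

# Impute three incomplete features, using only complete features as context,
# so the wrapped "regr.lm" Learner is trained on NA-free features
imp = po("imputelearner", lrn("regr.lm"),
  affect_columns = selector_name(c("glucose", "mass", "pressure")),
  context_columns = selector_name(c("age", "pedigree", "pregnant"))
)
out = imp$train(list(task))[[1]]

out$missings(c("glucose", "mass", "pressure"))  # imputed
out$missings(c("insulin", "triceps"))           # not affected, still missing
```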
Fields
Fields inherited from PipeOpImpute/PipeOp, as well as:
learner_models :: list of Learner | NULL
Learners that are being wrapped. This list is named by the features for which a Learner was fitted and contains the same Learner, but with a different model for each feature. If this PipeOp is not trained, this is an empty list. For features that were entirely NA during training, the list contains NULL elements.
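A small sketch of reading this field, assuming the pima task and regr.rpart as in the Examples section. Following the description above, the list is expected to be empty before training. After training, it should hold one Learner per fitted feature.

```r
library("mlr3")
library("mlr3pipelines")

imp = po("imputelearner", lrn("regr.rpart"))
n_before = length(imp$learner_models)  # expected to be 0 before training

imp$train(list(tsk("pima")))
names(imp$learner_models)        # features for which a Learner was fitted
imp$learner_models$mass$model    # the rpart model used to impute 'mass'
```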
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Other Imputation PipeOps: PipeOpImpute, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample
Examples
library("mlr3")
task = tsk("pima")
task$missings()
#> diabetes age glucose insulin mass pedigree pregnant pressure
#> 0 0 5 374 11 0 0 35
#> triceps
#> 227
po = po("imputelearner", lrn("regr.rpart"))
new_task = po$train(list(task = task))[[1]]
new_task$missings()
#> diabetes age pedigree pregnant glucose insulin mass pressure
#> 0 0 0 0 0 0 0 0
#> triceps
#> 0
# '$state' of the "regr.rpart" Learner, trained to predict the 'mass' column:
po$state$model$mass
#> $model
#> n= 757
#>
#> node), split, n, deviance, yval
#> * denotes terminal node
#>
#> 1) root 757 36254.3300 32.45746
#> 2) triceps< 25.5 219 5537.6560 27.93196
#> 4) triceps< 20.5 144 3140.7800 26.68333 *
#> 5) triceps>=20.5 75 1741.3150 30.32933
#> 10) pressure< 83 64 1081.6090 29.37813 *
#> 11) pressure>=83 11 264.8855 35.86364 *
#> 3) triceps>=25.5 538 24405.7800 34.29963
#> 6) triceps< 35.5 380 14414.2500 32.50474
#> 12) pressure< 74.5 223 6772.1180 31.49013
#> 24) glucose< 73.5 8 44.1000 24.20000 *
#> 25) glucose>=73.5 215 6287.0300 31.76140
#> 50) pregnant>=0.5 190 4822.6790 31.28947 *
#> 51) pregnant< 0.5 25 1100.4420 35.34800 *
#> 13) pressure>=74.5 157 7086.5100 33.94586
#> 26) insulin< 187 122 4736.5000 33.05656 *
#> 27) insulin>=187 35 1917.2070 37.04571 *
#> 7) triceps>=35.5 158 5822.9770 38.61646
#> 14) pregnant>=1.5 92 2351.3170 37.02174 *
#> 15) pregnant< 1.5 66 2911.5580 40.83939 *
#>
#> $log
#> Empty data.table (0 rows and 3 cols): stage,class,msg
#>
#> $train_time
#> [1] 0.004
#>
#> $param_vals
#> $param_vals$xval
#> [1] 0
#>
#>
#> $task_hash
#> [1] "d842f1537a7dd7ef"
#>
#> $feature_names
#> [1] "age" "glucose" "insulin" "pedigree" "pregnant" "pressure" "triceps"
#>
#> $validate
#> NULL
#>
#> $mlr3_version
#> [1] ‘0.21.1’
#>
#> $data_prototype
#> Empty data.table (0 rows and 8 cols): .impute_col,age,glucose,insulin,pedigree,pregnant...
#>
#> $task_prototype
#> Empty data.table (0 rows and 8 cols): .impute_col,age,glucose,insulin,pedigree,pregnant...
#>
#> $train_task
#> <TaskRegr:imputing> (768 x 8)
#> * Target: .impute_col
#> * Properties: -
#> * Features (7):
#> - dbl (7): age, glucose, insulin, pedigree, pregnant, pressure,
#> triceps
#>
#> attr(,"class")
#> [1] "learner_state" "list"
library("mlr3learners")
# "regr.kknn" cannot handle missing values in its features, so prefix it with its
# own imputation method: the "imputehist" PipeOp imputes the context columns
# before "regr.kknn" is trained, and the predictions of this trained Learner
# are then used to impute the missing values in the Task.
po = po("imputelearner",
po("imputehist") %>>% lrn("regr.kknn")
)
new_task = po$train(list(task = task))[[1]]
new_task$missings()
#> diabetes age pedigree pregnant glucose insulin mass pressure
#> 0 0 0 0 0 0 0 0
#> triceps
#> 0