Abstract base class for feature imputation.
Construction
PipeOpImpute$new(id, param_set = ps(), param_vals = list(), whole_task_dependent = FALSE, packages = character(0), task_type = "Task")
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
whole_task_dependent :: logical(1)
Whether the context_columns parameter should be added, which lets the user limit the columns that are used for imputation inference. This should generally be FALSE if imputation depends only on individual features (e.g. mode imputation), and TRUE if imputation depends on other features as well (e.g. kNN-imputation).
packages :: character
Set of all required packages for the PipeOp's private$.train and private$.predict methods. See $packages slot. Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
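The construction arguments above are easiest to read in the context of a subclass. The following is a minimal sketch, not part of the package: PipeOpImputeMin and its id "imputemin" are hypothetical names chosen for illustration. It assumes that the subclass forwards id, param_vals and feature_types to super$initialize() as described, and it relies on the default private$.impute() behaviour for a single-value model.

```r
library(R6)
library(mlr3)
library(mlr3pipelines)

# Hypothetical subclass for illustration only: imputes numeric and integer
# features with the smallest value observed during training.
PipeOpImputeMin = R6Class("PipeOpImputeMin",
  inherit = PipeOpImpute,
  public = list(
    # The subclass keeps its own `id` and `param_vals` arguments and
    # passes them on to super$initialize().
    initialize = function(id = "imputemin", param_vals = list()) {
      super$initialize(
        id = id,
        param_vals = param_vals,
        whole_task_dependent = FALSE,  # each feature is imputed on its own
        feature_types = c("numeric", "integer")
      )
    }
  ),
  private = list(
    # Abstract in PipeOpImpute, so it must be overloaded: returns the
    # model entry for a single feature.
    .train_imputer = function(feature, type, context) {
      min(feature, na.rm = TRUE)
    }
  )
)

po_min = PipeOpImputeMin$new()
task = tsk("pima")  # example task shipped with mlr3, contains missing values
imputed = po_min$train(list(task))$output
imputed$missings()
```

Because whole_task_dependent is FALSE here, the resulting object has an affect_columns parameter but no context_columns parameter.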
Input and Output Channels
PipeOpImpute has one input channel named "input", taking a Task, or a subclass of Task if the task_type construction argument is given as such; both during training and prediction.
PipeOpImpute has one output channel named "output", producing a Task, or a subclass; the Task type is the same as for input; both during training and prediction.
The output Task is the modified input Task with features imputed according to the private$.impute() function.
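As a rough illustration of the channel layout, the sketch below trains and predicts with a concrete imputer. It assumes the "imputemedian" PipeOp and the "pima" example task from mlr3 are available; any other concrete PipeOpImpute subclass should behave the same way with respect to its channels.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("pima")           # classification task with missing feature values
po_imp = po("imputemedian")  # a concrete subclass of PipeOpImpute

# Training: inputs are passed as a list, one element per input channel.
train_out = po_imp$train(list(task))
names(train_out)             # names of the output channels
imputed_train = train_out$output

# Prediction uses the same channels and the state learned during training.
predict_out = po_imp$predict(list(task))
imputed_pred = predict_out$output

class(imputed_pred)[1]       # Task subclass of the output
sum(imputed_pred$missings()) # remaining missing values
```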
State
The $state is a named list; besides members added by inheriting classes, the members are:
affected_cols :: character
Names of features being selected by the affect_columns parameter.
context_cols :: character
Names of features being selected by the context_columns parameter.
intasklayout :: data.table
Copy of the training Task's $feature_types slot. This is used during prediction to ensure that the prediction Task has the same features, feature layout, and feature types as during training.
outtasklayout :: data.table
Copy of the trained Task's $feature_types slot. This is used during prediction to ensure that the Task resulting from the prediction operation has the same features, feature layout, and feature types as after training.
model :: named list
Model used for imputation. This is a list named by Task features, containing the result of the private$.train_imputer() or private$.train_nullmodel() function for each one.
imputed_train :: character
Names of features that were imputed during training. This is used to ensure that factor levels that were added during training are also added during prediction. Note that features that are imputed during prediction but not during training will still have inconsistent factor levels.
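The state members listed above can be inspected after training. This is a small sketch assuming the "imputemean" PipeOp and the "pima" task; the feature name glucose belongs to that task and is used only as an example.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("pima")
po_imp = po("imputemean")
po_imp$train(list(task))

state = po_imp$state
names(state)           # members of the state list
state$affected_cols    # features the imputer operated on
state$imputed_train    # features that had missing values during training
state$model$glucose    # model entry for one feature
state$intasklayout     # feature names and types seen during training
```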
Parameters
affect_columns :: function | Selector | NULL
What columns the PipeOpImpute should operate on. The parameter must be a Selector function, which takes a Task as argument and returns a character of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
context_columns :: function | Selector | NULL
What columns the PipeOpImpute imputation may depend on. This parameter is only present if the constructor is called with the whole_task_dependent argument set to TRUE.
The parameter must be a Selector function, which takes a Task as argument and returns a character of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
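For illustration, affect_columns can be set with one of the Selector helpers. The sketch below assumes the "imputemean" PipeOp, the "pima" task and the selector_name() helper; the chosen feature names are specific to that task.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("pima")

# Restrict imputation to two named features; all other features are left as-is.
po_imp = po("imputemean",
  affect_columns = selector_name(c("glucose", "mass")))

out = po_imp$train(list(task))$output

# Missing-value counts before and after, for two affected and one unaffected feature.
task$missings()[c("glucose", "mass", "insulin")]
out$missings()[c("glucose", "mass", "insulin")]
```

A Selector is an ordinary function of a Task, so a custom function(task) returning a character vector of feature names should work in the same place.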
Internals
PipeOpImpute is an abstract class inheriting from PipeOp that makes implementing imputer PipeOps simple.
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOp, as well as:
.select_cols(task)
(Task) -> character
Selects which columns the PipeOp operates on. In contrast to the affect_columns parameter, private$.select_cols() is for the inheriting class to determine which columns the operator should function on, e.g. based on feature type, while affect_columns is a way for the user to limit the columns that the PipeOpImpute should operate on. This method can optionally be overloaded when inheriting PipeOpImpute; if this method is not overloaded, it defaults to selecting the columns of the types indicated by the feature_types construction argument.
.train_imputer(feature, type, context)
(atomic, character(1), data.table) -> any
Abstract function that must be overloaded when inheriting. Called once for each feature selected by affect_columns to create the model entry to be used for private$.impute(). This function is only called for features with at least one non-missing value.
.train_nullmodel(feature, type, context)
(atomic, character(1), data.table) -> any
Like .train_imputer(), but only called for features that contain only missing values. This is not an abstract function and, if not overloaded, gives a default response of 0 (integer, numeric), c(TRUE, FALSE) (logical), all available levels (factor/ordered), or the empty string (character).
.impute(feature, type, model, context)
(atomic, character(1), any, data.table) -> atomic
Imputes the features. model is the model created by private$.train_imputer(). Default behaviour is to assume model is an atomic vector from which values are sampled to impute missing values of feature. model may have an attribute probabilities for non-uniform sampling.
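The default private$.impute() behaviour described above can be sketched in base R. This is not the package's implementation: impute_by_sampling() is a hypothetical helper that only mirrors the documented idea of sampling replacement values from an atomic model, optionally weighted by a probabilities attribute.

```r
# Hypothetical helper, for illustration only.
impute_by_sampling = function(feature, model) {
  nas = which(is.na(feature))
  if (length(nas) == 0L) {
    return(feature)
  }
  # Sample positions rather than values: sample() treats a single number n
  # as 1:n, which would give wrong results for a length-one model.
  idx = sample.int(length(model), length(nas), replace = TRUE,
    prob = attr(model, "probabilities"))  # NULL means uniform sampling
  feature[nas] = model[idx]
  feature
}

set.seed(1)
model = c(10, 20, 30)
attr(model, "probabilities") = c(0.8, 0.1, 0.1)  # non-uniform sampling weights

feature = c(1, NA, 3, NA, NA)
impute_by_sampling(feature, model)
```

A subclass whose private$.train_imputer() returns such a vector could rely on the default private$.impute(); a subclass with a more complex model would overload private$.impute() instead.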
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp, PipeOpEnsemble, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Other Imputation PipeOps:
mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample