Abstract base class for feature imputation.
Construction
PipeOpImpute$new(id, param_set = ps(), param_vals = list(), whole_task_dependent = FALSE, empty_level_control = "never",
  packages = character(0), task_type = "Task")
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
whole_task_dependent :: logical(1)
Whether the context_columns parameter should be added, which lets the user limit the columns that are used for imputation inference. This should generally be FALSE if imputation depends only on individual features (e.g. mode imputation), and TRUE if imputation depends on other features as well (e.g. kNN imputation).
empty_level_control :: character(1)
Controls how to handle edge cases where NAs occur in factor or ordered features only during prediction but not during training. Can be one of "never", "always", or "param":
If set to "never", no empty level is introduced during training, but columns that have missing values only during prediction will not be imputed.
If set to "always", an unseen level is added to the feature during training and missing values are imputed as that value during prediction.
Finally, if set to "param", the hyperparameter create_empty_level is added and control over this behavior is left to the user.
For implementation details, see Internals below. Default is "never".
packages :: character
Set of all required packages for the PipeOp's private$.train and private$.predict methods. See $packages slot. Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
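The three settings of empty_level_control reduce to a single yes/no decision about whether an empty level is created during training. The base R sketch below spells out that mapping as described above; the helper resolve_create_empty_level() is hypothetical and not part of mlr3pipelines, and the real class stores the result in a private field rather than returning it.

```r
# Hypothetical helper (not mlr3pipelines API): maps the empty_level_control
# construction argument, plus the create_empty_level hyperparameter when it
# exists, to the yes/no decision "create an empty level during training?".
resolve_create_empty_level = function(empty_level_control = c("never", "always", "param"),
                                      create_empty_level = FALSE) {
  empty_level_control = match.arg(empty_level_control)
  switch(empty_level_control,
    never  = FALSE,                      # never add an unseen level
    always = TRUE,                       # always add an unseen level
    param  = isTRUE(create_empty_level)  # left to the user; initialized to FALSE
  )
}

resolve_create_empty_level("never")
resolve_create_empty_level("param", create_empty_level = TRUE)
```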
Input and Output Channels
PipeOpImpute has one input channel named "input", taking a Task, or a subclass of
Task if the task_type construction argument is given as such; both during training and prediction.
PipeOpImpute has one output channel named "output", producing a Task, or a subclass;
the Task type is the same as for input; both during training and prediction.
The output Task is the modified input Task with features imputed according to the private$.impute() function.
State
The $state is a named list; besides members added by inheriting classes, the members are:
affected_cols :: character
Names of features being selected by the affect_columns parameter.
context_cols :: character
Names of features being selected by the context_columns parameter.
intasklayout :: data.table
Copy of the training Task's $feature_types slot. This is used during prediction to ensure that the prediction Task has the same features, feature layout, and feature types as during training.
outtasklayout :: data.table
Copy of the trained Task's $feature_types slot. This is used during prediction to ensure that the Task resulting from the prediction operation has the same features, feature layout, and feature types as after training.
model :: named list
Model used for imputation. This is a list named by Task features, containing the result of the private$.train_imputer() or private$.train_nullmodel() function for each one.
imputed_train :: character
Names of features that were imputed during training. This is used to ensure that factor levels that were added during training are also added during prediction. Note that features that are imputed during prediction but not during training will still have inconsistent factor levels.
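The intasklayout entry records feature ids and types so that prediction can reject a Task whose layout differs from training. The base R sketch below illustrates that kind of comparison. It is an assumption-laden illustration: it uses a plain data.frame with id and type columns in place of the data.table returned by $feature_types, the helper layout_matches() is hypothetical, and it ignores column order, which this page does not specify.

```r
# Hypothetical helper: compares a stored training layout with the layout of
# a new task. Each layout has one row per feature with columns "id" and "type".
layout_matches = function(train_layout, predict_layout) {
  # Sort by feature id so that column order does not matter (assumption)
  a = train_layout[order(train_layout$id), c("id", "type")]
  b = predict_layout[order(predict_layout$id), c("id", "type")]
  rownames(a) = NULL
  rownames(b) = NULL
  identical(a, b)
}

train_layout = data.frame(id = c("age", "colour"), type = c("numeric", "factor"),
  stringsAsFactors = FALSE)
# Same features and types, different column order: accepted
reordered = data.frame(id = c("colour", "age"), type = c("factor", "numeric"),
  stringsAsFactors = FALSE)
# A feature changed type between training and prediction: rejected
changed = data.frame(id = c("age", "colour"), type = c("numeric", "character"),
  stringsAsFactors = FALSE)

layout_matches(train_layout, reordered)
layout_matches(train_layout, changed)
```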
Parameters
affect_columns :: function | Selector | NULL
Which columns the PipeOpImpute should operate on. The parameter must be a Selector function, which takes a Task as argument and returns a character of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
context_columns :: function | Selector | NULL
Which columns the imputation performed by the PipeOpImpute may depend on. This parameter is only present if the constructor is called with the whole_task_dependent argument set to TRUE.
The parameter must be a Selector function, which takes a Task as argument and returns a character of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
create_empty_level :: logical(1)
Whether an empty level should always be created for factor or ordered columns during training. If FALSE, columns that had no NAs during training but have NAs during prediction will not be imputed. This parameter is only present if the constructor is called with the empty_level_control argument set to "param". Initialized to FALSE.
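As an illustration of affect_columns, the snippet below restricts the built-in mean imputer to two columns with selector_name(). It assumes mlr3 and mlr3pipelines are installed, and relies on mlr3's "pima" task containing missing values in insulin, triceps and glucose; only the two selected columns should be imputed, so glucose keeps its NAs.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("pima")

# Limit mean imputation to two columns via a Selector passed as affect_columns
imputer = po("imputemean", affect_columns = selector_name(c("insulin", "triceps")))
imputed = imputer$train(list(task))[[1]]

# Missing-value counts: selected columns are imputed, glucose is left untouched
imputed$missings(c("insulin", "triceps", "glucose"))

# The selected feature names are recorded in the state
imputer$state$affected_cols
```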
Internals
PipeOpImpute is an abstract class inheriting from PipeOp that makes implementing imputer PipeOps simple.
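A minimal sketch of such a subclass, assuming mlr3, mlr3pipelines and R6 are installed. The class name PipeOpImputeMin and the id "imputemin" are hypothetical. It overloads the private .train_imputer() and .impute() methods listed under Methods, and passes feature_types to super$initialize() so that the default private$.select_cols() picks only numeric and integer features.

```r
library(R6)
library(mlr3)
library(mlr3pipelines)

# Hypothetical imputer: replaces NAs in numeric/integer features by the training minimum
PipeOpImputeMin = R6Class("PipeOpImputeMin",
  inherit = PipeOpImpute,
  public = list(
    initialize = function(id = "imputemin", param_vals = list()) {
      super$initialize(id, param_vals = param_vals, feature_types = c("numeric", "integer"))
    }
  ),
  private = list(
    # Called once per affected feature that has at least one non-missing value;
    # the return value is stored in $state$model under the feature name
    .train_imputer = function(feature, type, context) {
      min(feature, na.rm = TRUE)
    },
    # Replaces every missing entry with the stored minimum
    .impute = function(feature, type, model, context) {
      feature[is.na(feature)] = model
      feature
    }
  )
)

task = tsk("pima")
imputer = PipeOpImputeMin$new()
imputed = imputer$train(list(task))[[1]]
imputer$state$model$insulin
```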
Internally, the construction argument empty_level_control and the hyperparameter create_empty_level (should it
exist) set the private$.create_empty_level field. Whether this field is TRUE or FALSE determines in which cases
imputation is performed on factor or ordered columns; the field has no effect on columns of other types.
If private$.create_empty_level is set to TRUE, private$.impute() is called for all factor or ordered
columns during training, regardless of whether they have any missing values. For this to lead to the creation of an
empty level for columns with no missing values, inheriting PipeOps must implement private$.train_imputer() in
such a way that it returns the name of the level to be created for the feature types factor and ordered.
If private$.create_empty_level is set to FALSE, private$.impute() is not called during prediction for factor
or ordered columns which were not modified during training. This means that NAs will not be imputed for these
columns.
See PipeOpImputeOOR for a detailed explanation of why these controls are necessary.
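The base R sketch below illustrates what creating an empty level during training buys at prediction time: an NA that only appears during prediction can be mapped to a level the factor already knows. It shows only the factor mechanics, not the internals of PipeOpImpute; the level name ".MISSING" is an arbitrary choice for this example.

```r
# Training data: a factor column without any NA
train_x = factor(c("a", "b", "a"))

# Empty-level creation: an extra, unseen level is added during training
empty_level = ".MISSING"
trained_levels = c(levels(train_x), empty_level)

# Prediction data: an NA appears that never occurred during training
predict_x = factor(c("b", NA, "a"), levels = levels(train_x))

# With the empty level: re-level to the training layout, then fill the NA
with_empty = factor(predict_x, levels = trained_levels)
with_empty[is.na(with_empty)] = empty_level

# Without the empty level: the value is not a valid level, so R warns and
# the entry stays NA
without_empty = suppressWarnings(replace(predict_x, is.na(predict_x), empty_level))

levels(with_empty)
anyNA(without_empty)
```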
Fields
Fields inherited from PipeOp.
Methods
Methods inherited from PipeOp, as well as:
.select_cols(task)
(Task) -> character
Selects which columns the PipeOp operates on. In contrast to the affect_columns parameter, private$.select_cols() is for the inheriting class to determine which columns the operator should function on, e.g. based on feature type, while affect_columns is a way for the user to limit the columns that the PipeOpImpute should operate on. This method can optionally be overloaded when inheriting PipeOpImpute; if it is not overloaded, it defaults to selecting the columns of the types given by the feature_types construction argument.
.train_imputer(feature, type, context)
(atomic, character(1), data.table) -> any
Abstract function that must be overloaded when inheriting. Called once for each feature selected by affect_columns to create the model entry to be used by private$.impute(). This function is only called for features with at least one non-missing value.
.train_nullmodel(feature, type, context)
(atomic, character(1), data.table) -> any
Like .train_imputer(), but only called for features that contain nothing but missing values. This is not an abstract function and, if not overloaded, returns a default of 0 (integer, numeric), c(TRUE, FALSE) (logical), all available levels (factor/ordered), or the empty string (character).
.impute(feature, type, model, context)
(atomic, character(1), any, data.table) -> atomic
Imputes the features. model is the model created by private$.train_imputer(). The default behavior is to assume that model is an atomic vector from which values are sampled to impute the missing values of feature; model may have an attribute probabilities for non-uniform sampling. If model has length zero, feature is returned unchanged.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
