Feature filtering using a `mlr3filters::Filter` object, see the mlr3filters package.

If a `Filter` can only operate on a subset of columns based on column type, then only these features are considered and filtered. `nfeat` and `frac` count only the features of the types that the `Filter` can operate on; for example, setting `nfeat` to 0 removes only features of the types that the `Filter` can work with, and all other features are kept.
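The type restriction described above can be illustrated with a small sketch. It assumes the packages mlr3, mlr3filters and mlr3pipelines are installed; the toy data and the names `toy`, `pof`, `filtered` and `kept` are hypothetical and exist only for this illustration. The variance filter is used because it operates on numeric columns only.

```r
library(mlr3)
library(mlr3filters)
library(mlr3pipelines)

# Hypothetical toy data: two numeric features and one factor feature.
set.seed(1)
toy = data.frame(
  y  = factor(rep(c("a", "b"), times = 10)),
  x1 = rnorm(20),
  x2 = rnorm(20, sd = 10),
  f1 = factor(rep(c("u", "v"), each = 10))
)
task = as_task_classif(toy, target = "y", id = "toy")

# The variance filter cannot score the factor column f1.
# filter.nfeat = 1 therefore keeps one of the numeric features,
# while f1 is passed through untouched.
pof = po("filter", filter = flt("variance"), filter.nfeat = 1)
filtered = pof$train(list(task))[[1]]
kept = filtered$feature_names
print(kept)
```

The factor feature stays in the task although `filter.nfeat` is 1, because the count only applies to the columns the `Filter` can score.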

## Format

`R6Class` object inheriting from `PipeOpTaskPreprocSimple`/`PipeOpTaskPreproc`/`PipeOp`.

## Construction

- `filter` :: `Filter`
  `Filter` used for feature filtering. This argument is always cloned; to access the `Filter` inside `PipeOpFilter` by-reference, use `$filter`.
- `id` :: `character(1)`
  Identifier of the resulting object, defaulting to the `id` of the `Filter` being used.
- `param_vals` :: named `list`
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default `list()`.
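A minimal construction sketch, assuming mlr3, mlr3filters and mlr3pipelines are installed. The variable names and the id `"var_filter"` are chosen for illustration only; `po("filter", ...)` is the usual short form for constructing the object from the `mlr_pipeops` dictionary.

```r
library(mlr3)
library(mlr3filters)
library(mlr3pipelines)

flt_var = flt("variance")

# Direct construction; the id defaults to the id of the Filter.
pof = PipeOpFilter$new(flt_var, param_vals = list(filter.frac = 0.5))
print(pof$id)

# Short form with an explicit, illustrative id.
pof2 = po("filter", filter = flt_var, id = "var_filter", filter.frac = 0.5)
print(pof2$id)

# The Filter passed in is cloned, so the object stored in the PipeOp
# is a different object than flt_var.
is_same_object = identical(pof$filter, flt_var)
print(is_same_object)
```

Because the argument is cloned, later changes to `flt_var` do not affect the `PipeOpFilter`; changes have to go through `pof$filter` instead.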

## Input and Output Channels

Input and output channels are inherited from `PipeOpTaskPreproc`.

The output is the input `Task` with the filtered-out features removed.

## State

The `$state` is a named `list` with the `$state` elements inherited from `PipeOpTaskPreproc`, as well as:

- `scores` :: named `numeric`
  Scores calculated for all features of the training `Task`, which are used as the cutoff for feature filtering. If `frac` or `nfeat` is given, the underlying `Filter` may choose not to calculate scores for all features that are given. This only includes features on which the `Filter` can operate; e.g. if the `Filter` can only operate on numeric features, then scores for factor features will not be given.
- `features` :: `character`
  Names of features that are being kept. Features of types that the `Filter` cannot operate on are always kept.
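The two state elements can be inspected after training. The following is a sketch under the assumption that mlr3, mlr3filters and mlr3pipelines are installed; it uses the `iris` task and the variance filter purely as an example.

```r
library(mlr3)
library(mlr3filters)
library(mlr3pipelines)

task = tsk("iris")
pof = po("filter", filter = flt("variance"), filter.nfeat = 2)

# The state is only populated by training.
pof$train(list(task))

scores = pof$state$scores      # named numeric vector of filter scores
features = pof$state$features  # names of the features that are kept
print(scores)
print(features)

# Prediction reuses the stored feature selection.
predicted = pof$predict(list(task))[[1]]
print(predicted$feature_names)
```

All `iris` features are numeric, so every kept feature also has a score here; with mixed feature types, `features` would additionally list the columns the `Filter` cannot operate on.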

## Parameters

The parameters are the parameters inherited from `PipeOpTaskPreproc`, as well as the parameters of the `Filter` used by this object. In addition, the following parameters are introduced:

- `filter.nfeat` :: `numeric(1)`
  Number of features to select. Mutually exclusive with `frac`, `cutoff`, and `permuted`.
- `filter.frac` :: `numeric(1)`
  Fraction of features to keep. Mutually exclusive with `nfeat`, `cutoff`, and `permuted`.
- `filter.cutoff` :: `numeric(1)`
  Minimum value of filter heuristic for which to keep features. Mutually exclusive with `nfeat`, `frac`, and `permuted`.
- `filter.permuted` :: `integer(1)`
  If this parameter is set, a random permutation of each feature is added to the task before applying the filter. All features selected before the `permuted`-th permuted feature is selected are kept. This is similar to the approach in Wu (2007) and Thomas (2017). Mutually exclusive with `nfeat`, `frac`, and `cutoff`.

Note that exactly one of `filter.nfeat`, `filter.frac`, `filter.cutoff`, and `filter.permuted` must be given, since these parameters are mutually exclusive.
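The selection criteria can be compared side by side in a short sketch. It assumes mlr3, mlr3filters and mlr3pipelines are installed and uses the `iris` task with the variance filter as an example; `keep()` is a hypothetical helper defined only for this illustration, and the error check at the end assumes that a missing criterion is reported when the object is trained.

```r
library(mlr3)
library(mlr3filters)
library(mlr3pipelines)

task = tsk("iris")

# Hypothetical helper: train a variance filter with the given
# criterion and return the names of the remaining features.
keep = function(...) {
  pof = po("filter", filter = flt("variance"), ...)
  pof$train(list(task))[[1]]$feature_names
}

by_nfeat  = keep(filter.nfeat = 1)   # the single highest-scoring feature
by_frac   = keep(filter.frac = 0.5)  # half of the 4 features
by_cutoff = keep(filter.cutoff = 1)  # features with variance of at least 1
print(by_nfeat)
print(by_frac)
print(by_cutoff)

# Without any criterion, training is expected to fail.
no_criterion = try(keep(), silent = TRUE)
print(inherits(no_criterion, "try-error"))
```

`filter.permuted` is left out of the sketch because its result depends on random permutations.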

## Internals

This does *not* use the `$.select_cols` feature of `PipeOpTaskPreproc` to select only features compatible with the `Filter`; instead the whole `Task` is used by `private$.get_state()` and subset internally.

## Fields

Fields inherited from `PipeOpTaskPreproc`, as well as:

- `filter` :: `Filter`
  `Filter` that is being used for feature filtering, accessible by reference.

## Methods

Methods inherited from `PipeOpTaskPreprocSimple`/`PipeOpTaskPreproc`/`PipeOp`.

## References

Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” *Journal of the American Statistical Association*, **102**(477), 235–243. doi:10.1198/016214506000000843.

Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” *Computational and Mathematical Methods in Medicine*, **2017**, 1–8. doi:10.1155/2017/1421409.

## See also

https://mlr-org.com/pipeops.html

Other PipeOps: `PipeOp`, `PipeOpEnsemble`, `PipeOpImpute`, `PipeOpTargetTrafo`, `PipeOpTaskPreproc`, `PipeOpTaskPreprocSimple`, `mlr_pipeops`, `mlr_pipeops_boxcox`, `mlr_pipeops_branch`, `mlr_pipeops_chunk`, `mlr_pipeops_classbalancing`, `mlr_pipeops_classifavg`, `mlr_pipeops_classweights`, `mlr_pipeops_colapply`, `mlr_pipeops_collapsefactors`, `mlr_pipeops_colroles`, `mlr_pipeops_copy`, `mlr_pipeops_datefeatures`, `mlr_pipeops_encode`, `mlr_pipeops_encodeimpact`, `mlr_pipeops_encodelmer`, `mlr_pipeops_featureunion`, `mlr_pipeops_fixfactors`, `mlr_pipeops_histbin`, `mlr_pipeops_ica`, `mlr_pipeops_imputeconstant`, `mlr_pipeops_imputehist`, `mlr_pipeops_imputelearner`, `mlr_pipeops_imputemean`, `mlr_pipeops_imputemedian`, `mlr_pipeops_imputemode`, `mlr_pipeops_imputeoor`, `mlr_pipeops_imputesample`, `mlr_pipeops_kernelpca`, `mlr_pipeops_learner`, `mlr_pipeops_missind`, `mlr_pipeops_modelmatrix`, `mlr_pipeops_multiplicityexply`, `mlr_pipeops_multiplicityimply`, `mlr_pipeops_mutate`, `mlr_pipeops_nmf`, `mlr_pipeops_nop`, `mlr_pipeops_ovrsplit`, `mlr_pipeops_ovrunite`, `mlr_pipeops_pca`, `mlr_pipeops_proxy`, `mlr_pipeops_quantilebin`, `mlr_pipeops_randomprojection`, `mlr_pipeops_randomresponse`, `mlr_pipeops_regravg`, `mlr_pipeops_removeconstants`, `mlr_pipeops_renamecolumns`, `mlr_pipeops_replicate`, `mlr_pipeops_rowapply`, `mlr_pipeops_scale`, `mlr_pipeops_scalemaxabs`, `mlr_pipeops_scalerange`, `mlr_pipeops_select`, `mlr_pipeops_smote`, `mlr_pipeops_spatialsign`, `mlr_pipeops_subsample`, `mlr_pipeops_targetinvert`, `mlr_pipeops_targetmutate`, `mlr_pipeops_targettrafoscalerange`, `mlr_pipeops_textvectorizer`, `mlr_pipeops_threshold`, `mlr_pipeops_tunethreshold`, `mlr_pipeops_unbranch`, `mlr_pipeops_updatetarget`, `mlr_pipeops_vtreat`, `mlr_pipeops_yeojohnson`