Feature filtering using a mlr3filters::Filter object; see the mlr3filters package.

If a Filter can only operate on a subset of columns based on column type, then only these features are considered and filtered. nfeat and frac count only the features of the types that the Filter can operate on; for example, setting nfeat to 0 removes only features of the types that the Filter can work with and keeps all other features.
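The type restriction described above can be illustrated with a minimal sketch. It assumes that the "variance" filter from mlr3filters operates only on integer and numeric features; the extra factor column group and the task id iris_group are made up for this illustration.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

# iris plus a hypothetical factor feature that the variance filter cannot score
data = iris
data$group = factor(rep(c("a", "b", "c"), length.out = nrow(data)))
task = as_task_classif(data, target = "Species", id = "iris_group")

# keep 1 feature among those the Filter can operate on
pof = po("filter", filter = flt("variance"), filter.nfeat = 1)
out = pof$train(list(task))[[1]]

# one numeric feature survives; "group" is kept because it was never considered
out$feature_names

# scores exist only for the numeric features
names(pof$state$scores)
```

Here filter.nfeat = 1 yields a Task with two features: the single selected numeric feature plus the untouched factor column.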


Format

R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.


Construction

PipeOpFilter$new(filter, id = filter$id, param_vals = list())

  • filter :: Filter
    Filter used for feature filtering.

  • id :: character(1)
    Identifier of the resulting object, defaulting to the id of the Filter being used.

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
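A short sketch of direct construction follows, assuming the "variance" filter is available in the installed mlr3filters. The id "var_top2" is an arbitrary name chosen for illustration.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

# id defaults to the id of the Filter
pof = PipeOpFilter$new(flt("variance"), param_vals = list(filter.nfeat = 2))
pof$id

# explicit id; hyperparameters are set through param_vals
pof_named = PipeOpFilter$new(
  flt("variance"),
  id = "var_top2",
  param_vals = list(filter.nfeat = 2)
)
pof_named$id
pof_named$param_set$values$filter.nfeat
```

The po("filter", filter = ..., ...) shorthand used in the examples of this page should be an equivalent way to build the same object.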

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with features removed that were filtered out.
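As a hedged illustration of the single input and output channel, the sketch below trains the PipeOp on one half of the iris task and applies it to the other half. The odd/even row split and the choice of the "variance" filter are assumptions made only for this example.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("iris")
train_task = task$clone()$filter(seq(1, 150, by = 2))
test_task = task$clone()$filter(seq(2, 150, by = 2))

pof = po("filter", filter = flt("variance"), filter.nfeat = 2)

# both methods take and return a list with one Task
train_out = pof$train(list(train_task))[[1]]
test_out = pof$predict(list(test_task))[[1]]

# the selection made during training is re-applied during prediction
train_out$feature_names
test_out$feature_names
```

No scores are recomputed during prediction in this sketch; the features chosen during training determine the columns of the predicted Task.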


State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:

  • scores :: named numeric
    Scores calculated for the features of the training Task, which are used as the basis for feature filtering. If frac or nfeat is given, the underlying Filter may choose not to calculate scores for all features. Only features on which the Filter can operate are included; e.g. if the Filter can only operate on numeric features, then no scores are given for factor features.

  • features :: character
    Names of features that are kept. Features of types that the Filter cannot operate on are always kept.
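The two state elements can be inspected after training, as in the following minimal sketch. It assumes the iris task and the "variance" filter; any other Filter applicable to the task should behave analogously.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("iris")
pof = po("filter", filter = flt("variance"), filter.nfeat = 2)

# the state is empty until the PipeOp has been trained
is.null(pof$state)

out = pof$train(list(task))[[1]]

# named numeric vector of filter scores
pof$state$scores

# names of the features that were kept
pof$state$features
```

The names in $state$features match the feature names of the Task returned by $train().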


Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the parameters of the Filter used by this object. In addition, the following parameters are introduced:

  • filter.nfeat :: numeric(1)
    Number of features to select. Mutually exclusive with frac and cutoff.

  • filter.frac :: numeric(1)
    Fraction of features to keep. Mutually exclusive with nfeat and cutoff.

  • filter.cutoff :: numeric(1)
    Minimum value of filter heuristic for which to keep features. Mutually exclusive with nfeat and frac.

Note that at least one of filter.nfeat, filter.frac, or filter.cutoff must be given.
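The three selection criteria can be compared side by side. The sketch below assumes the iris task and the "variance" filter; keep() is a hypothetical helper defined only for this example and is not part of any package.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("iris")

# hypothetical helper: train a variance filter with one criterion
# and return the names of the features that remain
keep = function(...) {
  pof = po("filter", filter = flt("variance"), ...)
  pof$train(list(task))[[1]]$feature_names
}

by_nfeat = keep(filter.nfeat = 2)     # the 2 highest-scoring features
by_frac = keep(filter.frac = 0.5)     # half of the 4 features
by_cutoff = keep(filter.cutoff = 0.5) # features with a variance of at least 0.5

by_nfeat
by_frac
by_cutoff
```

On iris, filter.nfeat = 2 and filter.frac = 0.5 select the same two features, while the cutoff of 0.5 drops only Sepal.Width, whose variance is roughly 0.19.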


Internals

This does not use the $.select_cols feature of PipeOpTaskPreproc to select only features compatible with the Filter; instead the whole Task is used by private$.get_state() and subset internally.


Fields

Fields inherited from PipeOpTaskPreproc, as well as:

  • filter :: Filter
    Filter that is being used for feature filtering. Do not use this slot to get to the feature filtering scores after training; instead, use $state$scores. Read-only.


Methods

Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

See also

Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops


Examples

library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

# set up PipeOpFilter to keep the 5 most important
# features of the spam task w.r.t. their AUC
task = tsk("spam")
filter = flt("auc")
po = po("filter", filter = filter)
po$param_set
#> <ParamSetCollection:auc>
#>                id    class lower upper levels        default value
#> 1:   filter.nfeat ParamInt     0   Inf        <NoDefault[3]>
#> 2:    filter.frac ParamDbl     0     1        <NoDefault[3]>
#> 3:  filter.cutoff ParamDbl  -Inf   Inf        <NoDefault[3]>
#> 4: affect_columns ParamUty    NA    NA          <Selector[1]>
po$param_set$values$filter.nfeat = 5

# filter the task
filtered_task = po$train(list(task))[[1]]

# filtered task + extracted AUC scores
filtered_task$feature_names
#> [1] "capitalAve"      "capitalLong"     "charDollar"      "charExclamation"
#> [5] "your"
head(po$state$scores, 10)
#> charExclamation     capitalLong      capitalAve            your      charDollar
#>       0.3290461       0.3041626       0.2882004       0.2801659       0.2721394
#>    capitalTotal            free             our             you          remove
#>       0.2622801       0.2327285       0.2109325       0.2104681       0.2031303

# feature selection embedded in a holdout resampling
# keep 50% of features based on their AUC score
task = tsk("spam")
gr = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))
learner = GraphLearner$new(gr)
rr = resample(task, learner, rsmp("holdout"), store_models = TRUE)
rr$learners[[1]]$model$auc$scores
#>   charExclamation       capitalLong        capitalAve              your
#>      3.290018e-01      3.084719e-01      2.924356e-01      2.850997e-01
#>        charDollar      capitalTotal              free               you
#>      2.760477e-01      2.690304e-01      2.328002e-01      2.133331e-01
#>               our            remove             money               all
#>      2.127344e-01      2.049659e-01      1.848303e-01      1.800999e-01
#>                hp            num000          business              over
#>      1.768315e-01      1.592152e-01      1.529875e-01      1.490547e-01
#>              mail          internet               hpl            george
#>      1.395390e-01      1.362281e-01      1.362075e-01      1.341867e-01
#>             email           receive           address             order
#>      1.316039e-01      1.303801e-01      1.246968e-01      1.142778e-01
#>              make           num1999          charHash            credit
#>      1.090133e-01      1.049933e-01      1.024926e-01      9.926152e-02
#>              will            people              labs         addresses
#>      9.423281e-02      9.040350e-02      7.689188e-02      7.541491e-02
#>            num650             num85               edu               lab
#>      6.979414e-02      6.939648e-02      6.787860e-02      6.004967e-02
#>        technology            telnet           meeting              data
#>      5.498094e-02      5.137943e-02      4.946566e-02      4.597672e-02
#>                pm            report           project            num857
#>      3.984151e-02      3.941819e-02      3.742082e-02      3.490039e-02
#> charSquarebracket            num415          original        conference
#>      3.485239e-02      3.285303e-02      2.864972e-02      2.808021e-02
#>                cs                re              font     charSemicolon
#>      2.658932e-02      2.658113e-02      2.309021e-02      2.247249e-02
#>  charRoundbracket            direct             num3d             table
#>      1.810618e-02      1.206585e-02      9.208792e-03      2.783626e-03
#>             parts
#>      5.883081e-05