
Remove constant features from an mlr3::Task. For each feature, calculates the ratio of values which differ from the feature's mode value. All features with a ratio at or below a settable threshold are removed from the task. Missing values can be ignored or treated as a regular value distinct from non-missing values.
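The ratio described above can be sketched in base R. This is an illustration only, not the package's implementation: ratio_differing() is a hypothetical helper, and it ignores the numeric tolerances (rel_tol, abs_tol) described under Parameters.

```r
# Fraction of values in x that differ from the most frequent value (the mode).
# Hypothetical helper for illustration; not part of mlr3pipelines.
ratio_differing = function(x, na_ignore = TRUE) {
  if (na_ignore) {
    x = x[!is.na(x)]  # drop missing values before counting
  }
  if (length(x) == 0) {
    return(0)  # assumption: an empty column counts as constant
  }
  # useNA = "ifany" counts NA as its own level when NAs were kept
  counts = table(x, useNA = "ifany")
  1 - max(counts) / length(x)
}

ratio_differing(rep(1, 10))                         # constant column
ratio_differing(rep(1:2, each = 5))                 # two equally frequent values
ratio_differing(c(1, 1, 1, NA))                     # NA dropped before counting
ratio_differing(c(1, 1, 1, NA), na_ignore = FALSE)  # NA counted as a value
```

Under this sketch, a feature would be removed when its ratio is at or below the threshold, so a threshold of 0 removes only the first and third columns above.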

Format

R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Construction

PipeOpRemoveConstants$new(id = "removeconstants", param_vals = list())

  • id :: character(1)
    Identifier of the resulting object, defaulting to "removeconstants".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
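As a minimal sketch of the two construction arguments, the following builds the same PipeOp once through the constructor and once through the po() shorthand. The id "rc_strict" and the ratio value 0.05 are arbitrary choices for illustration.

```r
library("mlr3pipelines")

# Via the constructor, passing hyperparameters as a named list
op_a = PipeOpRemoveConstants$new(id = "rc_strict", param_vals = list(ratio = 0.05))

# Via the po() shorthand, passing hyperparameters as named arguments
op_b = po("removeconstants", ratio = 0.05)

op_a$id
op_a$param_set$values$ratio
op_b$param_set$values$ratio
```

Hyperparameters that are not listed in param_vals keep the values they are initialized to.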

State

$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:

  • features :: character()
    Names of features that are kept. Features of types that this PipeOp cannot operate on are always kept.
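A short sketch of how the stored features are used: the selection is determined during training and, as for other PipeOps built on PipeOpTaskPreprocSimple, applied unchanged during prediction. The data and task ids below are made up for illustration.

```r
library("mlr3")
library("mlr3pipelines")

train_data = data.table::data.table(y = runif(10), a = 1:10, b = rep(1, 10))
# In the new data, b is no longer constant
new_data = data.table::data.table(y = runif(10), a = 1:10, b = as.numeric(1:10))

op = po("removeconstants")
op$train(list(TaskRegr$new("train", train_data, target = "y")))

# Features kept during training
op$state$features

# Prediction reuses the stored selection rather than re-checking for constants
predicted = op$predict(list(TaskRegr$new("new", new_data, target = "y")))[[1]]
predicted$feature_names
```

Because b was constant in the training task, it is dropped from the new task as well, even though it varies there.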

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

  • ratio :: numeric(1)
    Ratio of values which must be different from the mode value in order to keep a feature in the task. Initialized to 0, which means only constant features with exactly one observed level are removed.

  • rel_tol :: numeric(1)
    Relative tolerance within which to consider a numeric feature constant. Set to 0 to disregard relative tolerance. Initialized to 1e-8.

  • abs_tol :: numeric(1)
    Absolute tolerance within which to consider a numeric feature constant. Set to 0 to disregard absolute tolerance. Initialized to 1e-8.

  • na_ignore :: logical(1)
    If TRUE, the ratio is calculated after removing all missing values first, so a column can be "constant" even if some but not all values are NA. Initialized to TRUE.
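The effect of ratio and na_ignore can be seen on a small made-up task. This is a sketch under the parameter descriptions above; the column names and values are chosen for illustration only.

```r
library("mlr3")
library("mlr3pipelines")

data = data.table::data.table(
  y = runif(10),
  a = 1:10,                # all values distinct
  b = c(rep(1, 9), NA),    # constant apart from one missing value
  c = c(rep(1, 9), 2)      # one value in ten differs from the mode
)
task = TaskRegr$new("example", data, target = "y")

# Helper for this example: train a configured PipeOp and return the kept features
kept = function(op) {
  op$train(list(task))
  op$state$features
}

kept(po("removeconstants"))                     # defaults: ratio = 0, na_ignore = TRUE
kept(po("removeconstants", na_ignore = FALSE))  # NA counts as a distinct value
kept(po("removeconstants", ratio = 0.2))        # tolerate up to 20% differing values
```

With the defaults only b is dropped. Setting na_ignore = FALSE keeps b, because the missing value then differs from the mode. Raising ratio to 0.2 additionally drops c.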

Fields

Fields inherited from PipeOpTaskPreproc/PipeOp.

Methods

Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

See also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops

Examples

library("mlr3")
library("mlr3pipelines")  # provides po()
data = data.table::data.table(y = runif(10), a = 1:10, b = rep(1, 10), c = rep(1:2, each = 5))

task = TaskRegr$new("example", data, target = "y")

po = po("removeconstants")

po$train(list(task = task))[[1]]$data()
#>              y     a     c
#>          <num> <int> <int>
#>  1: 0.84982510     1     1
#>  2: 0.33041754     2     1
#>  3: 0.36344250     3     1
#>  4: 0.76530451     4     1
#>  5: 0.54021206     5     1
#>  6: 0.06709176     6     2
#>  7: 0.02093449     7     2
#>  8: 0.80938970     8     2
#>  9: 0.60619806     9     2
#> 10: 0.81683238    10     2

po$state
#> $features
#> [1] "a" "c"
#> 
#> $affected_cols
#> [1] "a" "b" "c"
#> 
#> $intasklayout
#> Key: <id>
#>        id    type
#>    <char>  <char>
#> 1:      a integer
#> 2:      b numeric
#> 3:      c integer
#> 
#> $outtasklayout
#> Key: <id>
#>        id    type
#>    <char>  <char>
#> 1:      a integer
#> 2:      c integer
#> 
#> $outtaskshell
#> Empty data.table (0 rows and 3 cols): y,a,c
#>