Centers all numeric features to mean = 0 (if the center parameter is TRUE) and scales them by dividing them by their root-mean-square (if the scale parameter is TRUE).
The root-mean-square here is defined as sqrt(sum(x^2)/(length(x)-1)). If the center parameter is TRUE, this corresponds to sd().
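The relationship between the root-mean-square defined above and sd() can be checked in base R. This is a minimal sketch; the helper name rms and the toy vector x are chosen here purely for illustration and are not part of the package.

```r
# Toy numeric feature (values chosen only for illustration).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Root-mean-square as defined above; note the (n - 1) denominator.
rms <- function(v) sqrt(sum(v^2) / (length(v) - 1))

# Centering first makes the root-mean-square coincide with sd().
centered <- x - mean(x)
rms(centered)
sd(x)

# Without centering, the root-mean-square generally differs from sd().
rms(x)
```

When center is FALSE, features are therefore divided by a quantity that is not the standard deviation unless the feature already has mean 0.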
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "scale".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
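As a sketch of how these construction arguments are typically used, the following builds the same PipeOp twice: once through the po() shorthand and once through the R6 generator. It assumes the mlr3 and mlr3pipelines packages are installed; the chosen hyperparameter values are only an example.

```r
library("mlr3")
library("mlr3pipelines")

# Shorthand: additional named arguments are taken as hyperparameter values.
pos_sugar = po("scale", center = TRUE, scale = FALSE)

# Explicit construction via the R6 generator, passing the same settings
# through param_vals.
pos_r6 = PipeOpScale$new(id = "scale",
  param_vals = list(center = TRUE, scale = FALSE))

# Both objects should report the same hyperparameter settings.
pos_sugar$param_set$values
pos_r6$param_set$values
```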
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features centered and/or scaled.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
center :: numeric
The mean / median (depending on robust) of each numeric feature during training, or 0 if center is FALSE. Will be subtracted during the predict phase.
scale :: numeric
The value by which features are divided. 1 if scale is FALSE. If robust is FALSE, this is the root mean square, defined as sqrt(sum(x^2)/(length(x)-1)), of each feature, possibly after centering. If robust is TRUE, this is the median absolute deviation multiplied by 1.4826 (see stats::mad) of each feature, possibly after centering. This is 1 for features that are constant during training if center is TRUE, to avoid division-by-zero.
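The state elements described above can be inspected after training. The sketch below assumes mlr3 and mlr3pipelines are installed and uses the default settings (center = TRUE, scale = TRUE, robust = FALSE), under which the stored values should agree with the per-feature means and standard deviations.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pos = po("scale")
pos$train(list(task))

# Per-feature values learned during training.
pos$state$center
pos$state$scale

# Reference values computed directly from the training data.
feats = task$data(cols = task$feature_names)
ref_center = sapply(feats, mean)
ref_scale = sapply(feats, sd)
ref_center
ref_scale
```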
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
center :: logical(1)
Whether to center features, i.e. subtract their mean() from them. Default TRUE.
scale :: logical(1)
Whether to scale features, i.e. divide them by sqrt(sum(x^2)/(length(x)-1)). Default TRUE.
robust :: logical(1)
Whether to use robust scaling; instead of centering / scaling with mean / standard deviation, the median and median absolute deviation (mad) are used. Initialized to FALSE.
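To illustrate what the robust parameter changes, the base R sketch below standardizes a toy vector both ways. It is not the package's implementation; it only mirrors the description above for the case where both center and scale are TRUE, and the vector x is made up for the example.

```r
# Toy feature with one outlier.
x <- c(1, 2, 3, 4, 100)

# Standard variant: subtract the mean, divide by the standard deviation.
z_standard <- (x - mean(x)) / sd(x)

# Robust variant: subtract the median, divide by the MAD.
# stats::mad() already multiplies by the constant 1.4826 by default.
z_robust <- (x - median(x)) / mad(x)

z_standard
z_robust
```

In the robust variant the outlier does not affect the centering and scaling values, so the four regular observations keep a spread of roughly one unit instead of being compressed towards each other.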
Internals
Imitates the scale() function for robust = FALSE and alternatively subtracts the median and divides by mad for robust = TRUE.
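The train / predict split for robust = FALSE can be sketched with base R's scale() alone. This is an illustration under assumptions, not the package's source: the matrices train and new_row are made-up data, and the constant-column guard simply mirrors the division-by-zero note in the State section.

```r
train <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2,
                dimnames = list(NULL, c("a", "b")))
new_row <- matrix(c(2.5, 25), ncol = 2,
                  dimnames = list(NULL, c("a", "b")))

# "Training": scale() computes the centering and scaling values and
# stores them as attributes of its result.
scaled <- scale(train, center = TRUE, scale = TRUE)
ctr <- attr(scaled, "scaled:center")
scl <- attr(scaled, "scaled:scale")

# Guard against constant columns to avoid division by zero.
scl[scl == 0] <- 1

# "Prediction": reuse the stored values instead of recomputing them
# on the new data.
predicted <- sweep(sweep(new_row, 2, ctr, "-"), 2, scl, "/")
predicted
```

Because new_row holds exactly the training column means, both entries of predicted come out as 0.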
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("mlr3pipelines")  # provides po()
task = tsk("iris")
pos = po("scale")
pos$train(list(task))[[1]]$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -1.3357516 -1.3110521 -0.89767388 1.01560199
#> 2: setosa -1.3357516 -1.3110521 -1.13920048 -0.13153881
#> 3: setosa -1.3923993 -1.3110521 -1.38072709 0.32731751
#> 4: setosa -1.2791040 -1.3110521 -1.50149039 0.09788935
#> 5: setosa -1.3357516 -1.3110521 -1.01843718 1.24503015
#> ---
#> 146: virginica 0.8168591 1.4439941 1.03453895 -0.13153881
#> 147: virginica 0.7035638 0.9192234 0.55148575 -1.27867961
#> 148: virginica 0.8168591 1.0504160 0.79301235 -0.13153881
#> 149: virginica 0.9301544 1.4439941 0.43072244 0.78617383
#> 150: virginica 0.7602115 0.7880307 0.06843254 -0.13153881
# filter() modifies a task in place, so work on a clone to keep `task` intact
one_line_of_iris = task$clone()$filter(13)
one_line_of_iris$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa 1.4 0.1 4.8 3
pos$predict(list(one_line_of_iris))[[1]]$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -1.335752 -1.442245 -1.259964 -0.1315388