Yeo-Johnson Transformation of Numeric Features
Source: R/PipeOpYeoJohnson.R
mlr_pipeops_yeojohnson.Rd
Conducts a Yeo-Johnson transformation on numeric features, estimating the optimal value of lambda separately for each feature.
See bestNormalize::yeojohnson() for details.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "yeojohnson".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
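As a minimal sketch of the two construction routes, assuming the mlr3pipelines and bestNormalize packages are installed: the PipeOp can be created through the po() shorthand or through the R6 generator PipeOpYeoJohnson$new(). The id "yj_raw" is an arbitrary name chosen for illustration.

```r
library("mlr3pipelines")

# Shorthand constructor: default id, no hyperparameters overridden.
pop_default = po("yeojohnson")

# R6 generator with a custom id (hypothetical name) and one hyperparameter set.
pop_raw = PipeOpYeoJohnson$new(
  id = "yj_raw",
  param_vals = list(standardize = FALSE)
)

pop_default$id
pop_raw$param_set$values$standardize
```

Hyperparameters passed through param_vals can still be changed later via pop_raw$param_set$values.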
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their transformed versions.
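The following sketch illustrates the train/predict flow on a single input and output channel, assuming mlr3, mlr3pipelines and bestNormalize are installed. The odd/even row split is only an illustrative choice; the expectation that prediction reuses the transformation fitted during training follows the usual PipeOpTaskPreproc behaviour.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
train_task = task$clone()$filter(seq(1, 150, by = 2))  # odd rows
test_task = task$clone()$filter(seq(2, 150, by = 2))   # even rows

pop = po("yeojohnson")

# Training estimates lambda per numeric feature and returns the transformed Task.
train_out = pop$train(list(train_task))[[1]]

# Prediction applies the transformation stored during training to new rows.
test_out = pop$predict(list(test_task))[[1]]
test_out$head(3)
```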
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as a list of class yeojohnson for each transformed column.
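A short sketch of how the fitted objects might be inspected after training, assuming mlr3, mlr3pipelines and bestNormalize are installed. It relies on the per-column objects being stored under the `bc` element of $state, as in the Examples output, and on each yeojohnson object carrying a `lambda` field.

```r
library("mlr3")
library("mlr3pipelines")

pop = po("yeojohnson")
invisible(pop$train(list(tsk("iris"))))

# One fitted yeojohnson object per transformed column, stored under `bc`.
lambdas = vapply(pop$state$bc, function(fit) fit$lambda, numeric(1))
lambdas
```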
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
eps :: numeric(1)
Tolerance parameter to identify the lambda parameter as zero. For details see yeojohnson().
standardize :: logical
Whether to center and scale the transformed values to attempt a standard normal distribution. For details see yeojohnson().
lower :: numeric(1)
Lower value for estimation of lambda parameter. For details see yeojohnson().
upper :: numeric(1)
Upper value for estimation of lambda parameter. For details see yeojohnson().
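To make the role of the four parameters concrete, here is a base-R sketch of a Yeo-Johnson fit. It is not the bestNormalize code: it assumes lambda is chosen by maximizing the normal profile log-likelihood with optimize(), and the function names (yj_transform, yj_loglik, yj_fit) and default values are illustrative only.

```r
# Yeo-Johnson transform for a fixed lambda; eps decides when lambda is
# treated as 0 (non-negative branch) or as 2 (negative branch).
yj_transform <- function(x, lambda, eps = 0.001) {
  out <- numeric(length(x))
  pos <- x >= 0
  if (abs(lambda) < eps) {
    out[pos] <- log1p(x[pos])
  } else {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log1p(-x[!pos])
  } else {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# Profile log-likelihood of lambda under a normal model (constants dropped).
yj_loglik <- function(lambda, x, eps) {
  y <- yj_transform(x, lambda, eps)
  n <- length(x)
  -n / 2 * log(mean((y - mean(y))^2)) +
    (lambda - 1) * sum(sign(x) * log1p(abs(x)))
}

# Search lambda in [lower, upper], then optionally center and scale.
yj_fit <- function(x, eps = 0.001, standardize = TRUE, lower = -5, upper = 5) {
  lambda <- optimize(yj_loglik, interval = c(lower, upper),
                     x = x, eps = eps, maximum = TRUE)$maximum
  y <- yj_transform(x, lambda, eps)
  if (standardize) y <- (y - mean(y)) / sd(y)
  list(lambda = lambda, x.t = y)
}

set.seed(1)
x <- rexp(200)                      # right-skewed toy data
fit <- yj_fit(x, lower = -2, upper = 2)
fit$lambda
```

Narrowing lower and upper restricts the search interval for lambda, while standardize = FALSE would return the transformed values on their original scale.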
Internals
Uses the bestNormalize::yeojohnson() function.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat
Examples
library("mlr3")
library("mlr3pipelines")  # provides po(); needed when running outside the package examples
task = tsk("iris")
pop = po("yeojohnson")
task$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa 1.4 0.2 5.1 3.5
#> 2: setosa 1.4 0.2 4.9 3.0
#> 3: setosa 1.3 0.2 4.7 3.2
#> 4: setosa 1.5 0.2 4.6 3.1
#> 5: setosa 1.4 0.2 5.0 3.6
#> ---
#> 146: virginica 5.2 2.3 6.7 3.0
#> 147: virginica 5.0 1.9 6.3 2.5
#> 148: virginica 5.2 2.0 6.5 3.0
#> 149: virginica 5.4 2.3 6.2 3.4
#> 150: virginica 5.1 1.8 5.9 3.0
pop$train(list(task))[[1]]$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -1.3278574 -1.3278174 -0.8926989 1.01949272
#> 2: setosa -1.3278574 -1.3278174 -1.1812158 -0.08164367
#> 3: setosa -1.3813346 -1.3278174 -1.4829526 0.37387369
#> 4: setosa -1.2741721 -1.3278174 -1.6391179 0.14878429
#> 5: setosa -1.3278574 -1.3278174 -1.0353690 1.22553197
#> ---
#> 146: virginica 0.8157632 1.4105911 1.0393000 -0.08164367
#> 147: virginica 0.6988626 0.9184998 0.6095086 -1.32389503
#> 148: virginica 0.8157632 1.0424849 0.8281903 -0.08164367
#> 149: virginica 0.9330158 1.4105911 0.4971768 0.80900587
#> 150: virginica 0.7572682 0.7938307 0.1474204 -0.08164367
pop$state
#> $bc
#> $bc$Petal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 1.093219
#> - mean (before standardization) = 4.156174
#> - sd (before standardization) = 2.024973
#>
#> $bc$Petal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 0.8404896
#> - mean (before standardization) = 1.098253
#> - sd (before standardization) = 0.6787234
#>
#> $bc$Sepal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = -0.3212232
#> - mean (before standardization) = 1.429598
#> - sd (before standardization) = 0.06498438
#>
#> $bc$Sepal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 0.03907448
#> - mean (before standardization) = 1.433769
#> - sd (before standardization) = 0.1131792
#>
#>
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $intasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#>
#> $outtasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#>
#> $outtaskshell
#> Empty data.table (0 rows and 5 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
#>