
Yeo-Johnson Transformation of Numeric Features
Source: R/PipeOpYeoJohnson.R
mlr_pipeops_yeojohnson.Rd
Conducts a Yeo-Johnson transformation on numeric features and estimates
the optimal value of lambda for the transformation.
See bestNormalize::yeojohnson() for details.
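To make the transformation concrete, here is a minimal base R sketch of the Yeo-Johnson formula for a fixed lambda. The helper `yeo_johnson()` is hypothetical and is not part of mlr3pipelines or bestNormalize. It does not estimate lambda, does not standardize, and assumes the input has no missing values. Using `eps` as the tolerance for both special cases (lambda near 0 and lambda near 2) is an assumption made for illustration.

```r
# Hypothetical helper for illustration only; not the bestNormalize implementation.
yeo_johnson = function(x, lambda, eps = 0.001) {
  out = numeric(length(x))
  pos = x >= 0
  # Non-negative values: Box-Cox-like transform of x + 1.
  if (abs(lambda) < eps) {
    out[pos] = log(x[pos] + 1)
  } else {
    out[pos] = ((x[pos] + 1)^lambda - 1) / lambda
  }
  # Negative values: mirrored transform with exponent 2 - lambda.
  if (abs(lambda - 2) < eps) {
    out[!pos] = -log(-x[!pos] + 1)
  } else {
    out[!pos] = -((-x[!pos] + 1)^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# With lambda = 1 the transformation is the identity.
yeo_johnson(c(-2, 0, 3), lambda = 1)
```

Unlike Box-Cox, the formula is defined for zero and negative inputs, which is why no shifting of the features is needed beforehand.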
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
id::character(1)
Identifier of resulting object, default "yeojohnson".
param_vals::named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their transformed versions.
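As a rough sketch of how the channels behave, the PipeOp can be trained on one Task and then applied to another, in which case the transformation fitted during training is reused at prediction time. The split of the iris task into rows 1 to 100 and 101 to 150 is arbitrary and only serves as an illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
# Arbitrary illustrative split; filter() modifies a Task in place, so clone first.
train_task = task$clone()$filter(1:100)
test_task = task$clone()$filter(101:150)

pop = po("yeojohnson")
# Training estimates lambda per numeric feature and returns the transformed Task.
pop$train(list(train_task))
# Prediction applies the stored transformation to new data.
out = pop$predict(list(test_task))[[1]]
out$data()
```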
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as a list of class yeojohnson for each column that is transformed.
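The fitted objects can be inspected after training. The sketch below assumes that they are stored under the `bc` element of `$state`, as printed by `pop$state` in the Examples, and that each yeojohnson object exposes its estimate as a `lambda` element. The latter is an assumption about the bestNormalize return value.

```r
library("mlr3")
library("mlr3pipelines")

pop = po("yeojohnson")
pop$train(list(tsk("iris")))

# One fitted object per transformed column (assumed to live under `bc`).
fits = pop$state$bc
# Collect the estimated lambda of each column (assumes a `lambda` element).
lambdas = vapply(fits, function(fit) fit$lambda, numeric(1))
lambdas
```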
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
eps::numeric(1)
Tolerance parameter to identify the lambda parameter as zero. For details see yeojohnson().
standardize::logical
Whether to center and scale the transformed values to attempt a standard normal distribution. For details see yeojohnson().
lower::numeric(1)
Lower value for estimation of lambda parameter. For details see yeojohnson().
upper::numeric(1)
Upper value for estimation of lambda parameter. For details see yeojohnson().
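A short sketch of how these parameters might be set, either when constructing the PipeOp through po() or afterwards through the parameter set. The specific values (no standardization, a search range of -2 to 2) are arbitrary choices for illustration, not recommendations.

```r
library("mlr3")
library("mlr3pipelines")

# Set hyperparameters at construction; values are illustrative only.
pop = po("yeojohnson", standardize = FALSE, lower = -2, upper = 2)

# Change a hyperparameter on the existing object.
pop$param_set$values$eps = 0.01

pop$param_set$values
```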
Internals
Uses the bestNormalize::yeojohnson() function.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat
Examples
library("mlr3")
library("mlr3pipelines")  # provides po()
task = tsk("iris")
pop = po("yeojohnson")
task$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa 1.4 0.2 5.1 3.5
#> 2: setosa 1.4 0.2 4.9 3.0
#> 3: setosa 1.3 0.2 4.7 3.2
#> 4: setosa 1.5 0.2 4.6 3.1
#> 5: setosa 1.4 0.2 5.0 3.6
#> ---
#> 146: virginica 5.2 2.3 6.7 3.0
#> 147: virginica 5.0 1.9 6.3 2.5
#> 148: virginica 5.2 2.0 6.5 3.0
#> 149: virginica 5.4 2.3 6.2 3.4
#> 150: virginica 5.1 1.8 5.9 3.0
pop$train(list(task))[[1]]$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -1.3278574 -1.3278174 -0.8926989 1.01949272
#> 2: setosa -1.3278574 -1.3278174 -1.1812158 -0.08164367
#> 3: setosa -1.3813346 -1.3278174 -1.4829526 0.37387369
#> 4: setosa -1.2741721 -1.3278174 -1.6391179 0.14878429
#> 5: setosa -1.3278574 -1.3278174 -1.0353690 1.22553197
#> ---
#> 146: virginica 0.8157632 1.4105911 1.0393000 -0.08164367
#> 147: virginica 0.6988626 0.9184998 0.6095086 -1.32389503
#> 148: virginica 0.8157632 1.0424849 0.8281903 -0.08164367
#> 149: virginica 0.9330158 1.4105911 0.4971768 0.80900587
#> 150: virginica 0.7572682 0.7938307 0.1474204 -0.08164367
pop$state
#> $bc
#> $bc$Petal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 1.093219
#> - mean (before standardization) = 4.156174
#> - sd (before standardization) = 2.024973
#>
#> $bc$Petal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 0.8404896
#> - mean (before standardization) = 1.098253
#> - sd (before standardization) = 0.6787234
#>
#> $bc$Sepal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = -0.3212232
#> - mean (before standardization) = 1.429598
#> - sd (before standardization) = 0.06498438
#>
#> $bc$Sepal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 0.03907448
#> - mean (before standardization) = 1.433769
#> - sd (before standardization) = 0.1131792
#>
#>
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $intasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#>
#> $outtasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#>
#> $outtaskshell
#> Empty data.table (0 rows and 5 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
#>