Yeo-Johnson Transformation of Numeric Features

Conducts a Yeo-Johnson transformation on numeric features, estimating the optimal value of lambda separately for each feature. See bestNormalize::yeojohnson() for details.
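
For orientation, the standard textbook definition of the Yeo-Johnson transformation for a value x and parameter lambda is sketched below. It is given here as background only; details such as the handling of lambda values within eps of zero, and the optional standardization afterwards, are left to bestNormalize::yeojohnson().

```latex
% Requires the amsmath package (cases environment, \dfrac, \text).
\psi(\lambda, x) =
\begin{cases}
\dfrac{(x + 1)^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0,\ x \geq 0 \\[1ex]
\log(x + 1) & \text{if } \lambda = 0,\ x \geq 0 \\[1ex]
-\dfrac{(1 - x)^{2 - \lambda} - 1}{2 - \lambda} & \text{if } \lambda \neq 2,\ x < 0 \\[1ex]
-\log(1 - x) & \text{if } \lambda = 2,\ x < 0
\end{cases}
```

Unlike the Box-Cox transformation, this definition also covers zero and negative values of x.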

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpYeoJohnson$new(id = "yeojohnson", param_vals = list())

id :: character(1)
Identifier of resulting object, default "yeojohnson".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with all affected numeric features replaced by their transformed versions.
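
As a minimal sketch of how the channels are typically used, the following trains the operator on one part of a task and then applies the stored transformation to the rest. The odd/even row split is an arbitrary choice made for illustration, and the variable names are hypothetical.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Arbitrary illustrative split: odd rows for training, even rows for prediction.
train_task = task$clone()$filter(seq(1, 150, by = 2))
test_task = task$clone()$filter(seq(2, 150, by = 2))

pop = po("yeojohnson")

# Training estimates lambda per numeric feature and returns the transformed Task.
train_out = pop$train(list(train_task))[[1]]

# Prediction reuses the transformations stored during training.
test_out = pop$predict(list(test_task))[[1]]

head(test_out$data())
```

Both calls take and return a list with a single Task, so the result is extracted with [[1]].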

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as an element bc: a named list containing one object of class yeojohnson for each transformed column.
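
A short sketch of inspecting the state after training follows. It assumes the per-column objects are stored under $state$bc, as in the printed output in the Examples section, and that each yeojohnson object exposes its estimate as $lambda.

```r
library("mlr3")
library("mlr3pipelines")

pop = po("yeojohnson")
pop$train(list(tsk("iris")))

# One fitted yeojohnson object per transformed column.
fitted_cols = names(pop$state$bc)
print(fitted_cols)

# Collect the estimated lambda of every column into a named numeric vector.
lambdas = vapply(pop$state$bc, function(obj) obj$lambda, numeric(1))
print(lambdas)
```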

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

eps :: numeric(1)
Tolerance parameter to identify the lambda parameter as zero. For details see yeojohnson().
standardize :: logical(1)
Whether to center and scale the transformed values so that they approximate a standard normal distribution. For details see yeojohnson().
lower :: numeric(1)
Lower value for estimation of lambda parameter. For details see yeojohnson().
upper :: numeric(1)
Upper value for estimation of lambda parameter. For details see yeojohnson().
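
The parameters above can be set at construction through param_vals. The sketch below uses arbitrary illustrative values (no standardization, lambda searched in [-2, 2]); they are not recommendations.

```r
library("mlr3")
library("mlr3pipelines")

# Illustrative values only: skip standardization and narrow the lambda search range.
pop = po("yeojohnson",
  param_vals = list(standardize = FALSE, lower = -2, upper = 2))

out = pop$train(list(tsk("iris")))[[1]]

# The estimated lambdas should lie inside the requested search interval.
lambdas = vapply(pop$state$bc, function(obj) obj$lambda, numeric(1))
print(lambdas)
```

Parameter values can also be changed after construction through pop$param_set$values.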

Internals

Uses the bestNormalize::yeojohnson() function.

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

Examples

library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpYeoJohnson

task = tsk("iris")
pop = po("yeojohnson")

task$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---                                                            
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
pop$train(list(task))[[1]]$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa   -1.3278574  -1.3278174   -0.8926989  1.01949272
#>   2:    setosa   -1.3278574  -1.3278174   -1.1812158 -0.08164367
#>   3:    setosa   -1.3813346  -1.3278174   -1.4829526  0.37387369
#>   4:    setosa   -1.2741721  -1.3278174   -1.6391179  0.14878429
#>   5:    setosa   -1.3278574  -1.3278174   -1.0353690  1.22553197
#>  ---                                                            
#> 146: virginica    0.8157632   1.4105911    1.0393000 -0.08164367
#> 147: virginica    0.6988626   0.9184998    0.6095086 -1.32389503
#> 148: virginica    0.8157632   1.0424849    0.8281903 -0.08164367
#> 149: virginica    0.9330158   1.4105911    0.4971768  0.80900587
#> 150: virginica    0.7572682   0.7938307    0.1474204 -0.08164367

pop$state
#> $bc
#> $bc$Petal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 1.093219 
#>  - mean (before standardization) = 4.156174 
#>  - sd (before standardization) = 2.024973 
#> 
#> $bc$Petal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 0.8404896 
#>  - mean (before standardization) = 1.098253 
#>  - sd (before standardization) = 0.6787234 
#> 
#> $bc$Sepal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = -0.3212232 
#>  - mean (before standardization) = 1.429598 
#>  - sd (before standardization) = 0.06498438 
#> 
#> $bc$Sepal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 0.03907448 
#>  - mean (before standardization) = 1.433769 
#>  - sd (before standardization) = 0.1131792 
#> 
#> 
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $intasklayout
#> Key: <id>
#>              id    type
#>          <char>  <char>
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric
#> 
#> $outtasklayout
#> Key: <id>
#>              id    type
#>          <char>  <char>
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric
#> 
#> $outtaskshell
#> Empty data.table (0 rows and 5 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
#>