# Yeo-Johnson Transformation of Numeric Features

Source: `R/PipeOpYeoJohnson.R`

Applies a Yeo-Johnson transformation to numeric features. The optimal value of
lambda is estimated separately for each feature during training.
See `bestNormalize::yeojohnson()` for details.
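To make the role of lambda concrete, the following base R sketch applies the standard Yeo-Johnson formula for a fixed lambda. The helper `yeo_johnson()` is hypothetical and not part of `bestNormalize` or this PipeOp; it skips the lambda estimation and the optional standardization, assumes there are no missing values, and uses `eps` in the way the `eps` parameter is described under Parameters (as an assumption about how the tolerance is applied).

```r
# Hypothetical helper for illustration only: Yeo-Johnson transform for a fixed lambda.
yeo_johnson = function(x, lambda, eps = 0.001) {
  out = numeric(length(x))
  pos = x >= 0

  # Non-negative values: Box-Cox-like transform of x + 1
  if (abs(lambda) < eps) {
    out[pos] = log(x[pos] + 1)
  } else {
    out[pos] = ((x[pos] + 1)^lambda - 1) / lambda
  }

  # Negative values: mirrored transform with exponent 2 - lambda
  if (abs(lambda - 2) < eps) {
    out[!pos] = -log(-x[!pos] + 1)
  } else {
    out[!pos] = -((-x[!pos] + 1)^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

x = c(-2, -0.5, 0, 1, 3)
yeo_johnson(x, lambda = 1)  # lambda = 1 leaves the values unchanged
yeo_johnson(x, lambda = 0)  # log(x + 1) for the non-negative values
```

Unlike the Box-Cox transformation, this formula is defined for zero and negative inputs, which is why no shifting of the data is needed beforehand.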

## Format

`R6Class` object inheriting from `PipeOpTaskPreproc`/`PipeOp`.

## Construction

* `id` :: `character(1)`
  Identifier of resulting object, default `"yeojohnson"`.
* `param_vals` :: named `list`
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default `list()`.
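As a brief illustration of the two constructor arguments, the sketch below builds the PipeOp once through the `po()` shortcut and once through the R6 class named after the source file. It assumes `mlr3pipelines` is installed; the id `"yj_raw"` is an arbitrary name chosen for this example.

```r
library("mlr3pipelines")

# Default construction: id "yeojohnson", no hyperparameters overridden
pop = po("yeojohnson")
pop$id

# Explicit construction with a custom id and an initial hyperparameter value
pop_raw = PipeOpYeoJohnson$new(
  id = "yj_raw",                           # illustrative id
  param_vals = list(standardize = FALSE)   # overrides the construction-time setting
)
pop_raw$id
pop_raw$param_set$values$standardize
```

A custom `id` matters mostly when the same PipeOp type appears more than once in a pipeline, since ids within one graph have to be unique.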

## Input and Output Channels

Input and output channels are inherited from `PipeOpTaskPreproc`.

The output is the input `Task` with all affected numeric features replaced by their transformed versions.
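The sketch below shows the channel behaviour in both phases: a `Task` goes in and a `Task` with transformed features comes out. It assumes `mlr3`, `mlr3pipelines` and `bestNormalize` are installed; the split of the iris rows into 1 to 100 and 101 to 150 is arbitrary and only serves the illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
train_task = task$clone()$filter(1:100)    # rows used to estimate the transformation
test_task = task$clone()$filter(101:150)   # rows transformed with the stored estimates

pop = po("yeojohnson")

# Training: input is a list with one Task, output is a list with one Task
train_out = pop$train(list(train_task))[[1]]

# Prediction: reuses the transformations estimated during training
test_out = pop$predict(list(test_task))[[1]]

head(test_out$data())
```

Feature names and types stay the same; only the values of the affected numeric columns change.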

## State

The `$state` is a named `list` with the `$state` elements inherited from `PipeOpTaskPreproc`, as well as one object of class `yeojohnson` for each transformed column. In the example output, these objects are stored in the `bc` element.
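A short sketch of how the state might be inspected after training, assuming `mlr3`, `mlr3pipelines` and `bestNormalize` are installed. The `bc` element name is taken from the example output of this page, and `lambda` is the field of the fitted `bestNormalize` object that holds the estimate.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("yeojohnson")

pop$is_trained   # FALSE before training
pop$train(list(task))
pop$is_trained   # TRUE once the state has been set

# One fitted object per transformed column
names(pop$state$bc)
class(pop$state$bc$Petal.Length)

# Estimated lambda for a single column
pop$state$bc$Petal.Length$lambda
```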

## Parameters

The parameters are the parameters inherited from `PipeOpTaskPreproc`, as well as:

* `eps` :: `numeric(1)`
  Tolerance parameter to identify the lambda parameter as zero. For details see `yeojohnson()`.
* `standardize` :: `logical`
  Whether to center and scale the transformed values to attempt a standard normal distribution. For details see `yeojohnson()`.
* `lower` :: `numeric(1)`
  Lower value for estimation of lambda parameter. For details see `yeojohnson()`.
* `upper` :: `numeric(1)`
  Upper value for estimation of lambda parameter. For details see `yeojohnson()`.
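The following sketch sets these parameters through the PipeOp's parameter set before training. It assumes `mlr3`, `mlr3pipelines` and `bestNormalize` are installed; the interval from -2 to 2 is an arbitrary choice for illustration, not a recommendation.

```r
library("mlr3")
library("mlr3pipelines")

pop = po("yeojohnson")

# Narrow the search interval for lambda and skip centering and scaling
pop$param_set$values$lower = -2
pop$param_set$values$upper = 2
pop$param_set$values$standardize = FALSE

out = pop$train(list(tsk("iris")))[[1]]
head(out$data())

# Collect the estimated lambda of every transformed column
lambdas = vapply(pop$state$bc, function(fit) fit$lambda, numeric(1))
lambdas
```

With `standardize = FALSE` the output keeps the scale of the raw Yeo-Johnson transform, which can be useful when a separate scaling step follows later in a pipeline.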

## Internals

Uses the `bestNormalize::yeojohnson` function.
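For orientation, this is roughly what a direct call to the underlying function looks like on a single numeric vector. It is a sketch that assumes `bestNormalize` is installed and uses the built-in `iris` data; how exactly the PipeOp forwards its hyperparameters to this function is not shown here.

```r
# Fit the transformation on one numeric vector
fit = bestNormalize::yeojohnson(iris$Petal.Length, standardize = TRUE)
fit$lambda

# Transform new values with the fitted object
predict(fit, newdata = c(1.5, 4))

# Map transformed values back to the original scale
transformed = predict(fit)
restored = predict(fit, newdata = transformed, inverse = TRUE)
all.equal(restored, iris$Petal.Length)
```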

## Methods

Only methods inherited from `PipeOpTaskPreproc`/`PipeOp`.

## See also

https://mlr-org.com/pipeops.html

Other PipeOps: `PipeOp`, `PipeOpEnsemble`, `PipeOpImpute`, `PipeOpTargetTrafo`, `PipeOpTaskPreproc`, `PipeOpTaskPreprocSimple`, `mlr_pipeops`, `mlr_pipeops_boxcox`, `mlr_pipeops_branch`, `mlr_pipeops_chunk`, `mlr_pipeops_classbalancing`, `mlr_pipeops_classifavg`, `mlr_pipeops_classweights`, `mlr_pipeops_colapply`, `mlr_pipeops_collapsefactors`, `mlr_pipeops_colroles`, `mlr_pipeops_copy`, `mlr_pipeops_datefeatures`, `mlr_pipeops_encode`, `mlr_pipeops_encodeimpact`, `mlr_pipeops_encodelmer`, `mlr_pipeops_featureunion`, `mlr_pipeops_filter`, `mlr_pipeops_fixfactors`, `mlr_pipeops_histbin`, `mlr_pipeops_ica`, `mlr_pipeops_imputeconstant`, `mlr_pipeops_imputehist`, `mlr_pipeops_imputelearner`, `mlr_pipeops_imputemean`, `mlr_pipeops_imputemedian`, `mlr_pipeops_imputemode`, `mlr_pipeops_imputeoor`, `mlr_pipeops_imputesample`, `mlr_pipeops_kernelpca`, `mlr_pipeops_learner`, `mlr_pipeops_missind`, `mlr_pipeops_modelmatrix`, `mlr_pipeops_multiplicityexply`, `mlr_pipeops_multiplicityimply`, `mlr_pipeops_mutate`, `mlr_pipeops_nmf`, `mlr_pipeops_nop`, `mlr_pipeops_ovrsplit`, `mlr_pipeops_ovrunite`, `mlr_pipeops_pca`, `mlr_pipeops_proxy`, `mlr_pipeops_quantilebin`, `mlr_pipeops_randomprojection`, `mlr_pipeops_randomresponse`, `mlr_pipeops_regravg`, `mlr_pipeops_removeconstants`, `mlr_pipeops_renamecolumns`, `mlr_pipeops_replicate`, `mlr_pipeops_scale`, `mlr_pipeops_scalemaxabs`, `mlr_pipeops_scalerange`, `mlr_pipeops_select`, `mlr_pipeops_smote`, `mlr_pipeops_spatialsign`, `mlr_pipeops_subsample`, `mlr_pipeops_targetinvert`, `mlr_pipeops_targetmutate`, `mlr_pipeops_targettrafoscalerange`, `mlr_pipeops_textvectorizer`, `mlr_pipeops_threshold`, `mlr_pipeops_tunethreshold`, `mlr_pipeops_unbranch`, `mlr_pipeops_updatetarget`, `mlr_pipeops_vtreat`

## Examples

```
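if (requireNamespace("bestNormalize")) {
  library("mlr3")
  library("mlr3pipelines")  # provides po() and the PipeOp classes

  task = tsk("iris")
  pop = po("yeojohnson")

  # Feature values before the transformation
  task$data()
  # Train the PipeOp and extract the transformed data
  pop$train(list(task))[[1]]$data()
  # Fitted transformations and inherited state elements (the value printed below)
  pop$state
}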
#> $bc
#> $bc$Petal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 1.093219
#> - mean (before standardization) = 4.156174
#> - sd (before standardization) = 2.024973
#>
#> $bc$Petal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 0.8404896
#> - mean (before standardization) = 1.098253
#> - sd (before standardization) = 0.6787234
#>
#> $bc$Sepal.Length
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = -0.3212232
#> - mean (before standardization) = 1.429598
#> - sd (before standardization) = 0.06498438
#>
#> $bc$Sepal.Width
#> Standardized Yeo-Johnson Transformation with 150 nonmissing obs.:
#> Estimated statistics:
#> - lambda = 0.03907448
#> - mean (before standardization) = 1.433769
#> - sd (before standardization) = 0.1131792
#>
#>
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $intasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#>
#> $outtasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#>
#> $outtaskshell
#> Empty data.table (0 rows and 5 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
#>
```