
Conducts a Box-Cox transformation on numeric features. The lambda parameter of the transformation is estimated during training and then reused, unchanged, to transform the data both during training and during prediction. See bestNormalize::boxcox() for details.
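As a rough illustration of what is being estimated, the sketch below applies the textbook Box-Cox formula for a fixed lambda: (x^lambda - 1) / lambda, or log(x) when lambda is numerically zero. It then optionally centers and scales the result. The helpers box_cox_manual() and standardize_manual() are hypothetical and are not the bestNormalize implementation. The eps default of 0.001 is an assumption, and the lambda value is copied from the Petal.Length state printed in the example at the end of this page. Box-Cox is only defined for strictly positive input, so the helper refuses anything else.

```r
# Hypothetical helper (not part of bestNormalize or mlr3pipelines):
# Box-Cox transformation for a *fixed* lambda.
box_cox_manual = function(x, lambda, eps = 0.001) {
  # Box-Cox is only defined for strictly positive values
  stopifnot(is.numeric(x), all(x > 0))
  if (abs(lambda) < eps) {
    log(x)                    # limit of the formula as lambda approaches 0
  } else {
    (x^lambda - 1) / lambda
  }
}

# Hypothetical helper: centering and scaling, as with standardize = TRUE
standardize_manual = function(y) (y - mean(y)) / sd(y)

# lambda copied from the Petal.Length state printed in the example on this page
y = standardize_manual(box_cox_manual(iris$Petal.Length, lambda = 0.931286))
head(y)
```

The PipeOp itself does not take lambda as an input. It estimates lambda per column during training, which is the part this sketch deliberately leaves out.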

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpBoxCox$new(id = "boxcox", param_vals = list())

  • id :: character(1)
    Identifier of the resulting object, default "boxcox".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
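The following is a minimal construction sketch. It assumes mlr3pipelines is installed, and bestNormalize as well if the PipeOp is later trained. The id "bc_raw" is an arbitrary illustrative name. The po() shortcut, also used in the example at the end of this page, is assumed to forward id to the constructor and to set standardize as a hyperparameter, as it does for other PipeOps.

```r
library("mlr3pipelines")

# Explicit constructor call: custom id, standardization switched off
pop = PipeOpBoxCox$new(id = "bc_raw", param_vals = list(standardize = FALSE))

# Equivalent shortcut via po(); "bc_raw" is only an illustrative id
pop2 = po("boxcox", id = "bc_raw", standardize = FALSE)

pop$id
pop$param_set$values$standardize
```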

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with all affected numeric features replaced by their transformed versions.
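Which features count as "affected" is controlled by the affect_columns hyperparameter, which this PipeOp inherits from PipeOpTaskPreproc. The sketch below uses the selector_name() helper from mlr3pipelines, which is not mentioned elsewhere on this page, to restrict the transformation to the two sepal columns. All other features, including the factor Species, should pass through untouched. The example assumes mlr3, mlr3pipelines and bestNormalize are installed.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Restrict the inherited affect_columns hyperparameter to the sepal columns
pop = po("boxcox",
  affect_columns = selector_name(c("Sepal.Length", "Sepal.Width")))
out = pop$train(list(task))[[1]]

# Petal.Length was not selected, so it should be passed through unchanged
all.equal(out$data(cols = "Petal.Length")[[1]],
  task$data(cols = "Petal.Length")[[1]])

# Only the selected columns get a fitted transformation in the state
names(pop$state$bc)
```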

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as the element $bc: a named list holding one object of class boxcox (as returned by bestNormalize::boxcox()) for each transformed column.
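The following sketch inspects the trained state and assumes mlr3, mlr3pipelines and bestNormalize are installed. Reading $lambda directly from the stored object relies on the internal structure of bestNormalize's boxcox objects. This is an assumption beyond what the printed state in the example at the end of this page shows.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("boxcox")
invisible(pop$train(list(task)))

# One fitted bestNormalize object per transformed column
bc_petal = pop$state$bc$Petal.Length
class(bc_petal)

# The lambda estimated during training and reused for prediction
bc_petal$lambda
```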

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

  • standardize :: logical(1)
    Whether to center and scale the transformed values to mean 0 and standard deviation 1 (computed on the training data), in an attempt to obtain a standard normal distribution. For details see bestNormalize::boxcox().

  • eps :: numeric(1)
    Tolerance for deciding whether the lambda parameter is treated as equal to zero. For details see bestNormalize::boxcox().

  • lower :: numeric(1)
    Lower bound of the interval in which the lambda parameter is estimated. For details see bestNormalize::boxcox().

  • upper :: numeric(1)
    Upper bound of the interval in which the lambda parameter is estimated. For details see bestNormalize::boxcox().
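The sketch below combines several of these parameters and assumes mlr3, mlr3pipelines and bestNormalize are installed. The interval [0.5, 1] is arbitrary. It is chosen only to show that the estimates stay inside [lower, upper]. With standardize = FALSE, the output should equal the plain formula (x^lambda - 1) / lambda, and the last line checks this for one column.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Search lambda in [0.5, 1] (arbitrary bounds) and skip centering/scaling
pop = po("boxcox", standardize = FALSE, lower = 0.5, upper = 1)
out = pop$train(list(task))[[1]]

# Estimated lambda per column; each should lie inside [lower, upper]
lambdas = sapply(pop$state$bc, function(bc) bc$lambda)
lambdas

# Without standardization the output is the plain Box-Cox value
x = task$data(cols = "Petal.Length")[[1]]
lam = lambdas[["Petal.Length"]]
all.equal(out$data(cols = "Petal.Length")[[1]], (x^lam - 1) / lam)
```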

Internals

Uses the bestNormalize::boxcox function.
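To see what happens for a single column, you can call bestNormalize::boxcox() directly on a numeric vector and apply the fitted object to new values with predict(). This sketch assumes bestNormalize is installed. It also assumes that the PipeOp, with no hyperparameters set, calls bestNormalize::boxcox() with its default arguments. The new values 1.5, 4.0 and 6.1 are made up.

```r
# Fit on one numeric vector, as the PipeOp does for each affected column
bc = bestNormalize::boxcox(iris$Petal.Length)
bc$lambda

# Apply the stored lambda, mean and sd to new (made-up) values
predict(bc, newdata = c(1.5, 4.0, 6.1))
```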

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.
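The inherited $train() and $predict() methods determine when lambda is estimated. Lambda, together with the standardization mean and sd, is estimated once in $train() and only reused in $predict(). The sketch below assumes an arbitrary split of the iris rows: rows 1 to 100 for training and rows 101 to 150 for prediction.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
# Arbitrary split: rows 1-100 for training, rows 101-150 for prediction
train_task = task$clone()$filter(1:100)
test_task = task$clone()$filter(101:150)

pop = po("boxcox")
invisible(pop$train(list(train_task)))  # lambda, mean and sd estimated here
lambda_before = pop$state$bc$Sepal.Width$lambda

# $predict() only applies the stored state; nothing is re-estimated
pred = pop$predict(list(test_task))[[1]]

# Same result as applying the stored bestNormalize object by hand
x_test = test_task$data(cols = "Sepal.Width")[[1]]
all.equal(pred$data(cols = "Sepal.Width")[[1]],
  predict(pop$state$bc$Sepal.Width, newdata = x_test))
identical(lambda_before, pop$state$bc$Sepal.Width$lambda)
```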

See also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops

Examples

library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("boxcox")

task$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---                                                            
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
pop$train(list(task))[[1]]$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa   -1.3431567  -1.3850773   -0.8917547  1.01831791
#>   2:    setosa   -1.3431567  -1.3850773   -1.1812229 -0.08167295
#>   3:    setosa   -1.4033413  -1.3850773   -1.4845435  0.37307046
#>   4:    setosa   -1.2832670  -1.3850773   -1.6417967  0.14833599
#>   5:    setosa   -1.3431567  -1.3850773   -1.0348319  1.22454068
#>  ---                                                            
#> 146: virginica    0.8174171   1.2930924    1.0385560 -0.08167295
#> 147: virginica    0.7075555   0.9020852    0.6097200 -1.32264877
#> 148: virginica    0.8174171   1.0023867    0.8279148 -0.08167295
#> 149: virginica    0.9269887   1.2930924    0.4976284  0.80781419
#> 150: virginica    0.7625234   0.7998822    0.1485189 -0.08167295

pop$state
#> $bc
#> $bc$Petal.Length
#> Standardized Box Cox Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 0.931286 
#>  - mean (before standardization) = 2.58137 
#>  - sd (before standardization) = 1.627669 
#> 
#> $bc$Petal.Width
#> Standardized Box Cox Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 0.6433629 
#>  - mean (before standardization) = 0.08586719 
#>  - sd (before standardization) = 0.7857394 
#> 
#> $bc$Sepal.Length
#> Standardized Box Cox Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = -0.144751 
#>  - mean (before standardization) = 1.549011 
#>  - sd (before standardization) = 0.1094848 
#> 
#> $bc$Sepal.Width
#> Standardized Box Cox Transformation with 150 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 0.2810121 
#>  - mean (before standardization) = 1.30301 
#>  - sd (before standardization) = 0.1950175 
#> 
#> 
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $intasklayout
#> Key: <id>
#>              id    type
#>          <char>  <char>
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric
#> 
#> $outtasklayout
#> Key: <id>
#>              id    type
#>          <char>  <char>
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric
#> 
#> $outtaskshell
#> Empty data.table (0 rows and 5 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
#>
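In practice, the PipeOp is usually placed in front of a learner. The following sketch assumes the rpart package is available for lrn("classif.rpart"). It uses mlr3's partition() and as_learner(), neither of which is mentioned elsewhere on this page, for a random 70/30 split and to wrap the Graph. The seed, the learner and the split ratio are illustrative choices, not recommendations.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)  # reproducible split
task = tsk("iris")

# Box-Cox in front of a decision tree, wrapped as a single Learner
glrn = as_learner(po("boxcox") %>>% lrn("classif.rpart"))

split = partition(task, ratio = 0.7)
glrn$train(task, row_ids = split$train)
prediction = glrn$predict(task, row_ids = split$test)
prediction$score(msr("classif.acc"))
```

Tree-based learners are insensitive to monotone transformations of individual features, so in a real pipeline the transformation is more likely to matter for scale- or distribution-sensitive learners.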