Encodes columns of type `factor`, `character` and `ordered`.

Possible encodings are `"one-hot"` encoding, as well as encoding according to `stats::contr.helmert()`, `stats::contr.poly()`, `stats::contr.sum()` and `stats::contr.treatment()`. Newly created columns are named via pattern `[column-name].[x]` where `x` is the respective factor level for `"one-hot"` and `"treatment"` encoding, and an integer sequence otherwise.

Use the `PipeOpTaskPreproc` `$affect_columns` functionality to only encode a subset of columns, or only encode columns of a certain type.
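
Restricting the encoding to selected columns can be sketched as follows. This is a minimal illustration that assumes the `mlr3` and `mlr3pipelines` packages are installed; the task and its column names (`target`, `f1`, `f2`) are made up for the example.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical toy task with two factor features
data = data.table::data.table(
  target = factor(c("u", "v", "u")),
  f1 = factor(letters[1:3]),
  f2 = factor(LETTERS[1:3])
)
task = TaskClassif$new("demo", data, target = "target")

# Encode only the column named "f1"; "f2" is passed through unchanged.
# selector_type("factor") could be used instead to select by column type.
poe = po("encode", affect_columns = selector_name("f1"))
out = poe$train(list(task))[[1]]

out$feature_names
```

With the default `"one-hot"` method, `f1` should be replaced by one column per level while `f2` remains a factor.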

`R6Class` object inheriting from `PipeOpTaskPreprocSimple`/`PipeOpTaskPreproc`/`PipeOp`.

PipeOpEncode$new(id = "encode", param_vals = list())

- `id` :: `character(1)`
  Identifier of resulting object, default `"encode"`.
- `param_vals` :: named `list`
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default `list()`.

Input and output channels are inherited from `PipeOpTaskPreproc`.

The output is the input `Task` with all affected `factor`, `character` or `ordered` features encoded according to the `method` parameter.

The `$state` is a named `list` with the `$state` elements inherited from `PipeOpTaskPreproc`, as well as:

- `contrasts` :: named `list` of `matrix`
  List of contrast matrices, one for each affected discrete feature. The rows of each matrix correspond to (training task) levels, the columns to the new columns that replace the old discrete feature. See `stats::contrasts`.
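
How a stored contrast matrix maps a discrete feature to its replacement columns can be illustrated in base R. This is a sketch under assumptions, not the package's internal code: the factor `f` and the `f.` column prefix are hypothetical, and the row lookup is one straightforward way to apply such a matrix.

```r
# Hypothetical feature as seen during training
f = factor(c("a", "c", "b", "a"), levels = c("a", "b", "c"))

# Contrast matrix: one row per training level, one column per new column
contr = stats::contr.treatment(levels(f))

# Each observation selects the row that matches its level
encoded = contr[as.character(f), , drop = FALSE]

# Name the new columns after the pattern [column-name].[x]
colnames(encoded) = paste0("f.", colnames(contr))
rownames(encoded) = NULL

encoded
```

Because the matrix is fixed at training time, the same lookup can be reused on new data that share the training levels.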

The parameters are the parameters inherited from `PipeOpTaskPreproc`, as well as:

- `method` :: `character(1)`
  Initialized to `"one-hot"`. One of:
  - `"one-hot"`: create a new column for each factor level.
  - `"treatment"`: create \(n-1\) columns leaving out the first factor level of each factor variable (see `stats::contr.treatment()`).
  - `"helmert"`: create columns according to Helmert contrasts (see `stats::contr.helmert()`).
  - `"poly"`: create columns with contrasts based on orthogonal polynomials (see `stats::contr.poly()`).
  - `"sum"`: create columns with contrasts summing to zero (see `stats::contr.sum()`).
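
The contrast matrices behind these options can be inspected directly in base R. The sketch below uses a hypothetical three-level factor; treating `"one-hot"` as `stats::contr.treatment()` with `contrasts = FALSE` is an assumption made for illustration, chosen because it yields one indicator column per level.

```r
# Hypothetical levels of a discrete feature
lvls = c("a", "b", "c")

# One indicator column per level (assumed equivalent of "one-hot")
onehot = stats::contr.treatment(lvls, contrasts = FALSE)

# n - 1 indicator columns, first level left out
treatment = stats::contr.treatment(lvls)

# Helmert contrasts
helmert = stats::contr.helmert(lvls)

# Orthogonal polynomial contrasts (linear and quadratic for three levels)
poly = stats::contr.poly(length(lvls))

# Contrasts whose columns sum to zero
sum_contr = stats::contr.sum(lvls)

print(list(onehot = onehot, treatment = treatment, helmert = helmert,
  poly = poly, sum = sum_contr))
```

Only the `"one-hot"` and `"treatment"` matrices carry level names as column names, which is consistent with the level-based column naming described for these two methods.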

Uses the `stats::contrasts` functions. This is relatively inefficient for features with a large number of levels.

Only methods inherited from `PipeOpTaskPreprocSimple`/`PipeOpTaskPreproc`/`PipeOp`.

https://mlr3book.mlr-org.com/list-pipeops.html

Other PipeOps: `PipeOpEnsemble`, `PipeOpImpute`, `PipeOpTargetTrafo`, `PipeOpTaskPreprocSimple`, `PipeOpTaskPreproc`, `PipeOp`, `mlr_pipeops_boxcox`, `mlr_pipeops_branch`, `mlr_pipeops_chunk`, `mlr_pipeops_classbalancing`, `mlr_pipeops_classifavg`, `mlr_pipeops_classweights`, `mlr_pipeops_colapply`, `mlr_pipeops_collapsefactors`, `mlr_pipeops_colroles`, `mlr_pipeops_copy`, `mlr_pipeops_datefeatures`, `mlr_pipeops_encodeimpact`, `mlr_pipeops_encodelmer`, `mlr_pipeops_featureunion`, `mlr_pipeops_filter`, `mlr_pipeops_fixfactors`, `mlr_pipeops_histbin`, `mlr_pipeops_ica`, `mlr_pipeops_imputeconstant`, `mlr_pipeops_imputehist`, `mlr_pipeops_imputelearner`, `mlr_pipeops_imputemean`, `mlr_pipeops_imputemedian`, `mlr_pipeops_imputemode`, `mlr_pipeops_imputeoor`, `mlr_pipeops_imputesample`, `mlr_pipeops_kernelpca`, `mlr_pipeops_learner`, `mlr_pipeops_missind`, `mlr_pipeops_modelmatrix`, `mlr_pipeops_multiplicityexply`, `mlr_pipeops_multiplicityimply`, `mlr_pipeops_mutate`, `mlr_pipeops_nmf`, `mlr_pipeops_nop`, `mlr_pipeops_ovrsplit`, `mlr_pipeops_ovrunite`, `mlr_pipeops_pca`, `mlr_pipeops_proxy`, `mlr_pipeops_quantilebin`, `mlr_pipeops_randomprojection`, `mlr_pipeops_randomresponse`, `mlr_pipeops_regravg`, `mlr_pipeops_removeconstants`, `mlr_pipeops_renamecolumns`, `mlr_pipeops_replicate`, `mlr_pipeops_scalemaxabs`, `mlr_pipeops_scalerange`, `mlr_pipeops_scale`, `mlr_pipeops_select`, `mlr_pipeops_smote`, `mlr_pipeops_spatialsign`, `mlr_pipeops_subsample`, `mlr_pipeops_targetinvert`, `mlr_pipeops_targetmutate`, `mlr_pipeops_targettrafoscalerange`, `mlr_pipeops_textvectorizer`, `mlr_pipeops_threshold`, `mlr_pipeops_tunethreshold`, `mlr_pipeops_unbranch`, `mlr_pipeops_updatetarget`, `mlr_pipeops_vtreat`, `mlr_pipeops_yeojohnson`, `mlr_pipeops`

library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpEncode

# Toy task: "x" is the target, "y" is the factor feature to encode
data = data.table::data.table(x = factor(letters[1:3]), y = factor(letters[1:3]))
task = TaskClassif$new("task", data, "x")

poe = po("encode")

# poe is initialized with encoding: "one-hot"
poe$train(list(task))[[1]]$data()
#>    x y.a y.b y.c
#> 1: a   1   0   0
#> 2: b   0   1   0
#> 3: c   0   0   1

# other kinds of encoding:
poe$param_set$values$method = "treatment"
poe$train(list(task))[[1]]$data()
#>    x y.b y.c
#> 1: a   0   0
#> 2: b   1   0
#> 3: c   0   1

poe$param_set$values$method = "helmert"
poe$train(list(task))[[1]]$data()
#>    x y.1 y.2
#> 1: a  -1  -1
#> 2: b   1  -1
#> 3: c   0   2

poe$param_set$values$method = "poly"
poe$train(list(task))[[1]]$data()
#>    x           y.1        y.2
#> 1: a -7.071068e-01  0.4082483
#> 2: b -7.850462e-17 -0.8164966
#> 3: c  7.071068e-01  0.4082483

poe$param_set$values$method = "sum"
poe$train(list(task))[[1]]$data()
#>    x y.1 y.2
#> 1: a   1   0
#> 2: b   0   1
#> 3: c  -1  -1