Encodes columns of type factor, character and ordered.
PipeOpEncodeLmer() converts the factor levels of each factor column to the
estimated coefficients of a simple random intercept model.
Models are fitted with the glmer function of the lme4 package and are
of the type target ~ 1 + (1 | factor).
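As a rough illustration of what "coefficients of a random intercept model" means, the sketch below fits such a model directly with lme4 on made-up data and reads off one value per factor level. It is not the PipeOp's internal code: the toy data, the use of lmer() for a numeric target, and the variable names are assumptions made for the example.

```r
library(lme4)

set.seed(1)
# Toy data (assumed for illustration): numeric target, factor with three levels.
d = data.frame(
  g = factor(rep(c("a", "b", "c"), each = 20)),
  y = rnorm(60, mean = rep(c(0, 1, 2), each = 20))
)

# Random intercept model of the form target ~ 1 + (1 | factor).
fit = lmer(y ~ 1 + (1 | g), data = d)

# Per-level coefficient: fixed intercept plus the level's random effect.
cf = coef(fit)$g
per_level = setNames(cf[, "(Intercept)"], rownames(cf))

# Fixed intercept alone, i.e. the value with no level-specific adjustment.
global_intercept = unname(fixef(fit)["(Intercept)"])

print(per_level)
print(global_intercept)
```

Each factor level would then be replaced by its entry in per_level, which is the group mean shrunk towards the overall intercept.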
If the task is a regression task, the numeric target
variable is used as dependent variable and the factor is used for grouping.
If the task is a classification task, the target variable is used as dependent variable
and the factor is used for grouping.
If the target variable is multiclass, for each level of the multiclass target variable,
binary "one vs. rest" models are fitted.
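The "one vs. rest" idea can be sketched as follows: for each target level, a binary indicator is built and a binomial random intercept model is fitted on it. The data, the class probabilities, and the names ovr and is_lvl are hypothetical and only serve the illustration; the PipeOp's own implementation may differ in detail.

```r
library(lme4)

set.seed(1)
n = 150
g = factor(sample(c("a", "b", "c"), n, replace = TRUE))

# Assumed toy setup: class probabilities depend on the factor level.
probs = list(a = c(0.6, 0.2, 0.2), b = c(0.2, 0.6, 0.2), c = c(0.2, 0.2, 0.6))
target = factor(vapply(as.character(g), function(lv) {
  sample(c("x", "y", "z"), 1, prob = probs[[lv]])
}, character(1), USE.NAMES = FALSE))
d = data.frame(g = g, target = target)

# One binary model per target level: "this level" vs. "all other levels".
ovr = lapply(levels(d$target), function(lvl) {
  d$is_lvl = as.integer(d$target == lvl)
  fit = glmer(is_lvl ~ 1 + (1 | g), data = d, family = binomial)
  cf = coef(fit)$g
  setNames(cf[, "(Intercept)"], rownames(cf))
})
names(ovr) = levels(d$target)

# One coefficient vector (on the logit scale) per target level.
str(ovr)
```

With few factor levels, lme4 may report a singular fit for some of these models; the coefficients are still returned.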
For training, multiple models can be estimated in a cross-validation scheme to ensure that the same factor level does not always result in identical values in the converted numerical feature. For prediction, a global model (which was fitted on all observations during training) is used for each factor. New factor levels are converted to the value of the intercept coefficient of the global model for prediction. NAs are ignored by the PipeOp.
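The scheme described above can be sketched by hand. The helpers fit_encoding() and apply_encoding() are hypothetical and not part of mlr3pipelines; the fold count of 5 and the toy data are likewise assumptions. The sketch encodes each training row with a model that did not see that row, and encodes new data with a single model fitted on all rows.

```r
library(lme4)

set.seed(1)
d = data.frame(
  g = factor(rep(c("a", "b", "c"), each = 20)),
  y = rnorm(60, mean = rep(c(0, 1, 2), each = 20))
)

# Hypothetical helper: fit the model, return per-level values and the intercept.
fit_encoding = function(data) {
  fit = lmer(y ~ 1 + (1 | g), data = data)
  cf = coef(fit)$g
  list(levels = setNames(cf[, "(Intercept)"], rownames(cf)),
       new_level = unname(fixef(fit)["(Intercept)"]))
}

# Hypothetical helper: look up each value; unseen levels get the intercept,
# missing values stay missing.
apply_encoding = function(x, enc) {
  out = unname(enc$levels[as.character(x)])
  out[is.na(out) & !is.na(x)] = enc$new_level
  out
}

# Training: each fold is encoded by a model fitted on the remaining folds.
folds = sample(rep(1:5, length.out = nrow(d)))
encoded_train = numeric(nrow(d))
for (k in 1:5) {
  held_out = folds == k
  enc_k = fit_encoding(d[!held_out, ])
  encoded_train[held_out] = apply_encoding(d$g[held_out], enc_k)
}

# Prediction: one global model fitted on all observations.
global_enc = fit_encoding(d)
encoded_new = apply_encoding(factor(c("a", "zzz", NA)), global_enc)
print(encoded_new)
```

Because every fold uses a slightly different model, rows sharing a factor level receive slightly different values during training, while the unseen level "zzz" falls back to the global intercept.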
Use the PipeOpTaskPreproc $affect_columns functionality to only encode a subset of
columns, or only encode columns of a certain type.
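A minimal sketch of restricting the encoding to one column, assuming a toy regression task with two factor features (f1 and f2 are made-up names) and using the selector_name() helper from mlr3pipelines:

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)
# Toy data (assumed for illustration): two factor features, numeric target.
dat = data.frame(
  f1 = factor(rep(c("a", "b", "c"), each = 20)),
  f2 = factor(rep(c("u", "v"), times = 30)),
  y = rnorm(60, mean = rep(c(0, 1, 2), each = 20))
)
task = as_task_regr(dat, target = "y", id = "toy")

# Only f1 is affected; f2 is passed through unchanged.
poe = po("encodelmer", affect_columns = selector_name("f1"))
out = poe$train(list(task))[[1]]

types = setNames(out$feature_types$type, out$feature_types$id)
print(types)
```

selector_type("factor") could be used in the same place to select columns by type rather than by name.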
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "encodelmer".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
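For illustration, a construction call using both arguments might look like the sketch below. The id "lmer_encoder" is an arbitrary example value.

```r
library(mlr3pipelines)

# Construct with a custom id and override the fast_optim hyperparameter.
poe = PipeOpEncodeLmer$new(
  id = "lmer_encoder",
  param_vals = list(fast_optim = FALSE)
)

print(poe$id)
print(poe$param_set$values$fast_optim)
```

The shorthand po("encodelmer", fast_optim = FALSE) sets the same hyperparameter through the po() constructor helper.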
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected factor, character or ordered columns converted to numeric columns.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
target_levels :: character
Levels of the target columns.
control :: a named list
List of coefficients learned via glmer.
Parameters
fast_optim :: logical(1)
Initialized to TRUE. If fast_optim is TRUE (default), a faster (up to 50 percent) optimizer from the nloptr package is used when fitting the lmer models. This uses additional stopping criteria which can give suboptimal results.
Internals
Uses lme4::glmer. This is relatively inefficient for features with a large number of levels.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Examples
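library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpEncodeLmer

poe = po("encodelmer")

# Classification task with target "x" and a single factor feature "y".
task = TaskClassif$new("task",
  data.table::data.table(
    x = factor(c("a", "a", "a", "b", "b")),
    y = factor(c("a", "a", "b", "b", "b"))),
  "x")

# Training replaces the factor feature "y" with its estimated coefficients.
poe$train(list(task))[[1]]$data()
#>         x          y
#>    <fctr>      <num>
#> 1:      a -0.5525584
#> 2:      a -0.5525584
#> 3:      a -0.3310264
#> 4:      b -0.3310264
#> 5:      b -0.3310264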
poe$state
#> $target_levels
#> [1] "a" "b"
#>
#> $control
#> $control$y
#>              a              b ..new..level..
#>     -0.5525584     -0.3310264     -0.4429541
#>
#>
#> $dt_columns
#> [1] "y"
#>
#> $affected_cols
#> [1] "y"
#>
#> $intasklayout
#> Key: <id>
#>        id   type
#>    <char> <char>
#> 1:      y factor
#>
#> $outtasklayout
#> Key: <id>
#>        id    type
#>    <char>  <char>
#> 1:      y numeric
#>
#> $outtaskshell
#> Empty data.table (0 rows and 2 cols): x,y
#>