Encodes columns of type factor, character and ordered.
Impact coding for classification Tasks converts the factor levels of each (factorial) column to the difference between each target level's conditional log-odds (logit) given this level and that target level's global log-odds.
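The classification rule can be sketched in base R as follows. This is a minimal sketch, not the mlr3pipelines implementation: the helper names (`logit`, `smoothed_prop`, `impact_classif_cell`) are hypothetical, and the smoothing formula `(count + smoothing) / (n + 2 * smoothing)` is an assumption, chosen because it reproduces the values shown in the Examples section.

```r
# Hypothetical sketch: impact of one feature level on one target level,
# computed as a difference of smoothed log-odds.
logit <- function(p) log(p / (1 - p))

# Assumed smoothing: (count + smoothing) / (n + 2 * smoothing)
smoothed_prop <- function(hits, n, smoothing) (hits + smoothing) / (n + 2 * smoothing)

impact_classif_cell <- function(feature, target, level, target_level, smoothing = 1e-4) {
  in_level <- !is.na(feature) & feature == level
  # Conditional proportion of `target_level` among rows with this feature level
  p_cond <- smoothed_prop(sum(target[in_level] == target_level), sum(in_level), smoothing)
  # Global proportion of `target_level`
  p_glob <- smoothed_prop(sum(target == target_level), length(target), smoothing)
  logit(p_cond) - logit(p_glob)
}

# Data from the Examples section: target "x", feature "y"
x <- factor(c("a", "a", "a", "b", "b"))
y <- factor(c("a", "a", "b", "b", "b"))
round(impact_classif_cell(y, x, "a", "a"), 6)  # 9.498089
round(impact_classif_cell(y, x, "b", "a"), 6)  # -1.098546
```

Both values match the `y.a` column of the Examples output, which supports (but does not prove) the assumed smoothing formula.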
Impact coding for regression Tasks converts the factor levels of each (factorial) column to the difference between the target's conditional mean given this level and the target's global mean.
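A corresponding sketch for regression follows. The function name is hypothetical, and the way `smoothing` enters (adding a few pseudo-observations at the global mean, which shrinks each conditional mean slightly towards it) is an assumption of this sketch; the text above only specifies conditional mean minus global mean.

```r
# Hypothetical sketch: per-level impact on a numeric target,
# computed as (smoothed) conditional mean minus global mean.
impact_regr <- function(feature, target, smoothing = 1e-4) {
  global_mean <- mean(target)
  vapply(levels(feature), function(lvl) {
    in_level <- !is.na(feature) & feature == lvl
    # Assumption: `smoothing` pseudo-observations placed at the global mean
    cond_mean <- (sum(target[in_level]) + smoothing * global_mean) /
      (sum(in_level) + smoothing)
    cond_mean - global_mean
  }, numeric(1))
}

feature <- factor(c("a", "a", "b", "b", "b"))
target_vals <- c(1, 3, 4, 5, 6)  # global mean 3.8
round(impact_regr(feature, target_vals), 3)
# level "a": about 2 - 3.8 = -1.8; level "b": about 5 - 3.8 = 1.2
```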
Treats new levels during prediction like missing values.
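How new levels end up being handled like missing values can be illustrated with a hypothetical lookup helper (not an mlr3pipelines function): any value that is not a training level, including `NA`, receives the same fallback impact, which is `NA` by default or `0` to mimic `impute_zero = TRUE`.

```r
# Hypothetical helper: map feature values to trained impact values.
# Values that are NA or were not seen during training get `missing_impact`.
lookup_impact <- function(values, impact, missing_impact = NA_real_) {
  idx <- match(as.character(values), names(impact))
  out <- unname(impact[idx])
  out[is.na(idx)] <- missing_impact
  out
}

train_impact <- c(a = -1.8, b = 1.2)        # impacts learned for levels "a" and "b"
new_values <- c("a", "b", "c", NA)          # "c" is a new level at prediction time
lookup_impact(new_values, train_impact)     # -1.8  1.2   NA   NA
lookup_impact(new_values, train_impact, 0)  # -1.8  1.2  0.0  0.0
```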
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "encodeimpact".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
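Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected factor, character, or ordered features encoded. For classification Tasks, each affected feature is replaced by one numeric column per target level, named <feature>.<target level> (e.g. y.a and y.b in the Examples).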
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
impact :: named list
A list with an element for each affected feature:
For regression, each element is a single-column matrix of impact values for each level of that feature.
For classification, each element is a matrix with one row per feature level and one column per target level, giving the impact of each feature level on each outcome level.
In the Examples, the matrix has an additional row .TEMP.MISSING holding the impact used for missing values.
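The classification state element printed in the Examples can be rebuilt by hand to show its layout and how it maps to the encoded output columns. This is an illustration only: the values are copied from the Examples output, and `encode_feature` is a hypothetical helper, not part of mlr3pipelines.

```r
# Classification impact matrix for feature "y", values copied from the Examples:
# rows = feature levels (plus .TEMP.MISSING), columns = target levels.
impact_y <- matrix(
  c(9.498089, -1.098546, NA, -9.498089, 1.098546, NA),
  nrow = 3,
  dimnames = list(c("a", "b", ".TEMP.MISSING"), c("a", "b"))
)

impact_y["b", ]  # impact of feature level "b" on each target level

# Hypothetical helper: one output column per target level, named "<feature>.<level>".
# NA or unseen values are routed to the .TEMP.MISSING row.
encode_feature <- function(name, values, impact) {
  values <- as.character(values)
  values[is.na(values) | !(values %in% rownames(impact))] <- ".TEMP.MISSING"
  rows <- impact[match(values, rownames(impact)), , drop = FALSE]
  dimnames(rows) <- list(NULL, paste(name, colnames(impact), sep = "."))
  rows
}

encode_feature("y", c("a", "a", "b", "b", "b"), impact_y)
```

The result has the same columns (y.a, y.b) and values as the encoded task data in the Examples.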
Parameters
smoothing :: numeric(1)
A finite positive value used for smoothing. Mostly relevant for classification Tasks, where a feature level that never co-occurs with some target level would otherwise give an infinite logit value. Initialized to 1e-4.
impute_zero :: logical(1)
If TRUE, impute missing values as impact 0; otherwise the respective impact is coded as NA. Default FALSE.
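The effect of `smoothing` can be seen on the Examples data, where feature level "a" only ever co-occurs with target level "a". The sketch assumes the smoothed proportion `(count + smoothing) / (n + 2 * smoothing)`, which is consistent with the Examples output but not spelled out in the text; the helper names are hypothetical.

```r
logit <- function(p) log(p / (1 - p))
# Assumed smoothed proportion of `hits` out of `n` rows
smoothed_prop <- function(hits, n, smoothing) (hits + smoothing) / (n + 2 * smoothing)

# Feature level "a": 2 rows, both with target level "a"
logit(smoothed_prop(2, 2, 0))     # Inf: without smoothing the log-odds are infinite
logit(smoothed_prop(2, 2, 1e-4))  # about 9.90 with the default smoothing = 1e-4
logit(smoothed_prop(2, 2, 1))     # log(3), about 1.10: larger values shrink harder
```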
Internals
Uses Laplace smoothing, mostly to avoid infinite values for classification Tasks.
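Written out, one formulation of the smoothed classification impact that is consistent with the Examples output is given below. Here n is the number of training rows, n_k the number with target level k, n_l the number with feature level l, n_{l,k} the number with both, and s the smoothing value. This is a reconstruction from the output, not a quotation of the implementation.

```latex
\begin{aligned}
\hat p_k &= \frac{n_k + s}{n + 2s}, \qquad
\hat p_{k \mid l} = \frac{n_{l,k} + s}{n_l + 2s}, \qquad
\operatorname{logit}(p) = \log\frac{p}{1 - p},\\
\operatorname{impact}(l, k) &= \operatorname{logit}\bigl(\hat p_{k \mid l}\bigr) - \operatorname{logit}\bigl(\hat p_k\bigr),\\
\operatorname{impact}(a, a) &= \log\frac{2.0001}{0.0001} - \log\frac{3.0001}{2.0001}
  \approx 9.9035376 - 0.4054484 \approx 9.498089
\quad (n = 5,\ n_k = 3,\ n_l = n_{l,k} = 2,\ s = 10^{-4}).
\end{aligned}
```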
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("mlr3pipelines")

poe = po("encodeimpact")

# Classification task with target "x" and a single factor feature "y"
task = TaskClassif$new("task",
  data.table::data.table(
    x = factor(c("a", "a", "a", "b", "b")),
    y = factor(c("a", "a", "b", "b", "b"))),
  "x")

# Feature "y" is replaced by one impact column per target level
poe$train(list(task))[[1]]$data()
#>         x       y.a       y.b
#>    <fctr>     <num>     <num>
#> 1:      a  9.498089 -9.498089
#> 2:      a  9.498089 -9.498089
#> 3:      a -1.098546  1.098546
#> 4:      b -1.098546  1.098546
#> 5:      b -1.098546  1.098546
poe$state
#> $impact
#> $impact$y
#>                       a         b
#> a              9.498089 -9.498089
#> b             -1.098546  1.098546
#> .TEMP.MISSING        NA        NA
#>
#>
#> $dt_columns
#> [1] "y"
#>
#> $affected_cols
#> [1] "y"
#>
#> $intasklayout
#> Key: <id>
#>        id   type
#>    <char> <char>
#> 1:      y factor
#>
#> $outtasklayout
#> Key: <id>
#>        id    type
#>    <char>  <char>
#> 1:    y.a numeric
#> 2:    y.b numeric
#>
#> $outtaskshell
#> Empty data.table (0 rows and 3 cols): x,y.a,y.b
#>