Piecewise Linear Encoding using Quantiles
Source: R/PipeOpEncodePL.R
mlr_pipeops_encodeplquantiles.Rd
Encodes numeric and integer feature columns using piecewise linear encoding. For details, see the documentation of PipeOpEncodePL or the paper referenced below.
Bins are constructed by taking the quantiles of the respective feature column as bin boundaries. The first and last boundaries are set to the minimum and maximum value of the feature, respectively. The number of bins can be controlled with the numsplits hyperparameter.
Affected feature columns may contain NAs. These are ignored when calculating quantiles.
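The bin construction described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper name quantile_bins is hypothetical, and the evenly spaced probabilities are an assumption inferred from the description and from the boundaries shown in the Examples section.

```r
# Hypothetical helper, not part of mlr3pipelines: derive bin boundaries
# from evenly spaced quantiles of a numeric vector.
quantile_bins = function(x, numsplits = 2L, type = 7L) {
  # numsplits bins need numsplits + 1 boundaries; probabilities 0 and 1
  # yield the minimum and maximum of the feature (assumed spacing)
  probs = seq(0, 1, length.out = numsplits + 1L)
  # na.rm = TRUE drops missing values before the quantiles are computed
  unname(stats::quantile(x, probs = probs, type = type, na.rm = TRUE))
}

# Append one missing value to show that NAs do not affect the boundaries
x = c(iris$Petal.Length, NA)
bins = quantile_bins(x, numsplits = 2L)
print(bins)
```

With numsplits = 2, the middle boundary is the median, so this should give the same Petal.Length boundaries as the first call in the Examples section (1.00, 4.35, 6.90).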
Format
R6Class object inheriting from PipeOpEncodePL/PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "encodeplquantiles".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric and integer columns encoded using piecewise linear encoding, with bins derived from the quantiles of the respective original feature column.
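To make the encoding step concrete, here is a minimal base R sketch of piecewise linear encoding for given bin boundaries. The helper name encode_pl is hypothetical and not part of mlr3pipelines. The sketch assumes strictly increasing boundaries, and clamping values outside the boundary range to [0, 1] is an assumption of this sketch that may differ from what the package does at prediction time.

```r
# Hypothetical helper, not part of mlr3pipelines: piecewise linear encoding
# of a numeric vector x given sorted, strictly increasing bin boundaries.
encode_pl = function(x, bins) {
  n_bins = length(bins) - 1L
  out = matrix(0, nrow = length(x), ncol = n_bins,
    dimnames = list(NULL, paste0("bin", seq_len(n_bins))))
  for (t in seq_len(n_bins)) {
    # Fraction of bin t that lies below x: 0 if x is below the bin,
    # 1 if x is above it, and a linear ramp inside the bin
    frac = (x - bins[t]) / (bins[t + 1L] - bins[t])
    # Clamping to [0, 1] is an assumption of this sketch
    out[, t] = pmin(pmax(frac, 0), 1)
  }
  out
}

enc = encode_pl(c(1.4, 1.7, 5.0), bins = c(1, 4.35, 6.9))
print(enc)
```

For the value 1.4 and the boundaries 1, 4.35, 6.9, the first encoded feature is (1.4 - 1) / (4.35 - 1), roughly 0.1194, and the second is 0, which agrees with the first row of the Examples output for Petal.Length.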
State
The $state is a named list with the $state elements inherited from PipeOpEncodePL/PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
numsplits :: integer(1)
Number of bins to create. Initialized to 2.
type :: integer(1)
Method used to calculate sample quantiles. See help of stats::quantile. Default is 7.
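The effect of the type parameter can be seen by calling stats::quantile directly. The sketch below compares the default type 7 with type 3 on one iris feature. The evenly spaced probabilities are an assumption about how boundaries for numsplits = 4 are placed, not something taken from the package source.

```r
x = iris$Petal.Length
# Five boundaries for four bins (assumed spacing of the probabilities)
probs = seq(0, 1, length.out = 5)

# type 7 (default): linear interpolation between order statistics
q7 = stats::quantile(x, probs = probs, type = 7)
# type 3: nearest even order statistic, always an observed value
q3 = stats::quantile(x, probs = probs, type = 3)

print(rbind(type7 = q7, type3 = q3))
```

Type 7 may return values that lie between two observations, such as the median of an even-sized sample, whereas type 3 only returns values that occur in the data. The minimum and maximum are the same under both methods.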
Internals
This overloads the private$.get_bins() method of PipeOpEncodePL and uses the stats::quantile function to derive the bins used for piecewise linear encoding.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpEncodePL/PipeOpTaskPreproc/PipeOp.
References
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodepl, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
mlr_pipeops_encodepl, mlr_pipeops_encodepltree
Examples
library(mlr3)
library(mlr3pipelines)  # provides po() and the PipeOp classes
task = tsk("iris")$select(c("Petal.Width", "Petal.Length"))
pop = po("encodeplquantiles")
train_out = pop$train(list(task))[[1L]]
# Calculated bin boundaries per feature
pop$state$bins
#> $Petal.Length
#> [1] 1.00 4.35 6.90
#>
#> $Petal.Width
#> [1] 0.1 1.3 2.5
#>
# Each feature was split into two encoded features using piecewise linear encoding
train_out$head()
#> Species Petal.Length.bin1 Petal.Length.bin2 Petal.Width.bin1
#> <fctr> <num> <num> <num>
#> 1: setosa 0.11940299 0 0.08333333
#> 2: setosa 0.11940299 0 0.08333333
#> 3: setosa 0.08955224 0 0.08333333
#> 4: setosa 0.14925373 0 0.08333333
#> 5: setosa 0.11940299 0 0.08333333
#> 6: setosa 0.20895522 0 0.25000000
#> Petal.Width.bin2
#> <num>
#> 1: 0
#> 2: 0
#> 3: 0
#> 4: 0
#> 5: 0
#> 6: 0
# Prediction works the same as training, using the bins learned during training
predict_out = pop$predict(list(task))[[1L]]
predict_out$head()
#> Species Petal.Length.bin1 Petal.Length.bin2 Petal.Width.bin1
#> <fctr> <num> <num> <num>
#> 1: setosa 0.11940299 0 0.08333333
#> 2: setosa 0.11940299 0 0.08333333
#> 3: setosa 0.08955224 0 0.08333333
#> 4: setosa 0.14925373 0 0.08333333
#> 5: setosa 0.11940299 0 0.08333333
#> 6: setosa 0.20895522 0 0.25000000
#> Petal.Width.bin2
#> <num>
#> 1: 0
#> 2: 0
#> 3: 0
#> 4: 0
#> 5: 0
#> 6: 0
# Binning into four bins per feature
# Using the nearest even order statistic for calculating quantiles
pop$param_set$set_values(numsplits = 4, type = 3)
train_out = pop$train(list(task))[[1L]]
# Calculated bin boundaries per feature
pop$state$bins
#> $Petal.Length
#> [1] 1.0 1.6 4.3 5.1 6.9
#>
#> $Petal.Width
#> [1] 0.1 0.3 1.3 1.8 2.5
#>
# Each feature was split into four encoded features using
# piecewise linear encoding
train_out$head()
#> Species Petal.Length.bin1 Petal.Length.bin2 Petal.Length.bin3
#> <fctr> <num> <num> <num>
#> 1: setosa 0.6666667 0.00000000 0
#> 2: setosa 0.6666667 0.00000000 0
#> 3: setosa 0.5000000 0.00000000 0
#> 4: setosa 0.8333333 0.00000000 0
#> 5: setosa 0.6666667 0.00000000 0
#> 6: setosa 1.0000000 0.03703704 0
#> Petal.Length.bin4 Petal.Width.bin1 Petal.Width.bin2 Petal.Width.bin3
#> <num> <num> <num> <num>
#> 1: 0 0.5 0.0 0
#> 2: 0 0.5 0.0 0
#> 3: 0 0.5 0.0 0
#> 4: 0 0.5 0.0 0
#> 5: 0 0.5 0.0 0
#> 6: 0 1.0 0.1 0
#> Petal.Width.bin4
#> <num>
#> 1: 0
#> 2: 0
#> 3: 0
#> 4: 0
#> 5: 0
#> 6: 0