Abstract base class for piecewise linear encoding.
Piecewise linear encoding works by splitting the values of a feature into distinct bins, using an algorithm implemented in private$.get_bins(), and then creating new feature columns through a continuous alternative to one-hot encoding.
One new feature is constructed per bin, with values being either
0, if the original value was below the lower bin boundary,
1, if the original value was above or equal to the upper bin boundary, or
a scaled value between 0 and 1, if the original value was inside the bin boundaries.
Scaling is done by subtracting the lower bin boundary from the original value and dividing by the bin width.
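The encoding rule described above can be sketched in base R. This is only an illustration under stated assumptions: encode_pl() is a hypothetical helper, not a function of the package, and the bin boundaries are supplied by hand instead of being derived by private$.get_bins(). Clamping the scaled value to the range from 0 to 1 reproduces the three cases.

```r
# Hypothetical helper for illustration; not part of mlr3pipelines.
# x:    numeric vector of original feature values
# bins: increasing numeric vector of bin boundaries (length = number of bins + 1)
encode_pl = function(x, bins) {
  n_bins = length(bins) - 1L
  out = matrix(0, nrow = length(x), ncol = n_bins,
    dimnames = list(NULL, paste0("bin", seq_len(n_bins))))
  for (t in seq_len(n_bins)) {
    lower = bins[[t]]
    upper = bins[[t + 1L]]
    # Offset by the lower boundary, divide by the bin width, then clamp:
    # values below the bin become 0, values at or above the upper boundary become 1.
    out[, t] = pmin(pmax((x - lower) / (upper - lower), 0), 1)
  }
  out
}

x = c(0.5, 2, 3.5, 6)
bins = c(1, 3, 5)  # two bins: [1, 3) and [3, 5)
encoded = encode_pl(x, bins)
encoded
# bin1 is 0, 0.5, 1, 1 and bin2 is 0, 0, 0.25, 1
```

Each row of the result describes how far the original value has progressed through each bin, which is what makes the encoding a continuous counterpart to one-hot encoding.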
PipeOps inheriting from this class encode columns of type numeric and integer. Use the PipeOpTaskPreproc $affect_columns functionality to encode only a subset of columns, or only columns of a certain type, etc.
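As a brief sketch of restricting the encoding to a subset of columns, the following uses the concrete subclass registered as "encodeplquantiles" (listed under "See also") together with selector_name(). The choice of the iris task and of the Petal.Length column is an assumption made purely for illustration.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Encode only Petal.Length; the remaining numeric features are left untouched.
op = po("encodeplquantiles", affect_columns = selector_name("Petal.Length"))
out = op$train(list(task))[[1L]]

# The original Petal.Length column is replaced by its per-bin columns.
out$feature_names
```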
Format
Abstract R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncodePL$new(id = "encodepl", param_set = ps(), param_vals = list(), packages = character(0), task_type = "Task")
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot. Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric and integer columns encoded using piecewise linear encoding.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
bins :: named list
Named list of numeric vectors. Each element corresponds to and is named after one of the affected feature columns and contains the bin boundaries derived through private$.get_bins().
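To make the shape of the bins element concrete, the sketch below trains the concrete subclass "encodeplquantiles" (listed under "See also") and inspects its state. The iris task is an assumption chosen only because all of its features are numeric; the boundary values themselves depend on the binning algorithm and its settings.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
op = po("encodeplquantiles")
op$train(list(task))

# One numeric vector of bin boundaries per affected feature column,
# named after that column.
bins = op$state$bins
str(bins)
```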
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc.
Internals
PipeOpEncodePL is an abstract class inheriting from PipeOpTaskPreprocSimple that simplifies the implementation of different binning algorithms for piecewise linear encoding. Each binning algorithm should be implemented as private$.get_bins().
Fields
Only fields inherited from PipeOp.
Methods
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp, as well as:
.get_bins(task, cols)
(Task, character) -> named list
Abstract method for splitting the value range of a feature column into distinct bins. The argument cols should give the names of the feature columns of the task for which bins should be derived. Returns a named list of numeric vectors containing the bin boundaries for each affected feature column, with each element named after its feature column.
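A minimal sketch of a subclass that implements private$.get_bins() might look as follows. The class name PipeOpEncodePLEqualWidth, its id, and the equal-width binning rule with a fixed number of four bins are all hypothetical and chosen for illustration; none of them is part of the package. The sketch also assumes that PipeOpEncodePL is exported by the installed version of mlr3pipelines.

```r
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical subclass: equal-width bins between the observed minimum and maximum.
PipeOpEncodePLEqualWidth = R6Class("PipeOpEncodePLEqualWidth",
  inherit = PipeOpEncodePL,
  public = list(
    initialize = function(id = "encodeplequalwidth", param_vals = list()) {
      # Pass param_vals on to the parent, as the Construction section asks.
      super$initialize(id = id, param_vals = param_vals)
    }
  ),
  private = list(
    .get_bins = function(task, cols) {
      n_bins = 4L  # fixed for brevity; a real subclass would likely expose this via param_set
      # sapply() over a character vector names the result by column,
      # giving the required named list of numeric vectors.
      sapply(cols, function(col) {
        x = task$data(cols = col)[[1L]]
        seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n_bins + 1L)
      }, simplify = FALSE)
    }
  )
)

task = tsk("iris")
op = PipeOpEncodePLEqualWidth$new()
out = op$train(list(task))[[1L]]

op$state$bins      # bin boundaries per feature column
out$feature_names  # encoded feature columns
```

Only the binning step is written here; storing the boundaries in $state and applying the encoding are left to the inherited methods.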
References
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree