Abstract base class for piecewise linear encoding.
Piecewise linear encoding works by splitting values of features into distinct bins, through an algorithm implemented
in private$.get_bins(), and then creating new feature columns through a continuous alternative to one-hot encoding.
Here, one new feature per bin is constructed, with values being either
0, if the original value was below the lower bin boundary,
1, if the original value was above or equal to the upper bin boundary, or
a scaled value between 0 and 1, if the original value was inside the bin boundaries.
Scaling is done by offsetting the original value by the lower bin boundary and dividing by the bin width.
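The encoding rule described above can be illustrated with a minimal base R sketch. The helper encode_pl() and the column names bin1, bin2 are hypothetical and not part of the package; the sketch only assumes a sorted vector of bin boundaries and applies the three cases literally (0 below the bin, 1 at or above the upper boundary, scaled in between).

```r
# Hypothetical helper, not part of mlr3pipelines: encodes one numeric vector
# given a sorted vector of bin boundaries (k + 1 boundaries define k bins).
encode_pl = function(x, boundaries) {
  lower = head(boundaries, -1)
  upper = tail(boundaries, -1)
  out = vapply(seq_along(lower), function(i) {
    # Offset by the lower boundary, divide by the bin width, then clamp to [0, 1].
    pmin(pmax((x - lower[i]) / (upper[i] - lower[i]), 0), 1)
  }, numeric(length(x)))
  out = matrix(out, nrow = length(x))  # keep a matrix even for a single value
  colnames(out) = paste0("bin", seq_along(lower))
  out
}

x = c(-1, 0, 2.5, 5, 7.5, 10, 12)
encoded = encode_pl(x, boundaries = c(0, 5, 10))
print(cbind(x, encoded))
```

With boundaries 0, 5 and 10, the value 7.5 lies above the first bin and halfway into the second, so it is encoded as 1 and 0.5. Values outside the outermost boundaries are clamped to 0 or 1 in this sketch.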
PipeOps inheriting from this encode columns of type numeric and integer. Use the PipeOpTaskPreproc
$affect_columns functionality to only encode a subset of columns, or only encode columns of a certain type, etc.
Format
Abstract R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpEncodePL$new(id = "encodepl", param_set = ps(), param_vals = list(), packages = character(0), task_type = "Task")
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot. Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric and integer columns encoded using piecewise linear encoding.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
bins :: named list
Named list of numeric vectors. Each element corresponds to and is named after one of the affected feature columns and contains the bin boundaries derived through private$.get_bins().
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc.
Internals
PipeOpEncodePL is an abstract class inheriting from PipeOpTaskPreprocSimple that allows easier implementation
of different binning algorithms for piecewise linear encoding. The respective binning algorithm should be implemented
as private$.get_bins().
Fields
Only fields inherited from PipeOp.
Methods
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp as well as
.get_bins(task, cols)
(Task, character) -> named list
Abstract method for splitting the value range of a feature column into distinct bins. The argument cols should give the names of the feature columns of the task for which bins should be derived. Returns a named list of numeric vectors containing the bin boundaries for each affected feature column, named by that corresponding feature column.
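To make the return contract of .get_bins() concrete, the following base R sketch derives quantile-based boundaries and returns them as a named list of numeric vectors. It is only an illustration under stated assumptions: get_bins_quantiles() and its numsplits argument are hypothetical, and the function takes a plain data.frame rather than a Task, so it is not the method a subclass would actually implement.

```r
# Hypothetical stand-in for private$.get_bins(): works on a data.frame
# instead of a Task and uses empirical quantiles as bin boundaries.
get_bins_quantiles = function(data, cols, numsplits = 2) {
  probs = seq(0, 1, length.out = numsplits + 1)
  bins = lapply(cols, function(col) {
    # unique() drops duplicated boundaries, which would give zero-width bins.
    unique(unname(stats::quantile(data[[col]], probs = probs, na.rm = TRUE)))
  })
  names(bins) = cols  # one element per feature column, named after it
  bins
}

bins = get_bins_quantiles(iris, cols = c("Sepal.Length", "Petal.Width"))
str(bins)
```

The result has the same shape as the bins element of $state described above: one numeric vector of boundaries per affected column, named after that column.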
References
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree
