Collapses levels of features of type factor and ordered: the rarest levels in the training samples are collapsed until target_level_count levels remain. Levels with a prevalence above no_collapse_above_prevalence are retained, however. For factor features, a rare level is collapsed into the next larger level; for ordered features, a rare level is collapsed into the neighbouring level, whichever has fewer samples.

Levels not seen during training are not touched during prediction; it is therefore useful to combine this with PipeOpFixFactors.
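The collapsing rule for unordered factors can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the helper name collapse_rare_levels is hypothetical, prevalence is computed from the merged counts, and ordered features (which merge into a neighbouring level) are not handled.

```r
# Hypothetical helper (not part of any package): merge the rarest level of an
# unordered factor into the next larger level until `target_level_count`
# levels remain, or until the rarest level exceeds the prevalence threshold.
collapse_rare_levels <- function(x, target_level_count = 2,
                                 no_collapse_above_prevalence = 1) {
  # Start from an identity map: every level maps to itself.
  cmap <- setNames(as.list(levels(x)), levels(x))
  tab <- table(x)
  counts <- setNames(as.numeric(tab), names(tab))
  n <- length(x)
  while (length(counts) > target_level_count) {
    counts <- sort(counts)                 # rarest level first
    if (counts[[1]] / n > no_collapse_above_prevalence) break
    rare <- names(counts)[1]
    into <- names(counts)[2]               # the next larger level
    cmap[[into]] <- c(cmap[[into]], cmap[[rare]])
    cmap[[rare]] <- NULL
    counts[[2]] <- counts[[2]] + counts[[1]]
    counts <- counts[-1]
  }
  levels(x) <- cmap                        # list form merges the levels
  list(x = x, collapse_map = cmap)
}

x <- factor(rep(c("a", "b", "c", "d"), times = c(6, 4, 2, 1)))

# Collapse down to two levels: "d" merges into "c", then "c" into "b".
res <- collapse_rare_levels(x, target_level_count = 2)
print(res$collapse_map)
print(table(res$x))

# With a threshold of 0.1, only "d" (1 of 13 samples) is rare enough to merge.
res_thr <- collapse_rare_levels(x, no_collapse_above_prevalence = 0.1)
print(levels(res_thr$x))
```

In the second call the loop stops early because, after merging "d" into "c", every remaining level has a prevalence above the threshold.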
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "collapsefactors".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the rare levels of affected factor and ordered features collapsed.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
collapse_map :: named list of named list of character
List of factor level maps. For each factor, collapse_map contains a named list that indicates which levels of the input task get mapped to which levels of the output task. If collapse_map has an entry feat_1 with an entry a = c("x", "y"), it means that levels "x" and "y" get collapsed to level "a" in feature "feat_1".
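As an illustration of the structure described above, the following base R sketch builds a collapse_map by hand and applies it to new data. The feature name feat_1 and the levels come from the example in the text; the identity entry for "z" and the data frame newdata are assumptions made for this sketch, so that every level of the feature is listed in the map.

```r
# Hand-written map in the shape described above: feature -> new level -> old levels.
# The entry z = "z" is an assumption so that level "z" is listed explicitly.
collapse_map <- list(feat_1 = list(a = c("x", "y"), z = "z"))

# Hypothetical data containing the feature to be collapsed.
newdata <- data.frame(feat_1 = factor(c("x", "y", "z", "x")))

# Apply each per-feature map through the list form of `levels<-`.
for (feat in names(collapse_map)) {
  levels(newdata[[feat]]) <- collapse_map[[feat]]
}

print(newdata$feat_1)
```

After the loop, "x" and "y" are both represented by level "a", while "z" is unchanged.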
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
no_collapse_above_prevalence :: numeric(1)
Fraction of samples below which factor levels get collapsed. Default is 1, which causes all levels to be collapsed until target_level_count levels remain.
target_level_count :: integer(1)
Number of levels to retain. Default is 2.
Internals
Makes use of the fact that levels(fact_var) = list(target1 = c("source1", "source2"), target2 = "source3") renames levels "source1" and "source2" both to "target1", and level "source3" to "target2".
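The list form of the levels replacement function in base R can be tried directly. A minimal example, where the variable name fact_var and the level names are placeholders taken from the description above, with three distinct source levels:

```r
# A factor with three source levels.
fact_var <- factor(c("source1", "source2", "source3", "source1"))

# Names of the list are the new levels; values are the old levels merged into them.
levels(fact_var) <- list(target1 = c("source1", "source2"), target2 = "source3")

print(levels(fact_var))
print(as.character(fact_var))
```

Both "source1" and "source2" end up as "target1", so the factor has two levels after the assignment.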
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson