Generates a more balanced data set by creating synthetic instances of the minority class for nominal and continuous data using the SMOTENC algorithm.
The algorithm generates for each minority instance a new data point based on the k
nearest
neighbors of that data point.
It treats integer features as numeric. To not change feature types, the numeric, synthetic data
generated for these features are rounded back to integer.
Because of this, data generated through usage of this PipeOp
is not exactly equal to data generated by
calling themis::smotenc
directly on the same data set.
It can only be applied to classification tasks with factor (or ordered) features and at least one numeric (or integer) feature that have no missing values.
See themis::smotenc
for details.
Format
R6Class
object inheriting from PipeOpTaskPreproc
/PipeOp
.
Construction
id
::character(1)
Identifier of resulting object, default"smotenc"
.param_vals
:: namedlist
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Defaultlist()
.
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc
.
The output during training is the input Task
with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
State
The $state
is a named list
with the $state
elements inherited from PipeOpTaskPreproc
.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc
, as well as:
k
::integer(1)
Number of nearest neighbors used for generating new values from the minority class. Default is5
.over_ratio
::numeric(1)
Ratio of the majority to minority class. Default is1
. For details, seethemis::smotenc
.
Fields
Only fields inherited from PipeOpTaskPreproc
/PipeOp
.
Methods
Only methods inherited from PipeOpTaskPreproc
/PipeOp
.
References
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002). “SMOTE: Synthetic Minority Over-sampling Technique.” Journal of Artificial Intelligence Research, 16, 321–357. doi:10.1613/jair.953 .
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp
,
PipeOpEnsemble
,
PipeOpImpute
,
PipeOpTargetTrafo
,
PipeOpTaskPreproc
,
PipeOpTaskPreprocSimple
,
mlr_pipeops
,
mlr_pipeops_adas
,
mlr_pipeops_blsmote
,
mlr_pipeops_boxcox
,
mlr_pipeops_branch
,
mlr_pipeops_chunk
,
mlr_pipeops_classbalancing
,
mlr_pipeops_classifavg
,
mlr_pipeops_classweights
,
mlr_pipeops_colapply
,
mlr_pipeops_collapsefactors
,
mlr_pipeops_colroles
,
mlr_pipeops_copy
,
mlr_pipeops_datefeatures
,
mlr_pipeops_encode
,
mlr_pipeops_encodeimpact
,
mlr_pipeops_encodelmer
,
mlr_pipeops_featureunion
,
mlr_pipeops_filter
,
mlr_pipeops_fixfactors
,
mlr_pipeops_histbin
,
mlr_pipeops_ica
,
mlr_pipeops_imputeconstant
,
mlr_pipeops_imputehist
,
mlr_pipeops_imputelearner
,
mlr_pipeops_imputemean
,
mlr_pipeops_imputemedian
,
mlr_pipeops_imputemode
,
mlr_pipeops_imputeoor
,
mlr_pipeops_imputesample
,
mlr_pipeops_kernelpca
,
mlr_pipeops_learner
,
mlr_pipeops_learner_pi_cvplus
,
mlr_pipeops_learner_quantiles
,
mlr_pipeops_missind
,
mlr_pipeops_modelmatrix
,
mlr_pipeops_multiplicityexply
,
mlr_pipeops_multiplicityimply
,
mlr_pipeops_mutate
,
mlr_pipeops_nearmiss
,
mlr_pipeops_nmf
,
mlr_pipeops_nop
,
mlr_pipeops_ovrsplit
,
mlr_pipeops_ovrunite
,
mlr_pipeops_pca
,
mlr_pipeops_proxy
,
mlr_pipeops_quantilebin
,
mlr_pipeops_randomprojection
,
mlr_pipeops_randomresponse
,
mlr_pipeops_regravg
,
mlr_pipeops_removeconstants
,
mlr_pipeops_renamecolumns
,
mlr_pipeops_replicate
,
mlr_pipeops_rowapply
,
mlr_pipeops_scale
,
mlr_pipeops_scalemaxabs
,
mlr_pipeops_scalerange
,
mlr_pipeops_select
,
mlr_pipeops_smote
,
mlr_pipeops_spatialsign
,
mlr_pipeops_subsample
,
mlr_pipeops_targetinvert
,
mlr_pipeops_targetmutate
,
mlr_pipeops_targettrafoscalerange
,
mlr_pipeops_textvectorizer
,
mlr_pipeops_threshold
,
mlr_pipeops_tomek
,
mlr_pipeops_tunethreshold
,
mlr_pipeops_unbranch
,
mlr_pipeops_updatetarget
,
mlr_pipeops_vtreat
,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
data = data.frame(
target = factor(sample(c("c1", "c2"), size = 200, replace = TRUE, prob = c(0.1, 0.9))),
feature = rnorm(200)
)
task = TaskClassif$new(id = "example", backend = data, target = "target")
task$head()
#> target feature
#> <fctr> <num>
#> 1: c2 -1.0084861
#> 2: c2 -0.2084793
#> 3: c2 -0.2704835
#> 4: c2 -0.7940955
#> 5: c1 -0.9476550
#> 6: c2 -1.5796010
table(task$data(cols = "target"))
#> target
#> c1 c2
#> 28 172
# Generate synthetic data for minority class
pop = po("smotenc")
smotenc_result = pop$train(list(task))[[1]]$data()
nrow(smotenc_result)
#> [1] 344
table(smotenc_result$target)
#>
#> c1 c2
#> 172 172