Adds new data points by generating synthetic instances for the minority class using the Borderline-SMOTE algorithm. This can only be applied to classification tasks with numeric features that have no missing values. See smotefamily::BLSMOTE for details.

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpBLSmote$new(id = "blsmote", param_vals = list())

  • id :: character(1)
    Identifier of resulting object, default "blsmote".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
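As a minimal sketch of the two construction arguments, the snippet below builds the PipeOp once through the constructor and once through the po() shortcut. The id "blsmote_k3" and the value K = 3 are illustrative choices, not recommendations.

```r
library("mlr3pipelines")

# Direct construction with a custom id and one hyperparameter override
pop_a = PipeOpBLSmote$new(id = "blsmote_k3", param_vals = list(K = 3))

# The po() shortcut should give an equivalent object; named arguments that
# are not constructor arguments are treated as hyperparameter values
pop_b = po("blsmote", id = "blsmote_k3", K = 3)

pop_a$id
pop_a$param_set$values$K
pop_b$param_set$values$K
```
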

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output during training is the input Task with added synthetic rows for the minority class. The output during prediction is the unchanged input.
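The difference between the two phases can be checked with a short sketch like the following. It assumes the smotefamily package is installed and reuses its sample_generator() helper to build a toy task; the seed and the task id are arbitrary.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)  # arbitrary seed, only to make the toy data reproducible

# Toy imbalanced two-class data with two numeric features
data = smotefamily::sample_generator(500, 0.8)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")

pop = po("blsmote")

# Training returns a new task with synthetic minority rows appended
train_out = pop$train(list(task))[[1]]

# Prediction passes the task through without adding rows
predict_out = pop$predict(list(task))[[1]]

train_out$nrow > task$nrow      # expected to be TRUE
predict_out$nrow == task$nrow   # expected to be TRUE
```
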

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

  • K :: numeric(1)
    The number of nearest neighbors used for sampling from the minority class. Default is 5. See BLSMOTE().

  • C :: numeric(1)
    The number of nearest neighbors used for classifying sample points as SAFE/DANGER/NOISE. Default is 5. See BLSMOTE().

  • dup_size :: numeric
    Desired number of synthetic minority instances, expressed as a multiple of the original number of majority instances. A value of 0 oversamples until the minority and majority classes are balanced. Default is 0. See BLSMOTE().

  • method :: character(1)
    The type of Borderline-SMOTE algorithm to use. Default is "type1". See BLSMOTE().

  • quiet :: logical(1)
    Whether to suppress printing status during training. Initialized to TRUE.
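Hyperparameters can also be changed after construction through the PipeOp's parameter set. The sketch below sets K and method on an existing object; the values are illustrative only, and "type2" is assumed to be the alternative variant accepted by BLSMOTE().

```r
library("mlr3pipelines")

pop = po("blsmote")

# Override individual hyperparameters; unset ones fall back to the defaults
# of the underlying BLSMOTE() call
pop$param_set$values$K = 3
pop$param_set$values$method = "type2"

# quiet is initialized to TRUE, so it is already present in the values list
pop$param_set$values
```
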

Fields

Only fields inherited from PipeOpTaskPreproc/PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

References

Han, Hui, Wang, Wen-Yuan, Mao, Bing-Huan (2005). “Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning.” In Huang, De-Shuang, Zhang, Xiao-Ping, Huang, Guang-Bin (eds.), Advances in Intelligent Computing, 878–887. ISBN 978-3-540-31902-3, doi:10.1007/11538059_91.

See also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Examples

library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpBLSmote

# Create example task
data = smotefamily::sample_generator(500, 0.8)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")
task$head()
#>    result        X1        X2
#>    <fctr>     <num>     <num>
#> 1:      n 0.9844391 0.1597977
#> 2:      n 0.2391228 0.3475170
#> 3:      n 0.5945652 0.2016805
#> 4:      p 0.6814483 0.9776465
#> 5:      p 0.5109275 0.4190668
#> 6:      n 0.3266779 0.6078258
table(task$data(cols = "result"))
#> result
#>   n   p 
#> 391 109 

# Generate synthetic data for minority class
pop = po("blsmote")
bls_result = pop$train(list(task))[[1]]$data()
nrow(bls_result)
#> [1] 773
table(bls_result$result)
#> 
#>   n   p 
#> 391 382