Generates a more balanced data set by creating synthetic instances of the minority class using the ADASYN algorithm.
The algorithm generates for each minority instance new data points based on its K
nearest neighbors and the difficulty of learning for that data point.
It can only be applied to tasks with numeric features that have no missing values.
See smotefamily::ADAS
for details.
Format
R6Class
object inheriting from PipeOpTaskPreproc
/PipeOp
.
Construction
id
::character(1)
Identifier of resulting object, default"adas"
.param_vals
:: namedlist
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Defaultlist()
.
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc
. Instead of a Task
, a
TaskClassif
is used as input and output during training and prediction.
The output during training is the input Task
with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
State
The $state
is a named list
with the $state
elements inherited from PipeOpTaskPreproc
.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc
, as well as:
K
::numeric(1)
The number of nearest neighbors used for sampling new values. Default is5
. SeeADAS()
.
Fields
Only fields inherited from PipeOp
.
Methods
Only methods inherited from PipeOpTaskPreproc
/PipeOp
.
References
He H, Bai Y, Garcia, A. E, Li S (2008). “ADASYN: Adaptive synthetic sampling approach for imbalanced learning.” In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 1322-1328. doi:10.1109/IJCNN.2008.4633969 .
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp
,
PipeOpEncodePL
,
PipeOpEnsemble
,
PipeOpImpute
,
PipeOpTargetTrafo
,
PipeOpTaskPreproc
,
PipeOpTaskPreprocSimple
,
mlr_pipeops
,
mlr_pipeops_blsmote
,
mlr_pipeops_boxcox
,
mlr_pipeops_branch
,
mlr_pipeops_chunk
,
mlr_pipeops_classbalancing
,
mlr_pipeops_classifavg
,
mlr_pipeops_classweights
,
mlr_pipeops_colapply
,
mlr_pipeops_collapsefactors
,
mlr_pipeops_colroles
,
mlr_pipeops_copy
,
mlr_pipeops_datefeatures
,
mlr_pipeops_decode
,
mlr_pipeops_encode
,
mlr_pipeops_encodeimpact
,
mlr_pipeops_encodelmer
,
mlr_pipeops_encodeplquantiles
,
mlr_pipeops_encodepltree
,
mlr_pipeops_featureunion
,
mlr_pipeops_filter
,
mlr_pipeops_fixfactors
,
mlr_pipeops_histbin
,
mlr_pipeops_ica
,
mlr_pipeops_imputeconstant
,
mlr_pipeops_imputehist
,
mlr_pipeops_imputelearner
,
mlr_pipeops_imputemean
,
mlr_pipeops_imputemedian
,
mlr_pipeops_imputemode
,
mlr_pipeops_imputeoor
,
mlr_pipeops_imputesample
,
mlr_pipeops_kernelpca
,
mlr_pipeops_learner
,
mlr_pipeops_learner_pi_cvplus
,
mlr_pipeops_learner_quantiles
,
mlr_pipeops_missind
,
mlr_pipeops_modelmatrix
,
mlr_pipeops_multiplicityexply
,
mlr_pipeops_multiplicityimply
,
mlr_pipeops_mutate
,
mlr_pipeops_nearmiss
,
mlr_pipeops_nmf
,
mlr_pipeops_nop
,
mlr_pipeops_ovrsplit
,
mlr_pipeops_ovrunite
,
mlr_pipeops_pca
,
mlr_pipeops_proxy
,
mlr_pipeops_quantilebin
,
mlr_pipeops_randomprojection
,
mlr_pipeops_randomresponse
,
mlr_pipeops_regravg
,
mlr_pipeops_removeconstants
,
mlr_pipeops_renamecolumns
,
mlr_pipeops_replicate
,
mlr_pipeops_rowapply
,
mlr_pipeops_scale
,
mlr_pipeops_scalemaxabs
,
mlr_pipeops_scalerange
,
mlr_pipeops_select
,
mlr_pipeops_smote
,
mlr_pipeops_smotenc
,
mlr_pipeops_spatialsign
,
mlr_pipeops_subsample
,
mlr_pipeops_targetinvert
,
mlr_pipeops_targetmutate
,
mlr_pipeops_targettrafoscalerange
,
mlr_pipeops_textvectorizer
,
mlr_pipeops_threshold
,
mlr_pipeops_tomek
,
mlr_pipeops_tunethreshold
,
mlr_pipeops_unbranch
,
mlr_pipeops_updatetarget
,
mlr_pipeops_vtreat
,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
# Create example task
data = data.frame(
target = factor(sample(c("c1", "c2"), size = 300, replace = TRUE, prob = c(0.1, 0.9))),
x1 = rnorm(300),
x2 = rnorm(300)
)
task = TaskClassif$new(id = "example", backend = data, target = "target")
task$head()
#> target x1 x2
#> <fctr> <num> <num>
#> 1: c2 -0.3002703 0.1054926
#> 2: c2 -0.3084548 0.8072830
#> 3: c2 0.5561903 -0.1168897
#> 4: c2 0.6400171 -1.4728225
#> 5: c1 -1.8017986 0.4839497
#> 6: c2 -0.7361059 0.1665491
table(task$data(cols = "target"))
#> target
#> c1 c2
#> 26 274
# Generate synthetic data for minority class
pop = po("adas")
adas_result = pop$train(list(task))[[1]]$data()
nrow(adas_result)
#> [1] 546
table(adas_result$target)
#>
#> c1 c2
#> 272 274