Generates a more balanced data set by creating
synthetic instances of the minority class using the SMOTE algorithm.
For each minority instance, the algorithm samples a new data point based on the
K nearest neighbors of that instance.
It can only be applied to tasks with purely numeric features.
See smotefamily::SMOTE for details.
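The neighbor-based sampling described above can be sketched in base R. This is only an illustration of the interpolation idea from Chawla et al. (2002), not the code used by smotefamily::SMOTE; the helper `smote_sketch` and its argument `n_new` are hypothetical names introduced for this example.

```r
# Illustrative sketch only: `smote_sketch` is a hypothetical helper,
# not part of smotefamily or mlr3pipelines.
smote_sketch = function(X, K = 5, n_new = 1) {
  X = as.matrix(X)
  stopifnot(is.numeric(X), nrow(X) > K)
  # Pairwise Euclidean distances between all minority instances
  d = as.matrix(dist(X))
  synthetic = vector("list", nrow(X))
  for (i in seq_len(nrow(X))) {
    # Indices of the K nearest neighbors of instance i, excluding i itself
    nn = order(d[i, ])
    nn = nn[nn != i][seq_len(K)]
    # Pick one of the K neighbors at random for each new point
    picked = nn[sample.int(K, n_new, replace = TRUE)]
    # Interpolate: new = x_i + g * (x_neighbor - x_i), with g ~ U(0, 1)
    g = runif(n_new)
    base = matrix(X[i, ], nrow = n_new, ncol = ncol(X), byrow = TRUE)
    synthetic[[i]] = base + g * (X[picked, , drop = FALSE] - base)
  }
  do.call(rbind, synthetic)
}

set.seed(1)
minority = cbind(x1 = runif(20), x2 = runif(20))
# 20 minority instances x 2 new points each = 40 synthetic rows
new_points = smote_sketch(minority, K = 5, n_new = 2)
dim(new_points)
```

Because every synthetic point lies on the line segment between a minority instance and one of its neighbors, the features must be numeric, which is why the operator is restricted to purely numeric tasks.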
PipeOpSmote$new(id = "smote", param_vals = list())
id :: character(1)
Identifier of resulting object, default "smote".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
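The difference between the training and prediction outputs can be checked with a short sketch. It assumes the mlr3, mlr3pipelines and smotefamily packages are installed; the sample size and seed are arbitrary choices for illustration.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# Small imbalanced example data (size and seed chosen arbitrarily)
data = smotefamily::sample_generator(500, ratio = 0.80)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")

pop = po("smote")
n_before = task$nrow

# Training: the returned task contains additional synthetic minority rows
trained = pop$train(list(task))[[1]]
trained$nrow > n_before

# Prediction: the task is passed through without added rows
predicted = pop$predict(list(task))[[1]]
predicted$nrow == n_before
```

Leaving the prediction input untouched means that performance is always evaluated on the original, unaugmented observations.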
$state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from
PipeOpTaskPreproc, as well as:
K
The number of nearest neighbors used for sampling new values. See smotefamily::SMOTE.
dup_size
Desired times of synthetic minority instances over the original number of majority instances. See smotefamily::SMOTE.
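As a hedged sketch, the two parameters can be set at construction through param_vals. The names K and dup_size are taken from the arguments of smotefamily::SMOTE; the values 3 and 2 are arbitrary, and the example assumes mlr3, mlr3pipelines and smotefamily are installed.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
data = smotefamily::sample_generator(500, ratio = 0.80)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")

# Arbitrary illustrative settings: 3 nearest neighbors, dup_size of 2
pop = PipeOpSmote$new(param_vals = list(K = 3, dup_size = 2))
pop$param_set$values[c("K", "dup_size")]

# Compare class counts before and after training
before = table(task$data()$result)
after = table(pop$train(list(task))[[1]]$data()$result)
rbind(before, after)
```

Only the minority class count should grow; the majority class count stays the same. The exact counts depend on the generated data, so none are shown here.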
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002). “SMOTE: Synthetic Minority Over-sampling Technique.” Journal of Artificial Intelligence Research, 16, 321–357. doi: 10.1613/jair.953.
library("mlr3")
library("mlr3pipelines") # provides po()

# Create example task
data = smotefamily::sample_generator(1000, ratio = 0.80)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")
task$data()
#>       result          X1         X2
#>    1:      p 0.546145996 0.67961492
#>    2:      n 0.079991565 0.61547644
#>    3:      n 0.643280776 0.03632103
#>    4:      n 0.731377352 0.32976618
#>    5:      n 0.004454134 0.94679939
#>   ---
#>  996:      n 0.629311925 0.85093931
#>  997:      p 0.607156249 0.52193177
#>  998:      n 0.026633458 0.57191021
#>  999:      n 0.380717913 0.93177893
#> 1000:      n 0.430693496 0.74375332
table(task$data()$result)
#>
#>   n   p
#> 835 165

# Generate synthetic data for minority class
pop = po("smote")
# train() returns a list of outputs; take the first (and only) element
smotedata = pop$train(list(task))[[1]]$data()
table(smotedata$result)
#>
#>   n   p
#> 835 825