FilterEnsemble aggregates several Filters by averaging their scores
(or ranks) with user-defined weights. Each wrapped filter is evaluated on the supplied task,
and the resulting feature scores are combined feature-wise by a convex combination determined
through the weights parameter. This allows leveraging complementary inductive biases of
multiple filters without committing to a single criterion. The concept was introduced by
Binder et al. (2020). This implementation follows the idea but leaves the exact choice of
weights to the user.
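As a small illustration of the convex combination, the base R snippet below combines two made-up scores for a single feature. The filter names and numbers are hypothetical. It only shows that `stats::weighted.mean()`, the default aggregator listed under Parameters, divides by the sum of the weights, so the weights do not have to sum to 1.

```r
# Hypothetical scores of one feature from two filters, and user-chosen weights.
scores  <- c(filter_a = 0.8, filter_b = 0.2)
weights <- c(filter_a = 3, filter_b = 1)

# weighted.mean() normalizes by sum(weights), which yields a convex combination.
combined <- stats::weighted.mean(scores, w = weights)

# The same value computed by hand with explicitly normalized weights.
manual <- sum(weights / sum(weights) * scores)

print(c(combined = combined, manual = manual))
```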
Construction
filters :: list of Filter
Filters that are evaluated and aggregated. Each filter must be cloneable and support the task type and feature types of the ensemble. The ensemble identifier defaults to the wrapped filter ids concatenated by ".".
Parameters
weights :: numeric()
Required non-negative weights, one for each wrapped filter, with at least one strictly positive value. Values are used as given when calculating the weighted mean. If named, names must match the wrapped filter ids.
rank_transform :: logical(1)
If TRUE, ranks of individual filter scores are used instead of the raw scores. Initialized to FALSE.
filter_score_transform :: function
Function to be applied to the vector of individual filter scores after they were potentially transformed by rank_transform but before weighting and aggregation. Initialized to identity.
aggregator :: function
Function to aggregate the (potentially transformed) and weighted filter scores across filters. Must take arguments w for weights and na.rm, the latter of which is always set to TRUE. Defaults to stats::weighted.mean.
result_score_transform :: function
Function to be applied to the vector of aggregated scores after they were potentially transformed by rank_transform and/or filter_score_transform. Initialized to identity.
Parameters of wrapped filters are available via $param_set and can be referenced using
the wrapped filter id followed by ".", e.g. "variance.na.rm".
Fields
$wrapped :: named list of Filter
Read-only access to the wrapped filters.
Methods
get_weights_search_space(weights_param_name = "weights", normalize_weights = "uniform", prefix = "w")
(character(1), character(1), character(1)) -> ParamSet
Construct a ParamSet describing a weight search space.
get_weights_tunetoken(normalize_weights = "uniform")
(character(1)) -> TuneToken
Shortcut returning a TuneToken for tuning the weights.
set_weights_to_tune(normalize_weights = "uniform")
(character(1)) -> self
Convenience wrapper that stores the TuneToken returned by get_weights_tunetoken() in $param_set$values$weights.
Internals
All wrapped filters are called with nfeat equal to the number of features to ensure that
complete score vectors are available for aggregation.
Scores are combined per feature by a weighted aggregation of the transformed scores or ranks
(default transformation: identity). The aggregated scores may then be transformed as well (default: identity).
The order of transformations is as follows:
1. Call $calculate on each wrapped filter to obtain its scores for all features;
2. If rank_transform is TRUE, convert filter scores to ranks;
3. Apply filter_score_transform to the scores / ranks;
4. Calculate the weighted aggregation across all filters using aggregator;
5. Potentially apply result_score_transform to the vector of per-feature scores aggregated across filters.
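The order of transformations described above can be sketched in base R, independent of the package. This is an illustrative sketch under stated assumptions, not the package's implementation: the helper `aggregate_scores()` and the toy score matrix are hypothetical, and the rank direction (a larger score receives a larger rank) is assumed.

```r
# Hypothetical helper mirroring the documented order of transformations.
# `scores` is a features x filters matrix of raw filter scores
# (at least two features, so that apply() returns a matrix).
aggregate_scores <- function(scores, weights,
                             rank_transform = FALSE,
                             filter_score_transform = identity,
                             aggregator = stats::weighted.mean,
                             result_score_transform = identity) {
  stopifnot(ncol(scores) == length(weights),
            all(weights >= 0), any(weights > 0))
  # Optionally replace each filter's scores by their ranks
  # (assumption: a higher score receives a higher rank).
  if (rank_transform) {
    scores <- apply(scores, 2, rank, na.last = "keep")
  }
  # Transform each filter's score (or rank) vector.
  scores <- apply(scores, 2, filter_score_transform)
  # Aggregate across filters, one feature (row) at a time.
  combined <- apply(scores, 1, function(x) aggregator(x, w = weights, na.rm = TRUE))
  # Transform the vector of aggregated scores.
  result_score_transform(combined)
}

# Toy scores for three features from two filters (made-up numbers).
scores <- cbind(variance = c(a = 4, b = 1, c = 2),
                auc      = c(a = 0.2, b = 0.4, c = 0.1))
w <- c(variance = 0.5, auc = 0.5)

res_raw  <- aggregate_scores(scores, w)
res_rank <- aggregate_scores(scores, w, rank_transform = TRUE)
print(res_raw)
print(res_rank)
```

With equal weights the raw aggregation is dominated by the filter with the larger scale (here the variance column), whereas the rank-based variant puts both filters on a common scale before averaging.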
References
Binder M, Moosbauer J, Thomas J, Bischl B (2020). “Multi-objective hyperparameter tuning and feature selection using filter ensembles.” In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 471–479. doi:10.1145/3377930.3389815.
Examples
library("mlr3")
library("mlr3filters")
task = tsk("sonar")
filter = flt("ensemble",
  filters = list(FilterVariance$new(), FilterAUC$new()))
filter$param_set$values$weights = c(variance = 0.5, auc = 0.5)
filter$calculate(task)
head(as.data.table(filter))
#> feature score
#> <char> <num>
#> 1: V11 0.1493737
#> 2: V12 0.1312692
#> 3: V10 0.1253847
#> 4: V9 0.1224299
#> 5: V36 0.1174703
#> 6: V49 0.1162774
# Weighted median as aggregator
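filter$param_set$set_values(aggregator = function(x, w, na.rm) {
  if (na.rm) {
    # drop missing scores together with their weights
    keep <- !is.na(x)
    x <- x[keep]
    w <- w[keep]
  }
  o <- order(x)
  x <- x[o]
  w <- w[o]
  # first sorted value at which the cumulative weight reaches half the total
  x[which(cumsum(w) >= sum(w) / 2)[1]]
})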
filter$calculate(task)
head(as.data.table(filter))
#> feature score
#> <char> <num>
#> 1: V36 0.06975989
#> 2: V20 0.06898649
#> 3: V35 0.06714950
#> 4: V19 0.06655799
#> 5: V21 0.06647019
#> 6: V22 0.06547617
# Aggregate reciprocal ranking
filter$param_set$set_values(rank_transform = TRUE,
  filter_score_transform = function(x) 1 / x,
  result_score_transform = function(x) rank(1 / x, ties.method = "average"))
filter$calculate(task)
head(as.data.table(filter))
#> feature score
#> <char> <num>
#> 1: V36 59.5
#> 2: V11 59.5
#> 3: V12 57.5
#> 4: V17 57.5
#> 5: V10 55.5
#> 6: V20 55.5
