Perform (weighted) majority vote prediction from classification Predictions by connecting
PipeOpClassifAvg to multiple PipeOpLearner outputs.
Always returns a "prob" prediction, regardless of the incoming Learner's
$predict_type. The label of the class with the highest predicted probability is selected as the
"response" prediction. If the Learner's $predict_type is set to "prob",
the probability aggregation is controlled by prob_aggr (see below). If $predict_type = "response",
predictions are internally converted to one-hot probability vectors (point mass on the predicted class) before aggregation.
"prob" aggregation:
prob_aggr = "mean"– Linear opinion pool (arithmetic mean of probabilities; default). Interpretation. Mixture semantics: choose a base model with probabilityw[i], then draw from its class distribution. Decision-theoretically, this is the minimizer ofsum(w[i] * KL(p[i] || p))over probability vectorsp, whereKL(x || y)is the Kullback-Leibler divergence. Typical behavior. Conservative / better calibrated and robust to near-zero probabilities (never assigns zero unless all do). This is the standard choice for probability averaging in ensembles and stacking.prob_aggr = "log"– Log opinion pool / product of experts (geometric mean in probability space): Average per-model logs (or equivalently, logits) and apply softmax. Interpretation. Product semantics:p_ens ~ prod_i p_i^{w[i]}; minimizessum(w[i] * KL(p || p[i])). Typical behavior. Sharper / lower entropy (emphasizes consensus regions), but can be overconfident and is sensitive to zeros; useprob_aggr_epsto clip small probabilities for numerical stability. Often beneficial with strong, similarly calibrated members (e.g., neural networks), less so when calibration is the priority.
All incoming Learners' $predict_type must agree.
Weights can be set as a parameter; if none are provided, equal weights are used for each prediction.
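To make the two pooling rules concrete, the following plain-R sketch (not the PipeOp's internal code) applies both, with equal weights, to the class probabilities that two models predict for a single observation:
P = rbind(c(0.9, 0.1),   # model 1 class probabilities
          c(0.6, 0.4))   # model 2 class probabilities
w = c(0.5, 0.5)          # equal weights

# prob_aggr = "mean": linear opinion pool (weighted arithmetic mean)
p_mean = colSums(w * P)                       # 0.75 0.25

# prob_aggr = "log": log opinion pool (weighted geometric mean, renormalized);
# eps plays the role of prob_aggr_eps and guards against log(0)
eps = 1e-12
p_log = exp(colSums(w * log(pmax(P, eps))))
p_log = p_log / sum(p_log)                    # approx. 0.79 0.21, sharper than p_mean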
Format
R6Class inheriting from PipeOpEnsemble/PipeOp.
Construction
PipeOpClassifAvg$new(innum = 0, collect_multiplicity = FALSE, id = "classifavg", param_vals = list())

innum :: numeric(1)
Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.

collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means that a Multiplicity input, instead of multiple normal inputs, is accepted and its members are aggregated. This requires innum to be 0. Default is FALSE.

id :: character(1)
Identifier of the resulting object, default "classifavg".

param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
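As a brief usage sketch (the po() shortcut is the same one used in the Examples below; hyperparameter values are illustrative only):
library("mlr3pipelines")

# explicit construction: three fixed input channels, aggregation set via param_vals
op = PipeOpClassifAvg$new(innum = 3, param_vals = list(prob_aggr = "mean"))

# sugar; innum = 0 (default) creates a vararg input channel
op2 = po("classifavg")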
Input and Output Channels
Input and output channels are inherited from PipeOpEnsemble. Instead of a Prediction, a PredictionClassif
is used as input and output during prediction.
State
The $state is left empty (list()).
Parameters
The parameters are the parameters inherited from the PipeOpEnsemble, as well as:
prob_aggr :: character(1)
Controls how incoming class probabilities are aggregated. One of "mean" (linear opinion pool; default) or "log" (log opinion pool / product of experts). See the description above for definitions and interpretation. Only has an effect if the incoming predictions have "prob" values.

prob_aggr_eps :: numeric(1)
Small positive constant used only for prob_aggr = "log" to clamp probabilities before taking logs, improving numerical stability and avoiding -Inf. Ignored for prob_aggr = "mean". Default is 1e-12.
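For example (values purely illustrative), the aggregation behavior and the weights inherited from PipeOpEnsemble can be set on a constructed object:
op = po("classifavg")
op$param_set$values$weights = c(2, 1, 1)   # weights for three incoming predictions
op$param_set$values$prob_aggr = "log"      # log opinion pool / product of experts
op$param_set$values$prob_aggr_eps = 1e-6   # clamp probabilities before taking logs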
Internals
Inherits from PipeOpEnsemble by implementing the private$weighted_avg_predictions() method.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpEnsemble/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
Examples
# \donttest{
library("mlr3")
# Simple Bagging
gr = ppl("greplicate",
po("subsample") %>>%
po("learner", lrn("classif.rpart")),
n = 3
) %>>%
po("classifavg")
resample(tsk("iris"), GraphLearner$new(gr), rsmp("holdout"))
#>
#> ── <ResampleResult> with 1 resampling iterations ───────────────────────────────
#> task_id
#> iris
#> learner_id
#> subsample_1.subsample_2.subsample_3.classif.rpart_1.classif.rpart_2.classif.rpart_3.classifavg
#> resampling_id iteration prediction_test warnings errors
#> holdout 1 <PredictionClassif> 0 0
# }
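A further sketch (hyperparameter values are illustrative; parameter names as documented above): average the probability predictions of two different learners with unequal weights and the log opinion pool.
avg = po("classifavg")
avg$param_set$values$weights = c(2, 1)
avg$param_set$values$prob_aggr = "log"
gr2 = gunion(list(
  po("learner", lrn("classif.rpart", predict_type = "prob")),
  po("learner", lrn("classif.featureless", predict_type = "prob"))
)) %>>% avg
resample(tsk("iris"), GraphLearner$new(gr2), rsmp("holdout"))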
