
Perform (weighted) majority vote prediction from classification Predictions by connecting PipeOpClassifAvg to multiple PipeOpLearner outputs.

Always returns a "prob" prediction, regardless of the incoming Learner's $predict_type. The label of the class with the highest predicted probability is selected as the "response" prediction. If the Learner's $predict_type is set to "prob", the probability aggregation is controlled by prob_aggr (see below). If $predict_type = "response", predictions are internally converted to one-hot probability vectors (point mass on the predicted class) before aggregation.
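The conversion of "response" predictions to one-hot probability vectors can be illustrated with a small base R sketch. This is not the package's internal code; the names `classes`, `model_a`, `model_b`, and `one_hot()` are hypothetical and chosen only for illustration.

```r
# Illustration only: hypothetical data and helper, not mlr3pipelines internals.
classes = c("setosa", "versicolor", "virginica")

# "response" predictions of two hypothetical models for four observations
model_a = factor(c("setosa", "virginica", "virginica", "versicolor"), levels = classes)
model_b = factor(c("setosa", "virginica", "versicolor", "versicolor"), levels = classes)

# One-hot encoding: probability 1 on the predicted class, 0 elsewhere
one_hot = function(response) {
  m = outer(as.integer(response), seq_along(levels(response)), "==") * 1
  colnames(m) = levels(response)
  m
}

# Averaging one-hot vectors yields vote shares, i.e. a majority vote
vote_share = (one_hot(model_a) + one_hot(model_b)) / 2

# The "response" is the class with the highest aggregated probability
# (ties are broken here by taking the first class, which is an assumption)
response = factor(classes[max.col(vote_share, ties.method = "first")], levels = classes)
```

Each row of `vote_share` sums to 1, so it can be read as a probability vector even though the inputs were hard labels.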

"prob" aggregation:

  • prob_aggr = "mean": Linear opinion pool (arithmetic mean of probabilities; default). Interpretation. Mixture semantics: choose a base model with probability w[i], then draw from its class distribution. Decision-theoretically, this is the minimizer of sum(w[i] * KL(p[i] || p)) over probability vectors p, where KL(x || y) is the Kullback-Leibler divergence. Typical behavior. Conservative / better calibrated and robust to near-zero probabilities (never assigns zero unless all do). This is the standard choice for probability averaging in ensembles and stacking.

  • prob_aggr = "log": Log opinion pool / product of experts (geometric mean in probability space): Average per-model logs (or equivalently, logits) and apply softmax. Interpretation. Product semantics: p_ens ~ prod_i p_i^{w[i]}; minimizes sum(w[i] * KL(p || p[i])). Typical behavior. Sharper / lower entropy (emphasizes consensus regions), but can be overconfident and is sensitive to zeros; use prob_aggr_eps to clip small probabilities for numerical stability. Often beneficial with strong, similarly calibrated members (e.g., neural networks), less so when calibration is the priority.
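The two pooling rules can be compared on a toy input. The following base R sketch follows the formulas given above (weighted arithmetic mean versus weighted mean of logs followed by softmax); the probability matrix `p` and the weights `w` are made-up values, and the code is an illustration rather than the package's implementation.

```r
# Illustration only: made-up class probabilities from three hypothetical models
# for a single observation (rows = models, columns = classes).
p = rbind(
  m1 = c(a = 0.70, b = 0.20, c = 0.10),
  m2 = c(a = 0.60, b = 0.30, c = 0.10),
  m3 = c(a = 0.10, b = 0.80, c = 0.10)
)
w = c(1, 1, 1) / 3  # equal weights, normalized to sum to 1

# Linear opinion pool: weighted arithmetic mean of the probabilities.
# w is recycled down the columns, so w[i] scales row i.
p_mean = colSums(w * p)

# Log opinion pool: weighted mean of log-probabilities, then softmax.
log_avg = colSums(w * log(p))
p_log = exp(log_avg - max(log_avg))  # subtract the max for numerical stability
p_log = p_log / sum(p_log)

# Equivalent product-of-experts form: normalized weighted geometric mean
geo = apply(p, 2, function(col) prod(col^w))
p_prod = geo / sum(geo)
```

In this toy input the two pools pick different classes: the linear pool favors `a`, which two models rate highly, while the log pool favors `b`, because one model assigns `a` a low probability and the product penalizes that strongly.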

The $predict_type of all incoming Learners must agree.

Weights can be set as a parameter; if none are provided, all predictions receive equal weight.
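The effect of weights on a majority vote can be sketched in base R. The example assumes that weights are relative and get normalized to sum to 1, which is an assumption of this sketch and not stated above; the votes and weight values are made up.

```r
# Illustration only: one observation, three hypothetical models voting on two classes.
classes = c("yes", "no")
votes = factor(c("yes", "no", "no"), levels = classes)

# Sum the (normalized) weights of the models voting for each class
weighted_vote = function(votes, weights) {
  weights = weights / sum(weights)  # assumption: weights are relative
  tapply(weights, votes, sum)
}

equal = weighted_vote(votes, c(1, 1, 1))     # every model counts the same
weighted = weighted_vote(votes, c(3, 1, 1))  # first model counts three times as much

names(which.max(equal))     # class chosen with equal weights
names(which.max(weighted))  # class chosen when the first model dominates
```

With equal weights the two "no" votes win; giving the first model three times the weight of the others flips the decision to "yes".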

Format

R6Class inheriting from PipeOpEnsemble/PipeOp.

Construction

PipeOpClassifAvg$new(innum = 0, collect_multiplicity = FALSE, id = "classifavg", param_vals = list())

  • innum :: numeric(1)
    Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.

  • collect_multiplicity :: logical(1)
    If TRUE, the input is a Multiplicity collecting channel: a single Multiplicity input is accepted instead of multiple normal inputs, and its members are aggregated. This requires innum to be 0. Default is FALSE.

  • id :: character(1)
    Identifier of the resulting object, default "classifavg".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
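A minimal construction sketch, assuming mlr3 and mlr3pipelines are installed in a version that provides the prob_aggr parameter described on this page. The choice of learners and of the iris task is arbitrary, and predicting on the training data is done only to keep the example short.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Two base learners that both predict probabilities, so their predict types agree
learners = list(
  po("learner", lrn("classif.rpart", predict_type = "prob")),
  po("learner", lrn("classif.featureless", predict_type = "prob"))
)

# innum = 2 creates exactly two input channels instead of a vararg channel;
# param_vals overrides the default hyperparameter values at construction
avg = PipeOpClassifAvg$new(innum = 2, param_vals = list(prob_aggr = "log"))

# Run the learners in parallel and feed both outputs into the averaging PipeOp
gr = gunion(learners) %>>% avg

gr$train(task)
pred = gr$predict(task)[[1]]  # the graph has a single output: the averaged prediction
```

With the default `innum = 0`, the same graph can be built without declaring the number of inputs in advance, since the vararg channel accepts any number of incoming predictions.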

Input and Output Channels

Input and output channels are inherited from PipeOpEnsemble. Instead of a Prediction, a PredictionClassif is used as input and output during prediction.

State

The $state is left empty (list()).

Parameters

The parameters are the parameters inherited from PipeOpEnsemble, as well as:

  • prob_aggr :: character(1)
    Controls how incoming class probabilities are aggregated. One of "mean" (linear opinion pool; default) or "log" (log opinion pool / product of experts). See the description above for definitions and interpretation. Only has an effect if the incoming predictions have "prob" values.

  • prob_aggr_eps :: numeric(1)
    Small positive constant used only for prob_aggr = "log" to clamp probabilities before taking logs, improving numerical stability and avoiding -Inf. Ignored for prob_aggr = "mean". Default is 1e-12.
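Why clamping matters for the log pool can be seen in a small base R sketch. It assumes that probabilities are clamped from below at eps before taking logs, as described above; the helper `log_pool()` and the input values are hypothetical, and whether the package renormalizes in exactly this way is not stated here.

```r
# Illustration only: hypothetical helper, not the package's implementation.
log_pool = function(p, w, eps = 0) {
  log_avg = colSums(w * log(pmax(p, eps)))  # clamp, then weighted mean of logs
  z = exp(log_avg - max(log_avg))           # softmax with max subtraction
  z / sum(z)
}

# Two hypothetical models that are each fully certain, but disagree
p = rbind(
  m1 = c(a = 1, b = 0),
  m2 = c(a = 0, b = 1)
)
w = c(0.5, 0.5)

unclamped = log_pool(p, w, eps = 0)      # log(0) = -Inf for both classes
clamped = log_pool(p, w, eps = 1e-12)    # finite logs, well-defined result
```

Without clamping, every class receives an averaged log of -Inf and the softmax is undefined (NaN). With the clamp, both classes receive the same finite score and the pooled probabilities are well defined.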

Internals

Inherits from PipeOpEnsemble by implementing the private$weighted_avg_predictions() method.

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpEnsemble/PipeOp.

See also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_info, mlr_pipeops_isomap, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Other Multiplicity PipeOps: Multiplicity(), PipeOpEnsemble, mlr_pipeops_featureunion, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_regravg, mlr_pipeops_replicate

Other Ensembles: PipeOpEnsemble, mlr_learners_avg, mlr_pipeops_ovrunite, mlr_pipeops_regravg

Examples

library("mlr3")
library("mlr3pipelines")  # provides po(), ppl(), %>>% and GraphLearner

# Simple bagging: three decision trees, each trained on a different subsample
gr = ppl("greplicate",
  po("subsample") %>>%
  po("learner", lrn("classif.rpart")),
  n = 3
) %>>%
  po("classifavg")  # combine the three predictions

# Evaluate the whole graph as a single learner on a holdout split
resample(tsk("iris"), GraphLearner$new(gr), rsmp("holdout"))
#> 
#> ── <ResampleResult> with 1 resampling iterations ───────────────────────────────
#>  task_id
#>     iris
#>                                                                                      learner_id
#>  subsample_1.subsample_2.subsample_3.classif.rpart_1.classif.rpart_2.classif.rpart_3.classifavg
#>  resampling_id iteration     prediction_test warnings errors
#>        holdout         1 <PredictionClassif>        0      0