Creates a Graph that performs bagging for a supplied graph.
This is done as follows:
- Subsample the data in each step using PipeOpSubsample, then apply graph.
- Replicate this step iterations times (in parallel via multiplicities).
- Average the predictions of the replicated graphs using the averager (note that setting collect_multiplicity = TRUE is required).
All input arguments are cloned and have no references in common with the returned Graph.
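The steps above can be sketched by building a comparable graph by hand. This is only an illustration, not necessarily how pipeline_bagging is implemented internally: it assumes that po("replicate") is the PipeOp that creates the multiplicity, and the mtcars task, the regr.rpart learner and the 3 iterations are arbitrary choices made for the example.

```r
library(mlr3)
library(mlr3pipelines)

# Assumption: po("replicate") creates a multiplicity of length `reps`;
# the PipeOps that follow are then applied once per element.
gr = po("replicate", reps = 3) %>>%
  po("subsample", frac = 0.7, replace = FALSE) %>>%
  po("learner", lrn("regr.rpart")) %>>%
  # collect_multiplicity = TRUE lets the averager combine all replicates
  po("regravg", collect_multiplicity = TRUE)

# Wrap the graph as a learner, then train and predict on an example task
task = tsk("mtcars")
glrn = as_learner(gr)
glrn$train(task)
prediction = glrn$predict(task)
```

Wrapping the graph with as_learner() makes it usable wherever an ordinary learner is expected, for example in resample().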
Arguments
- graph
  PipeOp | Graph
  A PipeOpLearner or Graph to create a bagging pipeline for. Outputs from the replicated graphs are connected with the averager.
- iterations
  integer(1)
  Number of bagging iterations. Defaults to 10.
- frac
  numeric(1)
  Fraction of rows to keep during subsampling. See PipeOpSubsample for more information. Defaults to 0.7.
- averager
  PipeOp | Graph
  A PipeOp or Graph that averages the predictions from the replicated and subsampled graphs. In the simplest case, po("classifavg") and po("regravg") can be used to perform simple averaging of classification and regression predictions respectively. If NULL (default), no averager is added to the end of the graph. Note that setting collect_multiplicity = TRUE during construction of the averager is required.
- replace
  logical(1)
  Whether to sample with replacement. Default FALSE.
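For classification, the averager argument can be filled in the same way with po("classifavg"). The following is a minimal sketch; the iris task, the classif.rpart learner and the 5 iterations are assumptions chosen only for illustration.

```r
library(mlr3)
library(mlr3pipelines)

# Bag a classification tree and combine the replicates with po("classifavg")
lrn_po = po("learner", lrn("classif.rpart"))
gr = ppl("bagging", lrn_po, iterations = 5,
  averager = po("classifavg", collect_multiplicity = TRUE))

# Train and predict through a GraphLearner wrapper
task = tsk("iris")
glrn = as_learner(gr)
glrn$train(task)
prediction = glrn$predict(task)
```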
Examples
library(mlr3)
library(mlr3pipelines)
lrn_po = po("learner", lrn("regr.rpart"))
task = mlr_tasks$get("boston_housing")
gr = pipeline_bagging(lrn_po, 3, averager = po("regravg", collect_multiplicity = TRUE))
resample(task, GraphLearner$new(gr), rsmp("holdout"))$aggregate()
#> regr.mse
#> 22.72214
# The original bagging method uses bootstrapping, i.e. sampling with replacement.
gr = ppl("bagging", lrn_po, frac = 1, replace = TRUE,
  averager = po("regravg", collect_multiplicity = TRUE))
resample(task, GraphLearner$new(gr), rsmp("holdout"))$aggregate()
#> regr.mse
#> 23.23871