Aggregates features from all input tasks by cbind()ing them together into a single Task.
DataBackend primary keys and Task targets have to be equal across all Tasks. Only the target column(s) of the first Task are kept.
If assert_targets_equal is TRUE, then target column names are compared and an error is thrown if they differ across inputs.
If input tasks share some feature names but these features are not identical, an error is thrown. This check first compares the feature names and, if duplicate names are found, also compares the values of these possibly duplicated features. True duplicate features are added to the output task only once.
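
The duplicate-feature rule above can be illustrated with a minimal sketch. It assumes that the mlr3 and mlr3pipelines packages are installed; the names pofu, shifted and task_shifted are hypothetical and exist only for this illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pofu = po("featureunion")

# Same feature names, identical values: true duplicates are kept only once.
out = pofu$train(list(task, task$clone()))[[1]]
n_features = length(out$feature_names)
print(n_features)

# Same feature name, different values: the union is expected to fail.
# (Hypothetical modified copy of the iris data, for illustration only.)
shifted = iris
shifted$Sepal.Length = shifted$Sepal.Length + 1
task_shifted = as_task_classif(shifted, target = "Species", id = "iris_shifted")
failed = tryCatch({
  pofu$train(list(task, task_shifted))
  FALSE
}, error = function(e) TRUE)
print(failed)
```

Wrapping the second call in tryCatch() keeps the sketch runnable from top to bottom even though that call is meant to raise an error.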
Construction
PipeOpFeatureUnion$new(innum = 0, collect_multiplicity = FALSE, id = "featureunion", param_vals = list(),
assert_targets_equal = TRUE)
innum :: numeric(1) | character
Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs. If innum is a character vector, the number of input channels is the length of innum, and the columns of the result are prefixed with the values.
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means that a Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0. Default is FALSE.
id :: character(1)
Identifier of the resulting object, default "featureunion".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
assert_targets_equal :: logical(1)
If assert_targets_equal is TRUE (default), task target column names are checked for agreement. Disagreeing target column names are usually a bug, so this should often be left at the default.
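
As a rough sketch of how the construction arguments interact, the following compares a character innum with collect_multiplicity = TRUE. It assumes mlr3 and mlr3pipelines are installed; the task names sepal and petal and the prefixes "s" and "p" are hypothetical choices made for this illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
# Two tasks with disjoint feature sets but identical rows and target.
sepal = task$clone()$select(c("Sepal.Length", "Sepal.Width"))
petal = task$clone()$select(c("Petal.Length", "Petal.Width"))

# Character innum: two input channels, output columns prefixed with the values.
# Clones are passed so the originals stay untouched whatever the PipeOp does.
pofu_prefixed = po("featureunion", innum = c("s", "p"))
out_prefixed = pofu_prefixed$train(list(sepal$clone(), petal$clone()))[[1]]
print(out_prefixed$feature_names)

# collect_multiplicity = TRUE (innum stays 0): a single Multiplicity is passed
# instead of several separate inputs, and its members are aggregated.
pofu_collect = po("featureunion", collect_multiplicity = TRUE)
out_collect = pofu_collect$train(list(Multiplicity(sepal$clone(), petal$clone())))[[1]]
print(out_collect$feature_names)
```

The Multiplicity variant is mainly useful when the number of incoming tasks is not known while the graph is being built.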
Input and Output Channels
PipeOpFeatureUnion has multiple input channels depending on the innum construction argument, named "input1", "input2", ... if innum is nonzero; if innum is 0, there is only one vararg input channel named "...". All input channels take a Task both during training and prediction.
PipeOpFeatureUnion has one output channel named "output", producing a Task both during training and prediction.
The output is a Task constructed by cbind()ing all features from all input Tasks, both during training and prediction.
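
The channel names described above can be inspected directly on a constructed object. This is a small sketch assuming mlr3 and mlr3pipelines are installed; the variable names are hypothetical.

```r
library("mlr3")
library("mlr3pipelines")

# innum = 0 (default): a single vararg input channel.
vararg_names = po("featureunion")$input$name
# innum = 3: one named input channel per input.
fixed_names = po("featureunion", innum = 3)$input$name
# The single output channel.
output_name = po("featureunion")$output$name

print(vararg_names)
print(fixed_names)
print(output_name)
```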
State
The $state is left empty (list()).
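
A short sketch of what the empty state looks like in practice, assuming mlr3 and mlr3pipelines are installed (the variable names are hypothetical):

```r
library("mlr3")
library("mlr3pipelines")

pofu = po("featureunion")
trained_before = pofu$is_trained

# Training on two copies of the same task; nothing needs to be learned.
pofu$train(list(tsk("iris"), tsk("iris")))
trained_after = pofu$is_trained
state = pofu$state

print(trained_before)
print(trained_after)
print(length(state))
```

Although nothing is stored, the PipeOp still has to be trained before it can be used for prediction, like any other PipeOp.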
Internals
PipeOpFeatureUnion uses the Task $cbind() method to bind the input values beyond the first input to the first Task. This means that if the Tasks are database-backed, all of them except the first will be fetched into R memory for this. This behaviour may change in the future.
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Other Multiplicity PipeOps: Multiplicity(), PipeOpEnsemble, mlr_pipeops_classifavg, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_regravg, mlr_pipeops_replicate
Examples
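library("mlr3")
library("mlr3pipelines")  # provides po(), gunion() and %>>%
task1 = tsk("iris")
# Run "nop" and "pca" in parallel, then bind their outputs column-wise.
gr = gunion(list(
  po("nop"),
  po("pca")
)) %>>% po("featureunion")
gr$train(task1)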
#> $featureunion.output
#> <TaskClassif:iris> (150 x 9): Iris Flowers
#> * Target: Species
#> * Properties: multiclass
#> * Features (8):
#>   - dbl (8): PC1, PC2, PC3, PC4, Petal.Length, Petal.Width,
#>     Sepal.Length, Sepal.Width
#>
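task2 = tsk("iris")
task3 = tsk("iris")
# A character innum prefixes the features of each input with "a." and "b.".
# The object is named pofu so that it does not shadow the po() function.
pofu = po("featureunion", innum = c("a", "b"))
pofu$train(list(task2, task3))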
#> $output
#> <TaskClassif:iris> (150 x 9): Iris Flowers
#> * Target: Species
#> * Properties: multiclass
#> * Features (8):
#>   - dbl (8): a.Petal.Length, a.Petal.Width, a.Sepal.Length,
#>     a.Sepal.Width, b.Petal.Length, b.Petal.Width, b.Sepal.Length,
#>     b.Sepal.Width
#>