Applies a function to each column of a task. Use the affect_columns parameter inherited from PipeOpTaskPreprocSimple to limit the columns this function should be applied to. This can be used for simple parameter transformations or type conversions (e.g. as.numeric).
The same function is applied during training and prediction. One important requirement for machine learning preprocessing is that, during the prediction phase, the preprocessing of each data row must be independent of the other rows. Therefore, the applicator function should always return a vector / list where each result component depends only on the corresponding input component and not on other components. As a rule of thumb, if the function f generates output different from Vectorize(f), it should not be used as applicator.
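The rule of thumb above can be checked mechanically. The following is a minimal sketch in base R. The helper is_elementwise and the two example functions are hypothetical and not part of mlr3pipelines, and agreement on a single test vector is only a necessary condition, not a proof that a function is safe to use.

```r
# Hypothetical helper (not part of mlr3pipelines): compares f(x) with
# Vectorize(f)(x), which calls f separately on every element of x.
is_elementwise = function(f, x) {
  isTRUE(all.equal(f(x), Vectorize(f)(x)))
}

x = c(1, 2, 3)

# Each output element depends only on the matching input element.
double_it = function(x) x * 2

# Each output element depends on the mean of the whole column.
center = function(x) x - mean(x)

is_elementwise(double_it, x)
#> [1] TRUE
is_elementwise(center, x)
#> [1] FALSE
```

A function like center would give different results for the same row depending on which other rows are present at prediction time, which is why it fails the check.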
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "colapply".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreprocSimple.
The output is the input Task with features changed according to the applicator parameter.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreprocSimple.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreprocSimple, as well as:
applicator :: function
Function to apply to each column of the task. The return value should be a vector of the same length as the input, i.e., the function vectorizes over the input. A typical example would be as.numeric.
The return value can also be a matrix, data.frame, or data.table. In this case, the number of returned rows must match the length of the input. The names of the resulting features of the output Task are based on the (column) name(s) of the return value of the applicator function, prefixed with the original feature name and separated by a dot (.). Use Vectorize to create a vectorizing function from any function that ordinarily takes only a single element as input.
Internals
Calls map on the data, using the value of applicator as .f, and coerces the output via as.data.table.
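The column-wise application and the resulting feature names can be approximated in base R. This sketch is an analogy only: it uses lapply and data.frame in place of the map and as.data.table calls named above, and the function floor_ceiling and all variable names are illustrative.

```r
# Base R analogy of the internals; lapply() and data.frame() stand in for
# map() and as.data.table(), so this is not the package's actual code.
floor_ceiling = function(x) {
  cbind(floor = floor(x), ceiling = ceiling(x))
}

# First three rows of two numeric columns of the built-in iris data.
features = iris[1:3, c("Petal.Length", "Petal.Width")]

# Apply the function to every column; each result is a two-column matrix.
results = lapply(features, floor_ceiling)

# data.frame() prefixes matrix column names with the list element name.
out = do.call(data.frame, results)
names(out)
#> [1] "Petal.Length.floor"   "Petal.Length.ceiling" "Petal.Width.floor"
#> [4] "Petal.Width.ceiling"
```

Each input column contributes one output column per column of the returned matrix, and the number of rows is unchanged.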
Fields
Only fields inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("mlr3pipelines")  # provides po() and the selector_*() functions
task = tsk("iris")
poca = po("colapply", applicator = as.character)
poca$train(list(task))[[1]] # types are converted
#> <TaskClassif:iris> (150 x 5): Iris Flowers
#> * Target: Species
#> * Properties: multiclass
#> * Features (4):
#> - chr (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
# function that does not vectorize
f1 = function(x) {
  # we could use `ifelse` here, but that is not the point
  if (x > 1) {
    "a"
  } else {
    "b"
  }
}
poca$param_set$values$applicator = Vectorize(f1)
poca$train(list(task))[[1]]$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <char> <char> <char> <char>
#> 1: setosa a b a a
#> 2: setosa a b a a
#> 3: setosa a b a a
#> 4: setosa a b a a
#> 5: setosa a b a a
#> ---
#> 146: virginica a a a a
#> 147: virginica a a a a
#> 148: virginica a a a a
#> 149: virginica a a a a
#> 150: virginica a a a a
# only affect Petal.* columns
poca$param_set$values$affect_columns = selector_grep("^Petal")
poca$train(list(task))[[1]]$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <char> <char> <num> <num>
#> 1: setosa a b 5.1 3.5
#> 2: setosa a b 4.9 3.0
#> 3: setosa a b 4.7 3.2
#> 4: setosa a b 4.6 3.1
#> 5: setosa a b 5.0 3.6
#> ---
#> 146: virginica a a 6.7 3.0
#> 147: virginica a a 6.3 2.5
#> 148: virginica a a 6.5 3.0
#> 149: virginica a a 6.2 3.4
#> 150: virginica a a 5.9 3.0
# function returning multiple columns
f2 = function(x) {
  cbind(floor = floor(x), ceiling = ceiling(x))
}
poca$param_set$values$applicator = f2
poca$param_set$values$affect_columns = selector_all()
poca$train(list(task))[[1]]$data()
#> Species Petal.Length.floor Petal.Length.ceiling Petal.Width.floor
#> <fctr> <num> <num> <num>
#> 1: setosa 1 2 0
#> 2: setosa 1 2 0
#> 3: setosa 1 2 0
#> 4: setosa 1 2 0
#> 5: setosa 1 2 0
#> ---
#> 146: virginica 5 6 2
#> 147: virginica 5 5 1
#> 148: virginica 5 6 2
#> 149: virginica 5 6 2
#> 150: virginica 5 6 1
#> Petal.Width.ceiling Sepal.Length.floor Sepal.Length.ceiling
#> <num> <num> <num>
#> 1: 1 5 6
#> 2: 1 4 5
#> 3: 1 4 5
#> 4: 1 4 5
#> 5: 1 5 5
#> ---
#> 146: 3 6 7
#> 147: 2 6 7
#> 148: 2 6 7
#> 149: 3 6 7
#> 150: 2 5 6
#> Sepal.Width.floor Sepal.Width.ceiling
#> <num> <num>
#> 1: 3 4
#> 2: 3 3
#> 3: 3 4
#> 4: 3 4
#> 5: 3 4
#> ---
#> 146: 3 3
#> 147: 2 3
#> 148: 3 3
#> 149: 3 4
#> 150: 3 3