Function that offers a simple and direct way to train or predict PipeOps and Graphs on Tasks, data.frames or data.tables.
Training happens if predict is set to FALSE and no state is passed to this function.
Prediction happens if predict is set to TRUE and if the passed Graph or PipeOp is either trained or a state is explicitly passed to this function.
The passed PipeOp or Graph gets modified by-reference.
Usage
preproc(indata, processor, state = NULL, predict = !is.null(state))
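
The default of predict depends on whether a state was supplied. The following base-R sketch isolates that defaulting rule. The function resolve_predict() is a hypothetical stand-in introduced only for illustration and is not part of mlr3pipelines.

```r
# Hypothetical stand-in that mirrors only the argument defaults of preproc();
# it performs no training or prediction.
resolve_predict = function(state = NULL, predict = !is.null(state)) {
  predict
}

# No state given: the default resolves to training (FALSE).
resolve_predict()

# A state is given: the default resolves to prediction (TRUE).
resolve_predict(state = list(a = 1))

# Explicit request to predict without a state; preproc() itself would
# then require an already trained processor.
resolve_predict(predict = TRUE)
```
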
Arguments
- indata
(Task | data.frame | data.table)
Data to be pre-processed.
- processor
(Graph | PipeOp)
Graph or PipeOp accepting a Task that has one output channel.
Whenever a data.frame or data.table is passed as indata, the output channel must return a Task, which is then converted back into a data.frame or data.table. Additionally, processors which only work on sub-classes of TaskSupervised will not accept a data.frame or data.table, as it would be unclear which column is the target.
Be aware that the processor gets modified by-reference both during training and if a state is passed to this function. In particular, the state of a trained processor will get overwritten when state is passed.
You may want to use dictionary sugar functions to select a processor and to set its hyperparameters, e.g. po() or ppl().
- state
(named list | NULL)
Optional state to be used for prediction, if the processor is untrained or if the current state of the processor should be overwritten. Must be a complete and correct state for the respective processor. Default NULL (do not overwrite the processor's state).
- predict
(logical(1))
Whether to predict (TRUE) or train (FALSE). By default, this is FALSE if state is NULL (state's default), and TRUE otherwise.
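
Because the processor is modified by-reference, one way to keep an untrained original is to train on a deep clone. The sketch below assumes an installed mlr3pipelines version that exports preproc(). The choice of po("scale") and the variable names are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("scale")

# Train on a deep clone so that `pop` itself stays untrained.
pop_copy = pop$clone(deep = TRUE)
out = preproc(task, pop_copy)

# The original was not touched, the clone was trained by reference.
untouched = pop$is_trained
pop_copy$is_trained

# Reuse the clone's state to predict with the original PipeOp.
# Passing `state` sets the state of `pop` by reference as well.
pred = preproc(task, pop, state = pop_copy$state)
```

After the last call, pop carries the state that was passed in, so a state it held before would have been overwritten.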
Value
any | data.frame | data.table:
If indata is a Task, whatever is returned by the processor's single output channel is returned.
If indata is a data.frame or data.table, an object of the same class is returned, or if the processor's output channel does not return a Task, an error is thrown.
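
The return types described above can be checked with a short sketch. It assumes an installed mlr3pipelines version that exports preproc(). The toy data and po("scale") are illustrative choices.

```r
library("mlr3")
library("mlr3pipelines")

# Small numeric toy data in both supported tabular classes.
df = data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
dt = data.table::as.data.table(df)

# A fresh PipeOp is used for each call, since preproc() trains by reference.
out_task = preproc(tsk("iris"), po("scale"))
out_df = preproc(df, po("scale"))
out_dt = preproc(dt, po("scale"))

# Inspect what came back for each kind of input.
class(out_task)
class(out_df)
class(out_dt)
```
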
Internals
If processor is a PipeOp, the S3 method preproc.PipeOp gets called first, converting the PipeOp into a Graph and wrapping the state appropriately, before calling the S3 method preproc.Graph with the modified objects.
If indata is a data.frame or data.table, a TaskUnsupervised is constructed internally. This implies that processors which only work on sub-classes of TaskSupervised will not work with these input types for indata.
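
The PipeOp-to-Graph conversion can be imitated by hand. The sketch below is an illustration under assumptions, not the package's internal code: it uses as_graph() for the conversion and assumes that a Graph's state is a list named by PipeOp id, so the PipeOp's state is wrapped accordingly.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Train once through a PipeOp and once through the equivalent Graph.
pop = po("pca")
gr = as_graph(po("pca"))
out_pop = preproc(task, pop)
out_gr = preproc(task, gr)

# Both routes should produce the same transformed data.
same_result = isTRUE(all.equal(out_pop$data(), out_gr$data()))

# The Graph keeps one state entry per PipeOp, named by the PipeOp's id.
state_names = names(gr$state)

# Wrap the PipeOp's state in a list named by its id to use it with a Graph.
wrapped_state = stats::setNames(list(pop$state), pop$id)
pred = preproc(task, as_graph(po("pca")), state = wrapped_state)
```
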
Examples
library("mlr3")
library("mlr3pipelines")
task = tsk("iris")
pop = po("pca")
# Training
preproc(task, pop)
# Note that the PipeOp gets trained through this
pop$is_trained
#> [1] TRUE
# Predicting a trained PipeOp (trained through previous call to preproc)
preproc(task, pop, predict = TRUE)
# Predicting using a given state
# We use the state of the PipeOp from the last example and then reset it
state = pop$state
pop$state = NULL
preproc(task, pop, state)
# Note that the PipeOp's state may get overwritten inadvertently during
# training or if a state is given
pop$state$sdev
preproc(tsk("wine"), pop)
pop$state$sdev
# Piping multiple preproc() calls, using dictionary sugar to set parameters
# tsk("penguins") |>
# preproc(po("imputemode", affect_columns = selector_name("sex"))) |>
# preproc(po("imputemean"))
# Use preproc with a Graph
gr = po("pca", rank. = 4) %>>% po("learner", learner = lrn("classif.rpart"))
preproc(tsk("sonar"), gr) # returns NULL because of the learner
preproc(tsk("sonar"), gr, predict = TRUE)
# Training with a data.table input
# Note that `$data()` drops the information that "Species" is the target.
# It gets handled like an ordinary feature here.
dt = tsk("iris")$data()
preproc(dt, pop)
# Predicting with a data.table input
preproc(dt, pop, predict = TRUE)