Creates a Graph that can be used to robustify any subsequent learner.
Performs the following steps:
- Drops empty factor levels using PipeOpFixFactors
- Imputes numeric features using PipeOpImputeHist and PipeOpMissInd
- Imputes factor features using PipeOpImputeOOR
- Encodes factors using one-hot-encoding. Factors with a cardinality > max_cardinality are collapsed using PipeOpCollapseFactors
The graph is built conservatively, i.e. the function always tries to ensure that everything works. If a learner is provided, some steps can be left out; for example, if the learner can deal with factor variables, no encoding is performed.
All input arguments are cloned and have no references in common with the returned Graph.
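To make the conservative behaviour concrete, the following minimal sketch builds the pipeline once without any arguments and once for a specific task and learner, then lists the PipeOps in each Graph. The choice of the `iris` task and the `classif.rpart` learner is an assumption made for illustration only; the exact ids printed depend on the installed package version.

```r
library(mlr3)
library(mlr3pipelines)

# No task or learner given: the "worst possible" case is assumed,
# so the full robustifying pipeline is created.
full = pipeline_robustify()

# Task and learner given (illustrative choice): steps that are not
# needed for this combination may be left out.
task = tsk("iris")
learner = lrn("classif.rpart")
tailored = pipeline_robustify(task, learner)

# Compare which PipeOps ended up in each Graph.
full$ids()
tailored$ids()
```

Comparing the two id vectors is a quick way to check which steps were considered unnecessary for a given task and learner.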
Usage
pipeline_robustify(
task = NULL,
learner = NULL,
impute_missings = NULL,
factors_to_numeric = NULL,
max_cardinality = 1000,
ordered_action = "factor",
character_action = "factor",
POSIXct_action = "numeric"
)
Arguments
- task
Task
A Task to create a robustifying pipeline for. Optional, if omitted, the "worst possible" Task is assumed and the full pipeline is created.
- learner
Learner
A Learner to create a robustifying pipeline for. Optional, if omitted, the "worst possible" Learner is assumed and a more conservative pipeline is built.
- impute_missings
logical(1) | NULL
Should missing values be imputed? Defaults to NULL: imputes if the task has missing values (or factors that are not encoded to numerics) and the learner cannot handle them.
- factors_to_numeric
logical(1) | NULL
Should (ordered and unordered) factors be encoded? Defaults to NULL: encodes if the task has factors (or character columns that get converted to factor) and the learner cannot handle factors.
- max_cardinality
integer(1)
Maximum number of factor levels allowed. Factors with a cardinality > max_cardinality are collapsed using PipeOpCollapseFactors. Default: 1000.
- ordered_action
character(1)
How to handle ordered columns: "factor" (default) or "factor!": convert to factor columns; "numeric" or "numeric!": convert to numeric columns; "integer" or "integer!": convert to integer columns; "ignore" or "ignore!": ignore. When task is given and has no ordered columns, or when learner is given and can handle ordered, then "factor", "numeric" and "integer" are treated like "ignore". This means it is necessary to add the exclamation point to override Task or Learner properties when given. "ignore" and "ignore!" therefore behave completely identically; "ignore!" is only present for consistency.
When ordered features are converted to factor, then they are treated like factor features further down in the pipeline, and are possibly eventually converted to numerics, but in a different way: factors get one-hot encoded, whereas ordered_action = "numeric" converts ordered columns using as.numeric to their integer-valued rank.
- character_action
character(1)
How to handle character columns: "factor" (default) or "factor!": convert to factor columns; "matrix" or "matrix!": use PipeOpTextVectorizer; "ignore" or "ignore!": ignore. When task is given and has no character columns, or when learner is given and can handle character, then "factor" and "matrix" are treated like "ignore". This means it is necessary to add the exclamation point to override Task or Learner properties when given. "ignore" and "ignore!" therefore behave completely identically; "ignore!" is only present for consistency.
When character columns are converted to factor, then they are treated like factor further down in the pipeline, and are possibly eventually converted to numerics, using one-hot encoding.
- POSIXct_action
character(1)
How to handle POSIXct columns: "numeric" (default) or "numeric!": convert to numeric columns; "datefeatures" or "datefeatures!": use PipeOpDateFeatures; "ignore" or "ignore!": ignore. When task is given and has no POSIXct columns, or when learner is given and can handle POSIXct, then "numeric" and "datefeatures" are treated like "ignore". This means it is necessary to add the exclamation point to override Task or Learner properties when given. "ignore" and "ignore!" therefore behave completely identically; "ignore!" is only present for consistency.
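The override rules described for the arguments can be sketched as follows. The toy data frame, its column names and the choice of `classif.rpart` are hypothetical and used only for illustration. Because this learner accepts ordered features and missing values, the steps are requested explicitly with `ordered_action = "integer!"` and `impute_missings = TRUE` rather than left to the automatic decision.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical toy data: one ordered feature and one numeric feature with an NA.
dat = data.frame(
  y = factor(rep(c("a", "b"), 5)),
  size = factor(rep(c("s", "m", "l", "m", "s"), 2),
    levels = c("s", "m", "l"), ordered = TRUE),
  x = c(1, NA, 3, 4, 5, 6, 7, 8, 9, 10)
)
task = as_task_classif(dat, target = "y")
learner = lrn("classif.rpart")

# The exclamation point and the explicit TRUE request these steps
# regardless of what the learner's properties would suggest.
gr = pipeline_robustify(task, learner,
  impute_missings = TRUE,
  ordered_action = "integer!"
)

# Train the Graph on the task and inspect the preprocessed result.
out = gr$train(task)[[1]]
out$feature_types  # column types after preprocessing
out$missings()     # remaining missing-value counts per column
```

Dropping the exclamation point (`ordered_action = "integer"`) should, per the description above, be treated like `"ignore"` for a learner that can handle ordered columns.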
Examples
# \donttest{
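library(mlr3)
library(mlr3pipelines)
# Use a distinct variable name so that the lrn() function is not shadowed
learner = lrn("regr.rpart")
task = mlr_tasks$get("boston_housing")
gr = pipeline_robustify(task, learner) %>>% po("learner", learner)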
resample(task, GraphLearner$new(gr), rsmp("holdout"))
#> <ResampleResult> with 1 resampling iterations
#>         task_id
#>  boston_housing
#>                                                                        learner_id
#>  removeconstants_prerobustify.fixfactors.removeconstants_postrobustify.regr.rpart
#>  resampling_id iteration  prediction_test warnings errors
#>        holdout         1 <PredictionRegr>        0      0
# }