Creates a Graph that can be used to robustify any subsequent learner.
Performs the following steps:
- Drops empty factor levels using PipeOpFixFactors.
- Imputes numeric features using PipeOpImputeHist and PipeOpMissInd.
- Imputes factor features using PipeOpImputeOOR.
- Encodes factors using one-hot encoding. Factors with a cardinality greater than max_cardinality are collapsed using PipeOpCollapseFactors.
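The steps listed above can be approximated by hand with mlr3pipelines. The following is a minimal sketch, assuming mlr3 and mlr3pipelines are installed; it is not the exact graph that pipeline_robustify() returns, which adapts to the given task and learner and (judging by the example output) also removes constant features. Mapping max_cardinality to the target_level_count parameter of PipeOpCollapseFactors is an assumption made for illustration.

```r
library(mlr3)
library(mlr3pipelines)

max_cardinality = 1000  # mirrors the function argument (assumed mapping)

# Numeric features: histogram-based imputation and, in a parallel branch,
# missingness indicator columns; featureunion merges both branches again.
impute_numeric = po("copy", outnum = 2) %>>%
  gunion(list(po("imputehist"), po("missind"))) %>>%
  po("featureunion")

gr = po("fixfactors") %>>%        # drop empty factor levels
  impute_numeric %>>%             # impute numerics, add indicator columns
  po("imputeoor") %>>%            # impute factors with an out-of-range level
  po("collapsefactors", target_level_count = max_cardinality) %>>%
  po("encode", method = "one-hot")  # one-hot encode remaining factors

gr$ids()
```

Building the graph by hand makes each step explicit, but unlike pipeline_robustify() it does not skip steps the learner would not need.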
The graph is built conservatively: the function always tries to ensure that everything works. If a learner is provided, some steps can be left out; for example, if the learner can handle factor variables, no encoding is performed.
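Whether a step can be left out depends on what the learner declares about itself. As a rough illustration (a simplification that ignores the task-side conditions, not the function's actual code), the relevant information can be read from an mlr3 Learner's feature_types and properties fields. The variable names needs_encoding and needs_imputation are hypothetical. For regr.rpart both come out FALSE, which is consistent with the example output below, where no imputation or encoding PipeOps appear in the learner ID.

```r
library(mlr3)

learner = lrn("regr.rpart")

# Feature types the learner can consume directly
learner$feature_types

# Simplified, hypothetical checks mirroring the documented defaults:
# encode only if the learner cannot handle factors,
# impute only if the learner cannot handle missing values.
needs_encoding = !("factor" %in% learner$feature_types)
needs_imputation = !("missings" %in% learner$properties)
c(needs_encoding = needs_encoding, needs_imputation = needs_imputation)
```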
All input arguments are cloned and have no references in common with the returned Graph.
Usage
pipeline_robustify(
task = NULL,
learner = NULL,
impute_missings = NULL,
factors_to_numeric = NULL,
max_cardinality = 1000,
ordered_action = "factor",
character_action = "factor",
POSIXct_action = "numeric"
)
Arguments
- task
Task
A Task to create a robustifying pipeline for. Optional; if omitted, the "worst possible" Task is assumed and the full pipeline is created.
- learner
Learner
A Learner to create a robustifying pipeline for. Optional; if omitted, the "worst possible" Learner is assumed and a more conservative pipeline is built.
- impute_missings
logical(1)|NULL
Should missing values be imputed? Defaults to NULL: imputes if the task has missing values (or factors that are not encoded to numerics) and the learner cannot handle them.
- factors_to_numeric
logical(1)|NULL
Should (ordered and unordered) factors be encoded? Defaults to NULL: encodes if the task has factors (or character columns that get converted to factor) and the learner cannot handle factors.
- max_cardinality
integer(1)
Maximum number of factor levels allowed; factors with more levels are collapsed using PipeOpCollapseFactors (see above). Default: 1000.
- ordered_action
character(1)
How to handle ordered columns: "factor" (default) or "factor!": convert to factor columns; "numeric" or "numeric!": convert to numeric columns; "integer" or "integer!": convert to integer columns; "ignore" or "ignore!": ignore. When task is given and has no ordered columns, or when learner is given and can handle ordered, then "factor", "numeric" and "integer" are treated like "ignore". This means it is necessary to add the exclamation point to override Task or Learner properties when given. "ignore" and "ignore!" therefore behave identically; "ignore!" is only present for consistency.
When ordered features are converted to factor, they are treated like factor features further down in the pipeline and may eventually be converted to numerics, but in a different way: factors get one-hot encoded, whereas ordered_action = "numeric" converts ordered columns with as.numeric to their integer-valued rank.
- character_action
character(1)
How to handle character columns: "factor" (default) or "factor!": convert to factor columns; "matrix" or "matrix!": use PipeOpTextVectorizer; "ignore" or "ignore!": ignore. When task is given and has no character columns, or when learner is given and can handle character, then "factor" and "matrix" are treated like "ignore". This means it is necessary to add the exclamation point to override Task or Learner properties when given. "ignore" and "ignore!" therefore behave identically; "ignore!" is only present for consistency.
When character columns are converted to factor, they are treated like factor features further down in the pipeline and may eventually be converted to numerics using one-hot encoding.
- POSIXct_action
character(1)
How to handle POSIXct columns: "numeric" (default) or "numeric!": convert to numeric columns; "datefeatures" or "datefeatures!": use PipeOpDateFeatures; "ignore" or "ignore!": ignore. When task is given and has no POSIXct columns, or when learner is given and can handle POSIXct, then "numeric" and "datefeatures" are treated like "ignore". This means it is necessary to add the exclamation point to override Task or Learner properties when given. "ignore" and "ignore!" therefore behave identically; "ignore!" is only present for consistency.
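The interplay between the plain and the "!" variants of ordered_action, character_action and POSIXct_action can be summarized in a small helper. effective_action() below is hypothetical, written only to illustrate the documented rule; it is not part of mlr3pipelines. The second part contrasts the two routes by which an ordered column may end up numeric, using plain base R as a stand-in for what the PipeOps do.

```r
# Hypothetical helper (not part of mlr3pipelines) illustrating the documented
# rule for ordered_action, character_action and POSIXct_action.
#   task_has_type:        TRUE/FALSE, or NA if no task was given
#   learner_handles_type: TRUE/FALSE, or NA if no learner was given
effective_action = function(action, task_has_type = NA, learner_handles_type = NA) {
  if (endsWith(action, "!")) {
    # A trailing "!" overrides task and learner properties
    return(sub("!$", "", action))
  }
  # Without "!", the action is skipped if the task has no such columns
  # or the learner can handle the type natively
  if (isFALSE(task_has_type) || isTRUE(learner_handles_type)) "ignore" else action
}

effective_action("factor", learner_handles_type = TRUE)   # "ignore"
effective_action("factor!", learner_handles_type = TRUE)  # "factor"
effective_action("numeric", task_has_type = FALSE)        # "ignore"
effective_action("numeric", task_has_type = TRUE)         # "numeric"

# Two routes from an ordered column to numbers (base R stand-ins):
x = factor(c("low", "high", "mid"), levels = c("low", "mid", "high"), ordered = TRUE)

# ordered_action = "numeric": integer-valued rank via as.numeric()
as.numeric(x)  # 1 3 2

# ordered_action = "factor", then one-hot encoding: one 0/1 column per level
f = factor(x, ordered = FALSE)
onehot = sapply(levels(f), function(l) as.integer(f == l))
onehot
```

The rank route keeps a single column and preserves the level order; the one-hot route discards the order but makes no assumption that adjacent levels are equally spaced.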
Examples
# \donttest{
library(mlr3)
library(mlr3pipelines)  # provides pipeline_robustify(), po() and %>>%
learner = lrn("regr.rpart")
task = tsk("boston_housing")
gr = pipeline_robustify(task, learner) %>>% po("learner", learner)
resample(task, GraphLearner$new(gr), rsmp("holdout"))
#>
#> ── <ResampleResult> with 1 resampling iterations ───────────────────────────────
#> task_id
#> boston_housing
#> learner_id
#> removeconstants_prerobustify.fixfactors.removeconstants_postrobustify.regr.rpart
#> resampling_id iteration prediction_test warnings errors
#> holdout 1 <PredictionRegr> 0 0
# }
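Two further variations, sketched under the assumption that mlr3 and mlr3pipelines are installed (no output shown): calling pipeline_robustify() without a task or learner builds the full, most conservative pipeline, and passing TRUE for impute_missings overrides the NULL default even though regr.rpart and classif.rpart can handle missing values. The pima task is used here only because it contains missing values.

```r
library(mlr3)
library(mlr3pipelines)

# No task and no learner: the "worst case" is assumed and the full pipeline is built
gr_full = pipeline_robustify()
gr_full$ids()

# Learner given, but imputation forced on explicitly
learner = lrn("classif.rpart")
gr_forced = pipeline_robustify(learner = learner, impute_missings = TRUE) %>>%
  po("learner", learner)

rr = resample(tsk("pima"), GraphLearner$new(gr_forced), rsmp("holdout"))
rr
```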
