A Selector function is used by different PipeOps, most prominently PipeOpSelect and many PipeOps inheriting
from PipeOpTaskPreproc, to determine the subset of a Task's features to operate on.
Although a Selector is an ordinary function and can be written by hand, it is preferable to use the Selector constructors
shown here. Each of these can be called with its arguments to create a Selector, which can then be passed to the selector
parameter of PipeOpSelect, or to the affect_columns parameter of many PipeOpTaskPreprocs. See the documentation of those PipeOps for examples of this usage.
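As a minimal sketch of the two uses described above, the following passes a Selector to the selector parameter of PipeOpSelect and to the affect_columns parameter of PipeOpScale (one of the PipeOps inheriting from PipeOpTaskPreproc). The choice of the iris task and of the "^Petal" pattern is only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Keep only the features whose names start with "Petal".
po_select = po("select", selector = selector_grep("^Petal"))
selected_task = po_select$train(list(task))[[1]]
selected_task$feature_names

# Scale only the "Petal" features; the "Sepal" features pass through unchanged.
po_scale = po("scale", affect_columns = selector_grep("^Petal"))
scaled_task = po_scale$train(list(task))[[1]]
colMeans(scaled_task$data(cols = c("Petal.Length", "Sepal.Length")))
```

In the second case the column mean of Petal.Length should be numerically zero after training, while Sepal.Length keeps its original mean.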
Usage
selector_all()
selector_none()
selector_type(types)
selector_grep(pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE)
selector_name(feature_names, assert_present = FALSE)
selector_invert(selector)
selector_intersect(selector_x, selector_y)
selector_union(selector_x, selector_y)
selector_setdiff(selector_x, selector_y)
selector_missing()
selector_cardinality_greater_than(min_cardinality)
Arguments
- types
  (character)
  Type of feature to select.
- pattern
  (character(1))
  grep pattern.
- ignore.case
  (logical(1))
  Ignore case.
- perl
  (logical(1))
  Perl regex.
- fixed
  (logical(1))
  Fixed pattern instead of regex.
- feature_names
  (character)
  Select features by exact name match.
- assert_present
  (logical(1))
  Throw an error if feature_names are not all present in the task being operated on.
- selector
  (Selector)
  Selector to invert.
- selector_x
  (Selector)
  First Selector to query.
- selector_y
  (Selector)
  Second Selector to query.
- min_cardinality
  (integer)
  Minimum number of levels required to be selected.
Value
function: A Selector function that takes a Task and returns the feature names to be processed.
Functions
- selector_all(): selector_all selects all features.
- selector_none(): selector_none selects none of the features.
- selector_type(): selector_type selects features according to type. Legal types are listed in mlr_reflections$task_feature_types.
- selector_grep(): selector_grep selects features with names matching the grep() pattern.
- selector_name(): selector_name selects features with names matching exactly the names listed.
- selector_invert(): selector_invert inverts a given Selector: it always selects the features that would be dropped by the other Selector, and drops the features that would be kept.
- selector_intersect(): selector_intersect selects the intersection of two Selectors: only features selected by both Selectors are selected in the end.
- selector_union(): selector_union selects the union of two Selectors: features selected by either Selector are selected in the end.
- selector_setdiff(): selector_setdiff selects the setdiff of two Selectors: features selected by selector_x are selected, unless they are also selected by selector_y.
- selector_missing(): selector_missing selects features with missing values.
- selector_cardinality_greater_than(): selector_cardinality_greater_than selects categorical features with cardinality greater than a given threshold.
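The set-style constructors and selector_name() can be illustrated with a short sketch on the iris task. The two grep patterns are arbitrary choices for this example, and the error behaviour shown for assert_present = TRUE follows the argument description above rather than a guaranteed error message.

```r
library("mlr3")
library("mlr3pipelines")

iris_task = tsk("iris")

sel_petal = selector_grep("^Petal")  # Petal.Length, Petal.Width
sel_width = selector_grep("Width$")  # Petal.Width, Sepal.Width

# Features selected by both Selectors.
in_both = selector_intersect(sel_petal, sel_width)(iris_task)
in_both
#> [1] "Petal.Width"

# Features selected by sel_petal but not by sel_width.
petal_only = selector_setdiff(sel_petal, sel_width)(iris_task)
petal_only
#> [1] "Petal.Length"

# Exact name match; names absent from the task are silently ignored by default.
by_name = selector_name(c("Sepal.Length", "no_such_feature"))(iris_task)
by_name
#> [1] "Sepal.Length"

# With assert_present = TRUE, an absent name is expected to raise an error.
strict = selector_name("no_such_feature", assert_present = TRUE)
result = try(strict(iris_task), silent = TRUE)
inherits(result, "try-error")
```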
Details
A Selector is a function
that has one input argument (commonly named task). The function is called with the Task that a PipeOp
is operating on. The return value of the function must be a character vector that is a subset of the feature names present
in the Task.
For example, a Selector that selects all columns is
function(task) {
  task$feature_names
}
(This is equivalent to the Selector created by selector_all().) A Selector that selects
all columns that have names shorter than four letters would be:
function(task) {
  task$feature_names[
    nchar(task$feature_names) < 4
  ]
}
A Selector that selects only the column "Sepal.Length" (as in the iris task), if present, is
function(task) {
  intersect(task$feature_names, "Sepal.Length")
}
It is preferable to use the Selector construction functions like selector_type, selector_grep etc. if possible, instead of writing custom Selectors.
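To make the contract above concrete, here is a small sketch that defines the "short names" Selector by hand, calls it on a Task, and passes it to PipeOpSelect. The name short_names and the use of the mtcars task are assumptions made for this illustration only.

```r
library("mlr3")
library("mlr3pipelines")

# Hand-written Selector: feature names shorter than four characters.
short_names = function(task) {
  task$feature_names[nchar(task$feature_names) < 4]
}

task = tsk("mtcars")

# Calling the Selector directly returns a subset of task$feature_names.
short_names(task)

# The same function can be supplied wherever a Selector is expected.
po_short = po("select", selector = short_names)
short_task = po_short$train(list(task))[[1]]
short_task$feature_names
```

Because the function only ever subsets task$feature_names, its return value is always a valid subset of the feature names, which is the one requirement a Selector has to meet.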
See also
Other Selectors:
mlr_pipeops_select
Examples
library("mlr3")
library("mlr3pipelines")
iris_task = tsk("iris")
bh_task = tsk("boston_housing")
sela = selector_all()
sela(iris_task)
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
sela(bh_task)
#> [1] "age" "b" "chas" "crim" "dis" "indus" "lat"
#> [8] "lon" "lstat" "nox" "ptratio" "rad" "rm" "tax"
#> [15] "town" "tract" "zn"
self = selector_type("factor")
self(iris_task)
#> character(0)
self(bh_task)
#> [1] "chas" "town"
selg = selector_grep("a.*i")
selg(iris_task)
#> [1] "Petal.Width" "Sepal.Width"
selg(bh_task)
#> [1] "ptratio"
selgi = selector_invert(selg)
selgi(iris_task)
#> [1] "Petal.Length" "Sepal.Length"
selgi(bh_task)
#> [1] "age" "b" "chas" "crim" "dis" "indus" "lat" "lon" "lstat"
#> [10] "nox" "rad" "rm" "tax" "town" "tract" "zn"
selgf = selector_union(selg, self)
selgf(iris_task)
#> [1] "Petal.Width" "Sepal.Width"
selgf(bh_task)
#> [1] "ptratio" "chas" "town"
