Even though a Selector is an ordinary function that can be written by hand, it is preferable to use the Selector constructors shown here. Each of these can be called with its arguments to create a Selector, which can then be given to the selector parameter of PipeOpSelect, or to the affect_columns parameter of many PipeOpTaskPreproc-based PipeOps. See there for examples of this usage.
selector_all()
selector_none()
selector_type(types)
selector_grep(pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE)
selector_name(feature_names, assert_present = FALSE)
selector_invert(selector)
selector_intersect(selector_x, selector_y)
selector_union(selector_x, selector_y)
selector_setdiff(selector_x, selector_y)
selector_missing()
selector_cardinality_greater_than(min_cardinality)
types: Type of feature to select.
fixed: Treat pattern as a fixed string instead of a regular expression.
feature_names: Select features by exact name match.
assert_present: Throw an error if feature_names are not all present in the task being operated on.
min_cardinality: Minimum number of levels required to be selected.
A Selector function that takes a Task and returns the feature names to be processed.
selector_all selects all features.
selector_none selects none of the features.
selector_type selects features according to type. Legal types are listed in mlr_reflections$task_feature_types.
selector_grep selects features with names matching the grep() pattern.
selector_name selects features with names matching exactly the names listed.
selector_invert inverts a given Selector: It always selects the features that would be dropped by the other Selector, and drops the features that would be kept.
selector_intersect selects the intersection of two Selectors: Only features selected by both Selectors are selected in the end.
selector_union selects the union of two Selectors: Features selected by either Selector are selected in the end.
selector_setdiff selects the setdiff of two Selectors: Features selected by selector_x are selected, unless they are also selected by selector_y.
selector_missing selects features with missing values.
selector_cardinality_greater_than selects categorical features with cardinality greater than a given threshold.
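The set-style combinators above can be pictured as closures that call the wrapped Selectors and combine their results with base R set functions. The following is a minimal sketch under that assumption, not the package's actual implementation. The helper names (my_grep, my_invert, my_intersect, my_union, my_setdiff) and the stand-in task, a plain list with a feature_names element, are hypothetical and used only for illustration.

```r
# Stand-in for a Task: a plain list exposing only `feature_names` (assumption;
# a real mlr3 Task is an R6 object with many more fields).
toy_task = list(
  feature_names = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")
)

# Hypothetical helper: a Selector that keeps names matching a regular expression.
my_grep = function(pattern) {
  function(task) grep(pattern, task$feature_names, value = TRUE)
}

# Hypothetical combinators; each takes Selectors and returns a new Selector.
my_invert = function(selector) {
  function(task) setdiff(task$feature_names, selector(task))
}
my_intersect = function(selector_x, selector_y) {
  function(task) intersect(selector_x(task), selector_y(task))
}
my_union = function(selector_x, selector_y) {
  function(task) union(selector_x(task), selector_y(task))
}
my_setdiff = function(selector_x, selector_y) {
  function(task) setdiff(selector_x(task), selector_y(task))
}

sepal = my_grep("^Sepal")   # names starting with "Sepal"
width = my_grep("Width$")   # names ending in "Width"

inverted = my_invert(sepal)(toy_task)            # everything sepal would drop
both = my_intersect(sepal, width)(toy_task)      # matched by both Selectors
either = my_union(sepal, width)(toy_task)        # matched by at least one
only_sepal = my_setdiff(sepal, width)(toy_task)  # matched by sepal, not by width
```

Because every combinator returns another function of task, the results can be nested freely, which is the same property that allows the selector_* constructors to be combined with each other.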
A Selector is a function that has one input argument (commonly named task). The function is called with the Task that a PipeOp is operating on. The return value of the function must be a character vector that is a subset of the feature names present in the Task.
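For example, a Selector that selects all columns is
function(task) task$feature_names
(this is the Selector returned by selector_all()). A Selector that selects all columns that have names shorter than four letters would be:
function(task) task$feature_names[nchar(task$feature_names) < 4]
A Selector that selects only the column "Sepal.Length" (as in the iris task), if present, is
function(task) intersect(task$feature_names, "Sepal.Length")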
It is preferable to use the Selector construction functions such as selector_grep if possible, instead of writing custom Selectors.
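If a custom Selector is needed nonetheless, it can be tried out without a full Task. The sketch below uses a hypothetical stand-in, a plain list with a feature_names element, in place of a real mlr3 Task, and adds a hypothetical helper check_selector() that tests the contract stated above (the result is a character vector and a subset of the feature names). None of these names come from the package.

```r
# Stand-in for a Task (assumption): only the `feature_names` field is modelled.
toy_task = list(feature_names = c("age", "b", "Sepal.Length", "town", "zn"))

# Custom Selectors: plain functions of one argument, `task`.
select_all = function(task) task$feature_names
select_short = function(task) task$feature_names[nchar(task$feature_names) < 4]
select_sepal_length = function(task) intersect(task$feature_names, "Sepal.Length")

# Hypothetical helper: TRUE if the Selector's result is a character vector
# that contains only feature names present in the task.
check_selector = function(selector, task) {
  result = selector(task)
  is.character(result) && all(result %in% task$feature_names)
}

short_names = select_short(toy_task)          # names with fewer than four characters
sepal_only = select_sepal_length(toy_task)    # "Sepal.Length", since it is present

# A task without "Sepal.Length" yields an empty selection rather than an error.
other_task = list(feature_names = c("age", "town"))
sepal_missing = select_sepal_length(other_task)

# A function returning a name the task does not have violates the contract.
bad_selector = function(task) "not_a_feature"
contract_ok = check_selector(select_short, toy_task)
contract_bad = check_selector(bad_selector, toy_task)
```

Using intersect() in select_sepal_length is what keeps the result a subset of the task's feature names even when the requested column is absent.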
library("mlr3")
library("mlr3pipelines")  # provides the selector_* functions

iris_task = tsk("iris")
bh_task = tsk("boston_housing")

# select all features
sela = selector_all()
sela(iris_task)
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
sela(bh_task)
#> [1] "age" "b" "chas" "cmedv" "crim" "dis" "indus"
#> [8] "lat" "lon" "lstat" "nox" "ptratio" "rad" "rm"
#> [15] "tax" "town" "tract" "zn"

# select features by type
self = selector_type("factor")
self(iris_task)
#> character(0)
self(bh_task)
#> [1] "chas" "town"

# select features by regular expression on the name
selg = selector_grep("a.*i")
selg(iris_task)
#> [1] "Petal.Width" "Sepal.Width"
selg(bh_task)
#> [1] "ptratio"

# invert a Selector
selgi = selector_invert(selg)
selgi(iris_task)
#> [1] "Petal.Length" "Sepal.Length"
selgi(bh_task)
#> [1] "age" "b" "chas" "cmedv" "crim" "dis" "indus" "lat" "lon"
#> [10] "lstat" "nox" "rad" "rm" "tax" "town" "tract" "zn"

# union of two Selectors
selgf = selector_union(selg, self)
selgf(iris_task)
#> [1] "Petal.Width" "Sepal.Width"
selgf(bh_task)
#> [1] "ptratio" "chas" "town"