Skip to contents

Converts all columns of type type_from to type_to, using the corresponding R function (e.g. as.numeric(), as.factor()). It is possible to further subset the columns that should be affected using the affect_columns argument. The resulting Graph contains a PipeOpColApply, followed, if appropriate, by a PipeOpFixFactors.

Unlike R's as.factor() function, ppl("convert_types") will convert ordered types into (unordered) factor vectors.

Usage

pipeline_convert_types(
  type_from,
  type_to,
  affect_columns = NULL,
  id = NULL,
  fixfactors = NULL,
  more_args = list()
)

Arguments

type_from

character
Which column types to convert. May be any combination of "logical", "integer", "numeric", "factor", "ordered", "character", or "POSIXct".

type_to

character(1)
Which type to convert to. Must be a scalar value, exactly one of the types allowed in type_from.

affect_columns

function | Selector | NULL
Which columns to affect. This argument can further restrict the columns being converted, beyond the type_from argument. Must be a Selector-like function, which takes a Task as argument and returns a character of features to use.

id

character(1) | NULL
ID to give to the constructed PipeOps. Defaults to an ID built automatically from type_from and type_to. If a PipeOpFixFactors is appended, its ID will be paste0(id, "_ff").

fixfactors

logical(1) | NULL
Whether to append a PipeOpFixFactors. Defaults to TRUE if and only if type_to is "factor" or "ordered".

more_args

list
Additional arguments to give to the conversion function. This could e.g. be used to pass the timezone to as.POSIXct.

Value

Graph

Examples

library("mlr3")

data_chr = data.table::data.table(
  x = factor(letters[1:3]),
  y = letters[1:3],
  z = letters[1:3]
)
task_chr = TaskClassif$new("task_chr", data_chr, "x")
str(task_chr$data())
#> Classes ‘data.table’ and 'data.frame':	3 obs. of  3 variables:
#>  $ x: Factor w/ 3 levels "a","b","c": 1 2 3
#>  $ y: chr  "a" "b" "c"
#>  $ z: chr  "a" "b" "c"
#>  - attr(*, ".internal.selfref")=<externalptr> 

graph = ppl("convert_types", "character", "factor")
str(graph$train(task_chr)[[1]]$data())
#> Classes ‘data.table’ and 'data.frame':	3 obs. of  3 variables:
#>  $ x: Factor w/ 3 levels "a","b","c": 1 2 3
#>  $ y: Factor w/ 3 levels "a","b","c": 1 2 3
#>  $ z: Factor w/ 3 levels "a","b","c": 1 2 3
#>  - attr(*, ".internal.selfref")=<externalptr> 

graph_z = ppl("convert_types", "character", "factor",
  affect_columns = selector_name("z"))
graph_z$train(task_chr)[[1]]$data()
#>         x      z      y
#>    <fctr> <fctr> <char>
#> 1:      a      a      a
#> 2:      b      b      b
#> 3:      c      c      c

# `affect_columns` and `type_from` are both applied. The following
# looks for a 'numeric' column with name 'z', which is not present;
# the task is therefore unchanged.
graph_z = ppl("convert_types", "numeric", "factor",
  affect_columns = selector_name("z"))
graph_z$train(task_chr)[[1]]$data()
#>         x      y      z
#>    <fctr> <char> <char>
#> 1:      a      a      a
#> 2:      b      b      b
#> 3:      c      c      c