A Graph is a representation of a machine learning pipeline graph. It can be trained, and subsequently used for prediction.

A Graph is most useful when used together with Learner objects encapsulated as PipeOpLearner. In this case, the Graph produces Prediction data during its $predict() phase and can be used as a Learner itself (using the GraphLearner wrapper). However, the Graph can also be used without Learner objects to simply perform preprocessing of data, and, in principle, does not even need to handle data at all but can be used for general processes with dependency structure (although the PipeOps for this would need to be written).
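As a minimal sketch of the Learner use case described above, the following wraps a two-step Graph in a GraphLearner. The choice of `classif.rpart`, the `"iris"` task, and the ids `"scale"` and `"rpart"` are illustrative assumptions, not something this page prescribes.

```r
library("mlr3")
library("mlr3pipelines")

# Scale the features, then fit a decision tree wrapped as a PipeOpLearner.
g = Graph$new()$
  add_pipeop(PipeOpScale$new(id = "scale"))$
  add_pipeop(PipeOpLearner$new(lrn("classif.rpart"), id = "rpart"))$
  add_edge("scale", "rpart")

# The GraphLearner wrapper lets the whole Graph be used like a Learner.
glrn = GraphLearner$new(g)
task = tsk("iris")
glrn$train(task)
prediction = glrn$predict(task)
```

Here `prediction` should be a regular Prediction object, so it can be scored or resampled like the output of any other Learner.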

Format

R6Class Graph

Construction

Graph$new()

Internals

A Graph is made up of a list of PipeOps, and a data.table of edges. Both for training and prediction, the Graph performs topological sorting of the PipeOps and executes their respective $train() or $predict() functions in order, moving the PipeOp results along the edges as input to other PipeOps.
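The ordering step can be illustrated with a small base R sketch. The helper `topo_sort()` below is hypothetical and is not the function used inside mlr3pipelines; it only shows how an execution order can be derived from PipeOp ids and an edge table with `src_id` and `dst_id` columns (Kahn's algorithm).

```r
# Hypothetical helper, not part of mlr3pipelines: order ids so that every
# source of an edge comes before its destination.
topo_sort = function(ids, edges) {
  # Count incoming edges for every node.
  indegree = setNames(integer(length(ids)), ids)
  for (dst in edges$dst_id) indegree[[dst]] = indegree[[dst]] + 1L

  ordered = character(0)
  ready = ids[indegree == 0L]  # nodes without unprocessed predecessors
  while (length(ready) > 0L) {
    current = ready[[1]]
    ready = ready[-1]
    ordered = c(ordered, current)
    # "Remove" the outgoing edges of the node that was just processed.
    for (dst in edges$dst_id[edges$src_id == current]) {
      indegree[[dst]] = indegree[[dst]] - 1L
      if (indegree[[dst]] == 0L) ready = c(ready, dst)
    }
  }
  if (length(ordered) != length(ids)) stop("edges contain a cycle")
  ordered
}

# Ids are given in insertion order; the edges determine the execution order.
ids = c("pca", "learner", "scale")
edges = data.frame(
  src_id = c("scale", "pca"),
  dst_id = c("pca", "learner"),
  stringsAsFactors = FALSE
)
sorted_ids = topo_sort(ids, edges)
```

In this toy setup `sorted_ids` comes out as scale, pca, learner, which is the order in which the corresponding `$train()` or `$predict()` calls would have to run.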

Fields

  • pipeops :: named list of PipeOp
    Contains all PipeOps in the Graph, named by the PipeOp's $ids.

  • edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
    Table of connections between the PipeOps. src_id and dst_id are $ids of PipeOps that must be present in the $pipeops list. src_channel and dst_channel must respectively be $output and $input channel names of the respective PipeOps.

  • is_trained :: logical(1)
    Whether the Graph is trained, i.e. whether all of its PipeOps are trained, so that the Graph can be used for prediction.

  • lhs :: character
    Ids of the 'left-hand-side' PipeOps that have some unconnected input channels and therefore act as Graph input layer.

  • rhs :: character
    Ids of the 'right-hand-side' PipeOps that have some unconnected output channels and therefore act as Graph output layer.

  • input :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
    Input channels of the Graph. For each channel lists the name, input type during training, input type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.

  • output :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
    Output channels of the Graph. For each channel lists the name, output type during training, output type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.

  • packages :: character
    Set of all required packages for the various methods in the Graph, a set union of all required packages of all contained PipeOp objects.

  • state :: named list
    Get / Set the $state of each of the PipeOps in the Graph.

  • param_set :: ParamSet
    Parameters and parameter constraints. Parameter values are in $param_set$values. These are the union of $param_sets of all PipeOps in the Graph. Parameter names as seen by the Graph have the naming scheme <PipeOp$id>.<PipeOp original parameter name>. Changing $param_set$values also propagates the changes directly to the contained PipeOps and is an alternative to changing a PipeOp's $param_set$values directly.

  • hash :: character(1)
    Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes (and therefore their $param_set$values) and a hash of $edges.

  • keep_results :: logical(1)
    Whether to store intermediate results in the PipeOp's $.result slot, mostly for debugging purposes. Default FALSE.
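A short sketch of how some of these fields can be inspected and set. The Graph and the `"iris"` task are assumptions chosen for illustration; `center` is a parameter of PipeOpScale and therefore appears as `scale.center` at the Graph level.

```r
library("mlr3")
library("mlr3pipelines")

g = Graph$new()$
  add_pipeop(PipeOpScale$new(id = "scale"))$
  add_pipeop(PipeOpPCA$new(id = "pca"))$
  add_edge("scale", "pca")

g$lhs         # ids of PipeOps with unconnected input channels
g$rhs         # ids of PipeOps with unconnected output channels
g$is_trained  # FALSE before $train() has been called

# Graph-level parameter names are prefixed with the PipeOp's id;
# the change is propagated to the contained PipeOp.
g$param_set$values$scale.center = FALSE
center_in_pipeop = g$pipeops$scale$param_set$values$center

# Keep intermediate results for debugging, then inspect them after training.
g$keep_results = TRUE
invisible(g$train(tsk("iris")))
scale_result = g$pipeops$scale$.result
```

Setting the value through the PipeOp instead (`g$pipeops$scale$param_set$values$center = FALSE`) should have the same effect.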

Methods

  • ids(sorted = FALSE)
    (logical(1)) -> character
    Get IDs of all PipeOps. The IDs are in the order in which the PipeOps were added if sorted is FALSE, and in topological order if sorted is TRUE.

  • add_pipeop(op)
    (PipeOp | Learner | Filter | ...) -> self
    Mutates Graph by adding a PipeOp to the Graph. This does not add any edges, so the new PipeOp will not be connected within the Graph at first.
    Instead of supplying a PipeOp directly, an object that can naturally be converted to a PipeOp can also be supplied, e.g. a Learner or a Filter; see as_pipeop().

  • add_edge(src_id, dst_id, src_channel = NULL, dst_channel = NULL)
    (character(1), character(1), character(1) | numeric(1) | NULL, character(1) | numeric(1) | NULL) -> self
    Add an edge from PipeOp src_id, and its channel src_channel (identified by its name or number as listed in the PipeOp's $output), to PipeOp dst_id's channel dst_channel (identified by its name or number as listed in the PipeOp's $input). If source or destination PipeOp have only one input / output channel and src_channel / dst_channel are therefore unambiguous, they can be omitted (i.e. left as NULL).

  • plot(html = FALSE)
    (logical(1)) -> NULL
    Plot the Graph, using either the igraph package (for html = FALSE, the default) or the visNetwork package (for html = TRUE), which produces an htmlWidget. The htmlWidget can be rescaled using visOptions.

  • print()
    () -> NULL
    Print a representation of the Graph on the console. Output is a table with one row for each contained PipeOp and columns ID ($id of PipeOp), State (short representation of $state of PipeOp), sccssors (PipeOps that take their input directly from the PipeOp on this line), and prdcssors (the PipeOps that produce the data that is read as input by the PipeOp on this line).

  • set_names(old, new)
    (character, character) -> self
    Rename PipeOps: Change ID of each PipeOp as identified by old to the corresponding item in new. This should be used instead of changing a PipeOp's $id value directly!

  • train(input, single_input = TRUE)
    (any, logical(1)) -> named list
    Train the Graph by traversing the Graph's edges and calling all the PipeOps' $train methods in turn. Return a named list of outputs for each unconnected PipeOp out-channel, named according to the Graph's $output name column. During training, the $state member of each PipeOp will be set and the $is_trained slot of the Graph (and each individual PipeOp) will consequently be set to TRUE.
    If single_input is TRUE, the input value will be sent to each unconnected PipeOp's input channel (as listed in the Graph's $input). Typically, input should be a Task, although this is dependent on the PipeOps in the Graph. If single_input is FALSE, then input should be a list with the same length as the Graph's $input table has rows; each list item will be sent to a corresponding input channel of the Graph. If input is a named list, names must correspond to input channel names ($input$name) and inputs will be sent to the channels by name; otherwise they will be sent to the channels in order in which they are listed in $input.

  • predict(input, single_input = TRUE)
    (any, logical(1)) -> list of any
    Predict with the Graph by calling all the PipeOps' $predict methods in turn. Input and output, as well as the function of the single_input argument, are analogous to $train().
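The following sketch exercises several of the methods above on a small Graph. The two-PipeOp layout, the `"iris"` task, and the new id `"standardize"` are assumptions made for illustration only.

```r
library("mlr3")
library("mlr3pipelines")

# Two PipeOps without any edge: both have an unconnected input channel.
g = Graph$new()$
  add_pipeop(PipeOpScale$new(id = "scale"))$
  add_pipeop(PipeOpPCA$new(id = "pca"))

task = tsk("iris")

# single_input = TRUE (default): the same task goes to every Graph input.
out_single = g$train(task)

# single_input = FALSE: one list element per row of g$input, matched by name.
out_multi = g$train(
  list(scale.input = task, pca.input = task),
  single_input = FALSE
)
output_names = names(out_multi)

# Connect the PipeOps, naming the channels explicitly.
g$add_edge("scale", "pca", src_channel = "output", dst_channel = "input")

# Rename through the Graph rather than by changing the PipeOp's $id.
g$set_names("scale", "standardize")
sorted_ids = g$ids(sorted = TRUE)
```

Since both PipeOps have exactly one input and one output channel, the channel arguments of `add_edge()` could also be left as NULL here.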

See also

Other mlr3pipelines backend related: PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops

Examples

library("mlr3")
library("mlr3pipelines")

# Build a Graph that scales the features and then applies PCA.
g = Graph$new()$
  add_pipeop(PipeOpScale$new(id = "scale"))$
  add_pipeop(PipeOpPCA$new(id = "pca"))$
  add_edge("scale", "pca")

g$input
#>           name train predict op.id channel.name
#> 1: scale.input  Task    Task scale        input

g$output
#>          name train predict op.id channel.name
#> 1: pca.output  Task    Task   pca       output

# Train on the full task; the result is a list with one element per Graph output.
task = tsk("iris")
trained = g$train(task)
trained[[1]]$data()
#>        Species        PC1         PC2         PC3         PC4
#>   1:    setosa -2.2571412 -0.47842383 -0.12727962  0.02408751
#>   2:    setosa -2.0740130  0.67188269 -0.23382552  0.10266284
#>   3:    setosa -2.3563351  0.34076642  0.04405390  0.02828231
#>   4:    setosa -2.2917068  0.59539986  0.09098530 -0.06573534
#>   5:    setosa -2.3818627 -0.64467566  0.01568565 -0.03580287
#>  ---
#> 146: virginica  1.8642579 -0.38567404  0.25541818  0.38795715
#> 147: virginica  1.5593565  0.89369285 -0.02628330  0.21945690
#> 148: virginica  1.5160915 -0.26817075  0.17957678  0.11877324
#> 149: virginica  1.3682042 -1.00787793  0.93027872  0.02604141
#> 150: virginica  0.9574485  0.02425043  0.52648503 -0.16253353

# Predict on the first 10 rows, reusing the state learned during training.
task$filter(1:10)
predicted = g$predict(task)
predicted[[1]]$data()
#>     Species       PC1         PC2         PC3          PC4
#>  1:  setosa -2.257141 -0.47842383 -0.12727962  0.024087508
#>  2:  setosa -2.074013  0.67188269 -0.23382552  0.102662845
#>  3:  setosa -2.356335  0.34076642  0.04405390  0.028282305
#>  4:  setosa -2.291707  0.59539986  0.09098530 -0.065735340
#>  5:  setosa -2.381863 -0.64467566  0.01568565 -0.035802870
#>  6:  setosa -2.068701 -1.48420530  0.02687825  0.006586116
#>  7:  setosa -2.435868 -0.04748512  0.33435030 -0.036652767
#>  8:  setosa -2.225392 -0.22240300 -0.08839935 -0.024529919
#>  9:  setosa -2.326845  1.11160370  0.14459247 -0.026769540
#> 10:  setosa -2.177035  0.46744757 -0.25291827 -0.039766068