A Graph is a representation of a machine learning pipeline graph. It can be trained, and subsequently used for prediction.
A Graph is most useful when used together with Learner objects encapsulated as PipeOpLearner. In this case,
the Graph produces Prediction data during its $predict() phase and can be used as a Learner
itself (using the GraphLearner wrapper). However, the Graph can also be used without Learner objects to simply
perform preprocessing of data, and, in principle, does not even need to handle data at all but can be used for general processes with
dependency structure (although the PipeOps for this would need to be written).
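As a minimal sketch of the Learner use case described above: the snippet below wraps a two-step Graph in a GraphLearner and uses it like any other Learner. The choice of lrn("classif.rpart"), the iris task, and the variable names are illustrative assumptions, not part of this page.

```r
library("mlr3")
library("mlr3pipelines")

# Build a small pipeline: scale the features, then fit a decision tree.
# (classif.rpart is just an example learner; any Learner could be used.)
graph = po("scale") %>>% po("learner", lrn("classif.rpart"))

# Wrap the Graph so that it behaves like an ordinary Learner.
glrn = GraphLearner$new(graph)

task = tsk("iris")
glrn$train(task)

# Because the Graph ends in a PipeOpLearner, prediction yields a Prediction.
prediction = glrn$predict(task)
prediction$score(msr("classif.acc"))
```

The same Graph without the final learner step would only perform preprocessing and return a Task instead of a Prediction.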
Internals
A Graph is made up of a list of PipeOps, and a data.table of edges. Both for training and prediction, the Graph
performs topological sorting of the PipeOps and executes their respective $train() or $predict() functions in order, moving
the PipeOp results along the edges as input to other PipeOps.
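The following sketch illustrates the ordering described above. The PipeOps are deliberately added in the "wrong" order, so that the difference between insertion order and topological (execution) order becomes visible. The use of po("pca") and po("scale") is an illustrative assumption.

```r
library("mlr3")
library("mlr3pipelines")

# Add the PipeOps in reverse on purpose: "pca" first, then "scale".
g = Graph$new()$
  add_pipeop(po("pca"))$
  add_pipeop(po("scale"))$
  add_edge("scale", "pca")

# Insertion order: "pca" comes first because it was added first.
g$ids()

# Topological order: "scale" comes first because "pca" depends on it.
g$ids(sorted = TRUE)

# The edge table that determines the ordering.
g$edges
```

Training or predicting with this Graph runs "scale" before "pca", regardless of the order in which they were added.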
Fields
pipeops :: named list of PipeOp
Contains all PipeOps in the Graph, named by the PipeOp's $ids.
edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
Table of connections between the PipeOps. A data.table. src_id and dst_id are $ids of PipeOps that must be present in the $pipeops list. src_channel and dst_channel must respectively be $output and $input channel names of the respective PipeOps.
is_trained :: logical(1)
Is the Graph, i.e. are all of its PipeOps, trained, and can the Graph be used for prediction?
lhs :: character
Ids of the 'left-hand-side' PipeOps that have some unconnected input channels and therefore act as Graph input layer.
rhs :: character
Ids of the 'right-hand-side' PipeOps that have some unconnected output channels and therefore act as Graph output layer.
input :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
Input channels of the Graph. For each channel lists the name, input type during training, input type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
output :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
Output channels of the Graph. For each channel lists the name, output type during training, output type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
packages :: character
Set of all required packages for the various methods in the Graph, a set union of all required packages of all contained PipeOp objects.
state :: named list
Get / Set the $state of each of the Graph's member PipeOps.
param_set :: ParamSet
Parameters and parameter constraints. Parameter values are in $param_set$values. These are the union of $param_sets of all PipeOps in the Graph. Parameter names as seen by the Graph have the naming scheme <PipeOp$id>.<PipeOp original parameter name>. Changing $param_set$values also propagates the changes directly to the contained PipeOps and is an alternative to changing a PipeOp's $param_set$values directly.
hash :: character(1)
Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes (and therefore their $param_set$values) and a hash of $edges.
phash :: character(1)
Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes except their $param_set$values, and a hash of $edges.
keep_results :: logical(1)
Whether to store intermediate results in the PipeOp's $.result slot, mostly for debugging purposes. Default FALSE.
man :: character(1)
Identifying string of the help page that shows with help().
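A short sketch of how some of these fields can be inspected, assuming a simple scale-then-PCA Graph. The Graph itself and the choice of the PCA parameter rank. are illustrative assumptions. It shows the <PipeOp$id>.<parameter name> naming scheme of $param_set and that a change made through the Graph reaches the contained PipeOp.

```r
library("mlr3")
library("mlr3pipelines")

g = po("scale") %>>% po("pca")

g$lhs         # "scale": has an unconnected input channel
g$rhs         # "pca": has an unconnected output channel
g$is_trained  # FALSE: nothing has been trained yet

# Parameter ids are prefixed with the id of the PipeOp they belong to.
g$param_set$ids()

hash_before = g$hash
phash_before = g$phash

# Set a PCA parameter through the Graph's param_set ...
g$param_set$values$pca.rank. = 2L

# ... and read it back from the PipeOp itself.
g$pipeops$pca$param_set$values$rank.

# $hash reflects parameter values; compare with $phash, which should not.
g$hash == hash_before
g$phash == phash_before
```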
Methods
ids(sorted = FALSE)
(logical(1)) -> character
Get IDs of all PipeOps. This is in order that PipeOps were added if sorted is FALSE, and topologically sorted if sorted is TRUE.
add_pipeop(op, clone = TRUE)
(PipeOp | Learner | Filter | ..., logical(1)) -> self
Mutates Graph by adding a PipeOp to the Graph. This does not add any edges, so the new PipeOp will not be connected within the Graph at first.
Instead of supplying a PipeOp directly, an object that can naturally be converted to a PipeOp can also be supplied, e.g. a Learner or a Filter; see as_pipeop(). The argument given as op is cloned if clone is TRUE (default); to access a Graph's PipeOps by-reference, use $pipeops.
Note that $add_pipeop() is a relatively low-level operation, it is recommended to build graphs using %>>%.
add_edge(src_id, dst_id, src_channel = NULL, dst_channel = NULL)
(character(1), character(1), character(1) | numeric(1) | NULL, character(1) | numeric(1) | NULL) -> self
Add an edge from PipeOp src_id, and its channel src_channel (identified by its name or number as listed in the PipeOp's $output), to PipeOp dst_id's channel dst_channel (identified by its name or number as listed in the PipeOp's $input). If source or destination PipeOp have only one input / output channel and src_channel / dst_channel are therefore unambiguous, they can be omitted (i.e. left as NULL).
chain(gs, clone = TRUE)
(list of Graphs, logical(1)) -> self
Takes a list of Graphs or PipeOps (or objects that can be automatically converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins them in a serial Graph coming after self, as if connecting them using %>>%.
plot(html = FALSE, horizontal = FALSE)
(logical(1), logical(1)) -> NULL
Plot the Graph, using either the igraph package (for html = FALSE, default) or the visNetwork package for html = TRUE producing a htmlWidget. The htmlWidget can be rescaled using visOptions. For html = FALSE, the orientation of the plotted graph can be controlled through horizontal.
print(dot = FALSE, dotname = "dot", fontsize = 24L)
(logical(1), character(1), integer(1)) -> NULL
Print a representation of the Graph on the console. If dot is FALSE, output is a table with one row for each contained PipeOp and columns ID ($id of PipeOp), State (short representation of $state of PipeOp), sccssors (PipeOps that take their input directly from the PipeOp on this line), and prdcssors (the PipeOps that produce the data that is read as input by the PipeOp on this line). If dot is TRUE, print a DOT representation of the Graph on the console. The DOT output can be named via the argument dotname and the fontsize can also be specified.
set_names(old, new)
(character, character) -> self
Rename PipeOps: Change ID of each PipeOp as identified by old to the corresponding item in new. This should be used instead of changing a PipeOp's $id value directly!
update_ids(prefix = "", postfix = "")
(character, character) -> self
Pre- or postfix PipeOp's existing ids. Both prefix and postfix default to "", i.e. no changes.
train(input, single_input = TRUE)
(any, logical(1)) -> named list
Train the Graph by traversing the Graph's edges and calling each PipeOp's $train method in turn. Return a named list of outputs for each unconnected PipeOp out-channel, named according to the name column of the Graph's $output. During training, the $state member of each PipeOp will be set and the $is_trained slot of the Graph (and each individual PipeOp) will consequently be set to TRUE.
If single_input is TRUE, the input value will be sent to each unconnected PipeOp's input channel (as listed in the Graph's $input). Typically, input should be a Task, although this is dependent on the PipeOps in the Graph. If single_input is FALSE, then input should be a list with the same length as the Graph's $input table has rows; each list item will be sent to a corresponding input channel of the Graph. If input is a named list, names must correspond to input channel names ($input$name) and inputs will be sent to the channels by name; otherwise they will be sent to the channels in the order in which they are listed in $input.
predict(input, single_input = TRUE)
(any, logical(1)) -> list of any
Predict with the Graph by calling each PipeOp's $predict method in turn. Input and output, as well as the function of the single_input argument, are analogous to $train().
help(help_type)
(character(1)) -> help file
Displays the help file of the concrete PipeOp instance. help_type is one of "text", "html", "pdf" and behaves as the help_type argument of R's help().
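The low-level construction and renaming methods above can be sketched as follows. The PipeOps used, the new id "projection", and the prefix "prep_" are illustrative assumptions. The channel names "input" and "output" are the ones these two PipeOps expose, and could be omitted here since each PipeOp has only one channel.

```r
library("mlr3")
library("mlr3pipelines")

# Build the Graph step by step, naming the channels explicitly.
g = Graph$new()$
  add_pipeop(po("scale"))$
  add_pipeop(po("pca"))$
  add_edge("scale", "pca", src_channel = "output", dst_channel = "input")

# Rename one PipeOp; $edges is updated together with the id.
g$set_names(old = "pca", new = "projection")
g$ids()

# Prefix all ids, e.g. before combining this Graph with another one.
g$update_ids(prefix = "prep_")
g$ids(sorted = TRUE)
g$edges

# The renamed Graph trains as before; outputs are named after $output$name.
out = g$train(tsk("iris"))
names(out)
g$is_trained
```

For everyday use, po("scale") %>>% po("pca") builds the same structure in one expression.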
See also
Other mlr3pipelines backend related:
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Examples
library("mlr3")
library("mlr3pipelines")
g = Graph$new()$
add_pipeop(PipeOpScale$new(id = "scale"))$
add_pipeop(PipeOpPCA$new(id = "pca"))$
add_edge("scale", "pca")
g$input
#> name train predict op.id channel.name
#> <char> <char> <char> <char> <char>
#> 1: scale.input Task Task scale input
g$output
#> name train predict op.id channel.name
#> <char> <char> <char> <char> <char>
#> 1: pca.output Task Task pca output
task = tsk("iris")
trained = g$train(task)
trained[[1]]$data()
#> Species PC1 PC2 PC3 PC4
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -2.2571412 -0.47842383 0.12727962 -0.02408751
#> 2: setosa -2.0740130 0.67188269 0.23382552 -0.10266284
#> 3: setosa -2.3563351 0.34076642 -0.04405390 -0.02828231
#> 4: setosa -2.2917068 0.59539986 -0.09098530 0.06573534
#> 5: setosa -2.3818627 -0.64467566 -0.01568565 0.03580287
#> ---
#> 146: virginica 1.8642579 -0.38567404 -0.25541818 -0.38795715
#> 147: virginica 1.5593565 0.89369285 0.02628330 -0.21945690
#> 148: virginica 1.5160915 -0.26817075 -0.17957678 -0.11877324
#> 149: virginica 1.3682042 -1.00787793 -0.93027872 -0.02604141
#> 150: virginica 0.9574485 0.02425043 -0.52648503 0.16253353
task$filter(1:10)
predicted = g$predict(task)
predicted[[1]]$data()
#> Species PC1 PC2 PC3 PC4
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -2.257141 -0.47842383 0.12727962 -0.024087508
#> 2: setosa -2.074013 0.67188269 0.23382552 -0.102662845
#> 3: setosa -2.356335 0.34076642 -0.04405390 -0.028282305
#> 4: setosa -2.291707 0.59539986 -0.09098530 0.065735340
#> 5: setosa -2.381863 -0.64467566 -0.01568565 0.035802870
#> 6: setosa -2.068701 -1.48420530 -0.02687825 -0.006586116
#> 7: setosa -2.435868 -0.04748512 -0.33435030 0.036652767
#> 8: setosa -2.225392 -0.22240300 0.08839935 0.024529919
#> 9: setosa -2.326845 1.11160370 -0.14459247 0.026769540
#> 10: setosa -2.177035 0.46744757 0.25291827 0.039766068
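The examples above use the default single_input = TRUE. As a complementary sketch, the snippet below trains a Graph with two unconnected input channels using single_input = FALSE and a named list. The use of gunion() to place two PipeOps side by side, and the two tasks, are illustrative assumptions.

```r
library("mlr3")
library("mlr3pipelines")

# Two unconnected PipeOps, so the Graph has two input channels.
g = gunion(list(po("scale"), po("pca")))
g$input$name

task_full = tsk("iris")
task_small = tsk("iris")$filter(1:50)

# Names must match $input$name; the order within the list does not matter.
out = g$train(
  list(pca.input = task_small, scale.input = task_full),
  single_input = FALSE
)

# One output per unconnected output channel, named after $output$name.
names(out)
out$scale.output$nrow
out$pca.output$nrow
```

With an unnamed list, the items would instead be matched to the channels in the order of the rows of $input.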
