A Graph is a representation of a machine learning pipeline graph. It can be trained, and subsequently used for prediction.
A Graph is most useful when used together with Learner objects encapsulated as PipeOpLearner. In this case, the Graph produces Prediction data during its $predict() phase and can be used as a Learner itself (using the GraphLearner wrapper). However, the Graph can also be used without Learner objects to simply perform preprocessing of data, and, in principle, does not even need to handle data at all but can be used for general processes with dependency structure (although the PipeOps for this would need to be written).
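The use of a Graph as a Learner can be sketched as follows. This is a minimal illustration, not the only way to build such a pipeline; the choice of po("scale"), the "classif.rpart" learner (which needs the rpart package), and the variable names are assumptions made for the example.

```r
library("mlr3")
library("mlr3pipelines")

# Build a two-step pipeline: feature scaling followed by a decision tree.
# po("learner", ...) wraps the Learner in a PipeOpLearner.
graph = po("scale") %>>% po("learner", lrn("classif.rpart"))

# Wrap the Graph so that it can be used like any other Learner.
glrn = GraphLearner$new(graph)

task = tsk("iris")
glrn$train(task)

# The wrapped Graph returns a Prediction object, as a plain Learner would.
prediction = glrn$predict(task)
prediction$score(msr("classif.acc"))
```

Because the GraphLearner is itself a Learner, it can in principle be passed to resampling or benchmarking functions in the same way as the learner it wraps.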
Internals
A Graph is made up of a list of PipeOps, and a data.table of edges. Both for training and prediction, the Graph performs topological sorting of the PipeOps and executes their respective $train() or $predict() functions in order, moving the PipeOp results along the edges as input to other PipeOps.
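The difference between insertion order and topological order can be made visible with a small sketch. The PipeOps are deliberately added "backwards" here; the graph layout and variable name are assumptions chosen for illustration.

```r
library("mlr3pipelines")

# Add "pca" first and "scale" second, then connect scale -> pca.
g = Graph$new()$
  add_pipeop(po("pca"))$
  add_pipeop(po("scale"))$
  add_edge("scale", "pca")

g$ids()               # order in which the PipeOps were added: "pca", "scale"
g$ids(sorted = TRUE)  # order in which they are executed: "scale", "pca"

# The edge table has one row connecting the output of "scale" to the input of "pca".
g$edges
```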
Fields
pipeops :: named list of PipeOp
  Contains all PipeOps in the Graph, named by the PipeOp's $ids.
edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
  Table of connections between the PipeOps. A data.table. src_id and dst_id are $ids of PipeOps that must be present in the $pipeops list. src_channel and dst_channel must respectively be $output and $input channel names of the respective PipeOps.
is_trained :: logical(1)
  Is the Graph, i.e. are all of its PipeOps, trained, and can the Graph be used for prediction?
lhs :: character
  Ids of the 'left-hand-side' PipeOps that have some unconnected input channels and therefore act as Graph input layer.
rhs :: character
  Ids of the 'right-hand-side' PipeOps that have some unconnected output channels and therefore act as Graph output layer.
input :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
  Input channels of the Graph. For each channel lists the name, input type during training, input type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
output :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
  Output channels of the Graph. For each channel lists the name, output type during training, output type during prediction, PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
packages :: character
  Set of all required packages for the various methods in the Graph, a set union of all required packages of all contained PipeOp objects.
state :: named list
  Get / Set the $state of each of the PipeOps contained in the Graph.
param_set :: ParamSet
  Parameters and parameter constraints. Parameter values are in $param_set$values. These are the union of $param_sets of all PipeOps in the Graph. Parameter names as seen by the Graph have the naming scheme <PipeOp$id>.<PipeOp original parameter name>. Changing $param_set$values also propagates the changes directly to the contained PipeOps and is an alternative to changing a PipeOp's $param_set$values directly.
hash :: character(1)
  Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes (and therefore their $param_set$values) and a hash of $edges.
phash :: character(1)
  Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes except their $param_set$values, and a hash of $edges.
keep_results :: logical(1)
  Whether to store intermediate results in the PipeOp's $.result slot, mostly for debugging purposes. Default FALSE.
man :: character(1)
  Identifying string of the help page that shows with help().
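A few of these fields can be inspected on a small Graph as follows. This is a sketch under the assumption of a scale-then-PCA pipeline; the parameter names scale.center and pca.rank. follow from the naming scheme described above applied to the parameters of PipeOpScale and PipeOpPCA.

```r
library("mlr3")
library("mlr3pipelines")

g = po("scale") %>>% po("pca")

# Graph-level parameter names are prefixed with the id of the owning PipeOp.
g$param_set$ids()

# Setting a value through the Graph also changes the contained PipeOp.
g$param_set$values$pca.rank. = 2L
g$pipeops$pca$param_set$values$rank.

# "scale" has an unconnected input, "pca" an unconnected output.
g$lhs
g$rhs

# The Graph only counts as trained once all of its PipeOps are trained.
g$is_trained
g$train(tsk("iris"))
g$is_trained
```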
Methods
ids(sorted = FALSE)
  (logical(1)) -> character
  Get IDs of all PipeOps. This is in the order that PipeOps were added if sorted is FALSE, and topologically sorted if sorted is TRUE.
add_pipeop(op, clone = TRUE)
  (PipeOp | Learner | Filter | ..., logical(1)) -> self
  Mutates the Graph by adding a PipeOp to the Graph. This does not add any edges, so the new PipeOp will not be connected within the Graph at first.
  Instead of supplying a PipeOp directly, an object that can naturally be converted to a PipeOp can also be supplied, e.g. a Learner or a Filter; see as_pipeop(). The argument given as op is cloned if clone is TRUE (default); to access a Graph's PipeOps by-reference, use $pipeops.
  Note that $add_pipeop() is a relatively low-level operation; it is recommended to build graphs using %>>%.
add_edge(src_id, dst_id, src_channel = NULL, dst_channel = NULL)
  (character(1), character(1), character(1) | numeric(1) | NULL, character(1) | numeric(1) | NULL) -> self
  Add an edge from PipeOp src_id, and its channel src_channel (identified by its name or number as listed in the PipeOp's $output), to PipeOp dst_id's channel dst_channel (identified by its name or number as listed in the PipeOp's $input). If the source or destination PipeOp has only one output / input channel and src_channel / dst_channel is therefore unambiguous, it can be omitted (i.e. left as NULL).
chain(gs, clone = TRUE)
  (list of Graphs, logical(1)) -> self
  Takes a list of Graphs or PipeOps (or objects that can be automatically converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins them in a serial Graph coming after self, as if connecting them using %>>%.
plot(html = FALSE, horizontal = FALSE)
  (logical(1), logical(1)) -> NULL
  Plot the Graph, using either the igraph package (for html = FALSE, default) or the visNetwork package for html = TRUE, producing a htmlWidget. The htmlWidget can be rescaled using visOptions. For html = FALSE, the orientation of the plotted graph can be controlled through horizontal.
print(dot = FALSE, dotname = "dot", fontsize = 24L)
  (logical(1), character(1), integer(1)) -> NULL
  Print a representation of the Graph on the console. If dot is FALSE, output is a table with one row for each contained PipeOp and columns ID ($id of PipeOp), State (short representation of $state of PipeOp), sccssors (PipeOps that take their input directly from the PipeOp on this line), and prdcssors (the PipeOps that produce the data that is read as input by the PipeOp on this line). If dot is TRUE, print a DOT representation of the Graph on the console. The DOT output can be named via the argument dotname and the fontsize can also be specified.
set_names(old, new)
  (character, character) -> self
  Rename PipeOps: Change the ID of each PipeOp as identified by old to the corresponding item in new. This should be used instead of changing a PipeOp's $id value directly!
update_ids(prefix = "", postfix = "")
  (character, character) -> self
  Add a prefix or postfix to the existing ids of the PipeOps. Both prefix and postfix default to "", i.e. no changes.
train(input, single_input = TRUE)
  (any, logical(1)) -> named list
  Train the Graph by traversing the Graph's edges and calling all the PipeOps' $train methods in turn. Return a named list of outputs for each unconnected PipeOp out-channel, named according to the Graph's $output name column. During training, the $state member of each PipeOp will be set and the $is_trained slot of the Graph (and each individual PipeOp) will consequently be set to TRUE.
  If single_input is TRUE, the input value will be sent to each unconnected PipeOp's input channel (as listed in the Graph's $input). Typically, input should be a Task, although this is dependent on the PipeOps in the Graph. If single_input is FALSE, then input should be a list with the same length as the Graph's $input table has rows; each list item will be sent to a corresponding input channel of the Graph. If input is a named list, names must correspond to input channel names ($input$name) and inputs will be sent to the channels by name; otherwise they will be sent to the channels in the order in which they are listed in $input.
predict(input, single_input = TRUE)
  (any, logical(1)) -> list of any
  Predict with the Graph by calling all the PipeOps' $predict methods. Input and output, as well as the function of the single_input argument, are analogous to $train().
help(help_type)
  (character(1)) -> help file
  Displays the help file of the concrete PipeOp instance. help_type is one of "text", "html", "pdf" and behaves as the help_type argument of R's help().
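The single_input argument of $train() and the renaming methods can be sketched on a Graph with two unconnected PipeOps. The layout (two parallel PipeOps without edges), the new id "pca2", and the prefix "p_" are assumptions made purely for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# Two PipeOps without edges: the Graph has two input and two output channels.
g = Graph$new()$
  add_pipeop(po("scale"))$
  add_pipeop(po("pca"))

g$input$name

# With single_input = FALSE, pass one list item per input channel.
# Names must match the channel names in g$input$name.
task = tsk("iris")
out = g$train(
  list(scale.input = task, pca.input = task),
  single_input = FALSE
)

# One output per unconnected output channel, named as in g$output$name.
names(out)

# Rename via set_names() / update_ids() rather than assigning to $id directly.
g$set_names("pca", "pca2")
g$update_ids(prefix = "p_")
g$ids()
```

With single_input = TRUE (the default), the same Task would simply have been sent to both input channels, so the explicit list form is mainly useful when the channels need different inputs.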
See also
Other mlr3pipelines backend related: PipeOp, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_graphs, mlr_pipeops, mlr_pipeops_updatetarget
Examples
library("mlr3")
library("mlr3pipelines")  # provides Graph, PipeOpScale and PipeOpPCA
g = Graph$new()$
add_pipeop(PipeOpScale$new(id = "scale"))$
add_pipeop(PipeOpPCA$new(id = "pca"))$
add_edge("scale", "pca")
g$input
#> name train predict op.id channel.name
#> <char> <char> <char> <char> <char>
#> 1: scale.input Task Task scale input
g$output
#> name train predict op.id channel.name
#> <char> <char> <char> <char> <char>
#> 1: pca.output Task Task pca output
task = tsk("iris")
trained = g$train(task)
trained[[1]]$data()
#> Species PC1 PC2 PC3 PC4
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -2.2571412 -0.47842383 0.12727962 -0.02408751
#> 2: setosa -2.0740130 0.67188269 0.23382552 -0.10266284
#> 3: setosa -2.3563351 0.34076642 -0.04405390 -0.02828231
#> 4: setosa -2.2917068 0.59539986 -0.09098530 0.06573534
#> 5: setosa -2.3818627 -0.64467566 -0.01568565 0.03580287
#> ---
#> 146: virginica 1.8642579 -0.38567404 -0.25541818 -0.38795715
#> 147: virginica 1.5593565 0.89369285 0.02628330 -0.21945690
#> 148: virginica 1.5160915 -0.26817075 -0.17957678 -0.11877324
#> 149: virginica 1.3682042 -1.00787793 -0.93027872 -0.02604141
#> 150: virginica 0.9574485 0.02425043 -0.52648503 0.16253353
task$filter(1:10)
predicted = g$predict(task)
predicted[[1]]$data()
#> Species PC1 PC2 PC3 PC4
#> <fctr> <num> <num> <num> <num>
#> 1: setosa -2.257141 -0.47842383 0.12727962 -0.024087508
#> 2: setosa -2.074013 0.67188269 0.23382552 -0.102662845
#> 3: setosa -2.356335 0.34076642 -0.04405390 -0.028282305
#> 4: setosa -2.291707 0.59539986 -0.09098530 0.065735340
#> 5: setosa -2.381863 -0.64467566 -0.01568565 0.035802870
#> 6: setosa -2.068701 -1.48420530 -0.02687825 -0.006586116
#> 7: setosa -2.435868 -0.04748512 -0.33435030 0.036652767
#> 8: setosa -2.225392 -0.22240300 0.08839935 0.024529919
#> 9: setosa -2.326845 1.11160370 -0.14459247 0.026769540
#> 10: setosa -2.177035 0.46744757 0.25291827 0.039766068