
Materializes the active view of a Task by replacing its DataBackend with a new backend containing only the rows and columns currently used by the task.

This can be useful after operations that create virtual task views, such as Task $filter(), $select(), or $cbind(). In particular, many PipeOpTaskPreproc operations use Task $cbind() internally, which can create nested virtual backends. Materializing the view can reduce backend nesting and may free memory or speed up later data access.
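The nesting described above can be inspected directly on a task. The following is a minimal sketch, assuming an mlr3 version that provides Task $materialize_view() and that the method modifies the task in place; the column name "extra" is made up for illustration, and the backend classes printed may differ between versions.

```r
library("mlr3")

task = tsk("iris")

# $cbind() adds columns by layering a virtual backend over the existing one
task$cbind(data.frame(extra = seq_len(task$nrow)))
class(task$backend)

# materializing replaces the backend with one that holds only the active view
task$materialize_view()
class(task$backend)

# the added column is still part of the task
task$feature_names
```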

Note that Task $materialize_view() only materializes the currently active view. Columns without any column role are dropped. If an observation occurs more than once (duplicates in $row_ids), the resulting backend contains it only once, but the new task view still contains it multiple times, i.e. duplicates in $row_ids are preserved.
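The treatment of duplicated row ids can be checked with a small sketch. It assumes that duplicates may be introduced by assigning to $row_roles$use (one possible way to create them, not necessarily the one used elsewhere) and that $materialize_view() is available in the installed mlr3 version.

```r
library("mlr3")

task = tsk("iris")

# use observation 1 twice and observation 2 once
task$row_roles$use = c(1L, 1L, 2L)

task$materialize_view()

# number of distinct rows stored in the new backend
task$backend$nrow

# the task view itself keeps the duplicate
task$row_ids
task$nrow
```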

Format

R6Class object inheriting from PipeOp.

Construction

PipeOpMaterialize$new(id = "materialize")

  • id :: character(1)
    Identifier of resulting object. See $id slot of PipeOp.

Input and Output Channels

PipeOpMaterialize has one input channel named "input", taking a Task both during training and prediction.

PipeOpMaterialize has one output channel named "output", producing a Task both during training and prediction.

The output is the input Task with the active view materialized.
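Because input and output are both a Task, the PipeOp can be placed between a preprocessing step and a learner. The sketch below is one possible arrangement, not a recommended pipeline; the choice of po("pca") and lrn("classif.rpart") is an assumption made for illustration, and the latter requires the rpart package.

```r
library("mlr3")
library("mlr3pipelines")

# materialize the task after a preprocessing step, before it reaches the learner
graph = po("pca") %>>%
  PipeOpMaterialize$new() %>>%
  lrn("classif.rpart")

glrn = as_learner(graph)
glrn$train(tsk("iris"))

# prediction passes through the same PipeOps
prediction = glrn$predict(tsk("iris"))
prediction$score(msr("classif.acc"))
```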

State

The $state is left empty (list()).

Parameters

PipeOpMaterialize has no parameters.

Internals

PipeOpMaterialize calls Task $materialize_view() on a clone of the input task, both during training and prediction. During training, the internal validation task is also materialized using $materialize_view(internal_valid_task = TRUE), but not during prediction.
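The clone behaviour described above means that the task passed in should keep its original backend. A minimal sketch to check this, reusing the task setup from the Examples section:

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
task$select("Petal.Length")$filter(1:10)

pom = PipeOpMaterialize$new()
out_train = pom$train(list(task))[[1]]

# the input task is untouched because the PipeOp works on a clone
task$backend$nrow
out_train$backend$nrow

# prediction materializes the incoming task in the same way
out_predict = pom$predict(list(task))[[1]]
out_predict$backend$nrow

# nothing is learned during training
pom$state
```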

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOp.

See also

https://mlr-org.com/pipeops.html

Other mlr3pipelines backend related: Graph, PipeOp, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_graphs, mlr_pipeops, mlr_pipeops_updatetarget

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_classweightsex, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_info, mlr_pipeops_isomap, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_splines, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, 
mlr_pipeops_yeojohnson

Examples

library("mlr3")
library("mlr3pipelines")  # provides PipeOpMaterialize

task = tsk("iris")
task$select("Petal.Length")$filter(1:10)
task$backend$colnames
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> [6] "..row_id"    
task$backend$nrow
#> [1] 150

pom = PipeOpMaterialize$new("materialize")
materialized = pom$train(list(task))[[1]]
materialized$backend$colnames
#> [1] "..row_id"     "Petal.Length" "Species"     
materialized$backend$nrow
#> [1] 10