Extracts non-negative components from data by performing non-negative matrix factorization. Only affects non-negative numerical features. See nmf() for details.

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpNMF$new(id = "nmf", param_vals = list())
  • id :: character(1)
    Identifier of resulting object, default "nmf".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
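
As a minimal sketch of the two construction routes, assuming the mlr3pipelines package is installed and attached: the first call uses the constructor signature shown above, the second uses the po() shorthand that also appears in the Examples section. The id "nmf3" and the rank value 3 are illustrative choices, not defaults.

```r
library("mlr3pipelines")

# Construct through the R6 generator; param_vals overrides the initial rank of 2.
# The id "nmf3" is an arbitrary, illustrative identifier.
pop_a = PipeOpNMF$new(id = "nmf3", param_vals = list(rank = 3L))

# Shorthand: po() looks up "nmf" and treats extra arguments as hyperparameter values.
pop_b = po("nmf", rank = 3L)

pop_a$id
pop_b$param_set$values$rank
```

Both objects carry the same hyperparameter setting; they differ only in their id.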

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with all affected numeric features replaced by their non-negative components.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as the elements of the object returned by nmf().

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

  • rank :: integer(1)
    Factorization rank, i.e., number of components. Initialized to 2. See nmf().

  • method :: character(1)
    Specification of the NMF algorithm. Initialized to "brunet". See nmf().

  • seed :: character(1) | integer(1) | list() | object of class NMF | function()
    Specification of the starting point. See nmf().

  • nrun :: integer(1)
    Number of runs to perform. Default is 1. More than a single run allows for the computation of a consensus matrix, which will also be stored in the $state. See nmf().

  • debug :: logical(1)
    Whether to toggle debug mode. Default is FALSE. See nmf().

  • keep.all :: logical(1)
    Whether all factorizations are to be saved and returned. Default is FALSE. Only has an effect if nrun > 1. See nmf().

  • parallel :: character(1) | integer(1) | logical(1)
    Specification of parallel handling if nrun > 1. Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization. See nmf().

  • parallel.required :: character(1) | integer(1) | logical(1)
    Same as parallel, but an error is thrown if the computation cannot be performed in parallel or with the specified number of processors. Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization. See nmf().

  • shared.memory :: logical(1)
    Whether shared memory should be enabled. See nmf().

  • simplifyCB :: logical(1)
    Whether callback results should be simplified. Default is TRUE. See nmf().

  • track :: logical(1)
    Whether error tracking should be enabled. Default is FALSE. See nmf().

  • verbose :: integer(1) | logical(1)
    Specification of verbosity. Default is FALSE. See nmf().

  • pbackend :: character(1) | integer(1) | NULL
    Specification of the parallel backend. It is recommended to use mlr3's future-based parallelization. See nmf().

  • callback :: function()
    Callback function that is called after each run (if nrun > 1). See nmf().
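
The parameters listed above can also be changed after construction through the PipeOp's param_set. The following is a minimal sketch, assuming mlr3pipelines is attached; the concrete values (rank 3, 5 runs, integer seed 123) are illustrative assumptions rather than recommendations.

```r
library("mlr3pipelines")

pop = po("nmf")

# Overwrite individual hyperparameters on the existing object.
pop$param_set$values$rank = 3L   # number of components
pop$param_set$values$nrun = 5L   # several runs instead of the default single run
pop$param_set$values$seed = 123L # illustrative integer starting-point specification

# Inspect the settings that will be passed on during training.
pop$param_set$values[c("rank", "nrun", "seed")]
```

Values set this way are validated against the parameter types, so an invalid setting such as a non-integer rank should be rejected when it is assigned rather than at training time.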

Internals

Uses the nmf() function as well as basis(), coef() and ginv().
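
This page does not spell out how these functions are combined. One plausible reading, stated here as an assumption, is that the basis matrix of the fitted factorization is pseudo-inverted with ginv() to map data onto the learned components. The sketch below illustrates only that linear-algebra step with a small hand-made basis matrix; it does not call nmf(), and the matrices w and h_true are hypothetical.

```r
# Hypothetical non-negative basis matrix W: 4 features (rows) x 2 components.
w = matrix(c(1, 0, 1, 0,
             0, 1, 0, 1), nrow = 4, ncol = 2)

# Hypothetical coefficient matrix H: 2 components x 3 samples.
h_true = matrix(c(1, 2, 3, 4, 0.5, 0.5), nrow = 2, ncol = 3)

# Data that the factorization reproduces exactly: X = W %*% H, features in rows.
x = w %*% h_true

# Recover the coefficients by applying the Moore-Penrose pseudoinverse of W.
h_rec = MASS::ginv(w) %*% x

all.equal(h_rec, h_true)
```

Because this is a least-squares projection, it does not by itself enforce non-negativity; for data that the basis does not reproduce exactly, recovered coefficients can be negative.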

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

See also

Other PipeOps: PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, PipeOpTaskPreproc, PipeOp, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encode, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_scale, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson, mlr_pipeops

Examples

library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpNMF

task = tsk("iris")
pop = po("nmf")

task$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
pop$train(list(task))[[1]]$data()
#>        Species      NMF1       NMF2
#>   1:    setosa 0.8290139 0.03579919
#>   2:    setosa 0.7425097 0.06914820
#>   3:    setosa 0.7587001 0.03754975
#>   4:    setosa 0.7137460 0.07715860
#>   5:    setosa 0.8317556 0.02832131
#>  ---
#> 146: virginica 0.4031991 0.86828720
#> 147: virginica 0.3400982 0.83741979
#> 148: virginica 0.3994916 0.84480031
#> 149: virginica 0.3823531 0.87108386
#> 150: virginica 0.3591966 0.80980611
pop$state
#> <Object of class: NMFfit>
#>  # Model:
#>   <Object of class:NMFstd>
#>   features: 4
#>   basis/rank: 2
#>   samples: 150
#>  # Details:
#>   algorithm: brunet
#>   seed: random
#>   RNG: 10403L, 148L, ..., 581505866L [f712fcb2e9af94b00ba580916aea483e]
#>   distance metric: 'KL'
#>   residuals: 3.085118
#>   miscellaneous: dt_columns=<character>, affected_cols=<character>,
#>   intasklayout=c("<data.table>", "<data.frame>"),
#>   outtasklayout=c("<data.table>", "<data.frame>"),
#>   outtaskshell=c("<data.table>", "<data.frame>") . (use 'misc(object)')
#>   Iterations: 450
#>   Timing:
#>      user  system elapsed
#>     0.109   0.016   0.125