Non-negative Matrix Factorization

Extracts non-negative components from data by performing non-negative matrix factorization. Only affects non-negative numerical features. See nmf() for details.
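The default method "brunet" fits the factorization by minimizing the Kullback-Leibler divergence (the state in the Examples reports distance metric 'KL') using multiplicative updates. The base-R sketch below illustrates those update rules under simplifying assumptions: random start, fixed number of iterations, no convergence check. It is not the NMF package's implementation, and kl_div() and nmf_kl_sketch() are hypothetical helpers. Following the layout shown in the example state (features: 4, samples: 150), the data matrix has features as rows.

```r
# Hypothetical helpers for illustration; not part of the NMF package.
# Generalized KL divergence D(V || WH), the objective behind "brunet".
kl_div = function(V, WH) {
  sum(ifelse(V > 0, V * log(V / WH), 0) - V + WH)
}

# Multiplicative updates for KL-NMF: V (p x n) ~ W (p x r) %*% H (r x n)
nmf_kl_sketch = function(V, rank, n_iter = 500L, eps = 1e-10, seed = 1L) {
  stopifnot(is.matrix(V), all(V >= 0))
  set.seed(seed)
  p = nrow(V)
  n = ncol(V)
  # Random non-negative starting point
  W = matrix(runif(p * rank), p, rank)
  H = matrix(runif(rank * n), rank, n)
  for (i in seq_len(n_iter)) {
    # Update the coefficients H; row a is scaled by 1 / colSums(W)[a]
    R = V / (W %*% H + eps)
    H = H * sweep(crossprod(W, R), 1, colSums(W), "/")
    # Update the basis W; column a is scaled by 1 / rowSums(H)[a]
    R = V / (W %*% H + eps)
    W = W * sweep(R %*% t(H), 2, rowSums(H), "/")
  }
  list(W = W, H = H)
}

# iris measurements with features as rows (4 features x 150 samples)
V = t(as.matrix(iris[, 1:4]))
fit = nmf_kl_sketch(V, rank = 2)
dim(fit$W)  # 4 2
dim(fit$H)  # 2 150
kl_div(V, fit$W %*% fit$H)
```

Because every update multiplies non-negative entries by non-negative ratios, W and H stay non-negative throughout, which is why the resulting components are non-negative.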

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpNMF$new(id = "nmf", param_vals = list())

id :: character(1)
Identifier of resulting object, default "nmf".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
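Hyperparameters can be supplied at construction time or changed afterwards. A minimal sketch, assuming mlr3pipelines and NMF are installed (the requireNamespace() guard skips the block otherwise); rank = 3 and nrun = 5 are arbitrary illustrative values:

```r
if (requireNamespace("mlr3pipelines", quietly = TRUE) &&
    requireNamespace("NMF", quietly = TRUE)) {
  library("mlr3pipelines")
  # Both calls create a PipeOpNMF whose rank is 3 instead of the initial 2
  pop1 = PipeOpNMF$new(id = "nmf", param_vals = list(rank = 3L))
  pop2 = po("nmf", rank = 3L)
  # Hyperparameters can also be set after construction
  pop2$param_set$values$nrun = 5L
  print(pop1$param_set$values$rank)
}
```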

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with all affected numeric features replaced by rank non-negative components, named NMF1, NMF2, and so on.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as the element nmf, which holds the object returned by nmf() (an object of class NMFfit in the example below).
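The fitted factors can be read back from the stored nmf element. A minimal sketch, assuming mlr3pipelines and NMF are installed (the guard skips the block otherwise) and that NMF::basis() and NMF::coef() return the features-by-rank basis and the rank-by-samples coefficients, consistent with the dimensions printed in the example state:

```r
if (requireNamespace("mlr3pipelines", quietly = TRUE) &&
    requireNamespace("NMF", quietly = TRUE)) {
  library("mlr3")
  library("mlr3pipelines")
  pop = po("nmf")
  pop$train(list(tsk("iris")))
  fit = pop$state$nmf      # the object returned by nmf()
  W = NMF::basis(fit)      # basis matrix: features x rank
  H = NMF::coef(fit)       # coefficient matrix: rank x samples
  print(dim(W))
  print(dim(H))
}
```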

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

rank :: integer(1)
Factorization rank, i.e., number of components. Initialized to 2. See nmf().
method :: character(1)
Specification of the NMF algorithm. Initialized to "brunet". See nmf().
seed :: character(1) | integer(1) | list() | object of class NMF | function()
Specification of the starting point. See nmf().
nrun :: integer(1)
Number of runs to perform. Default is 1. Using more than one run allows a consensus matrix to be computed, which is then also stored in the $state. See nmf().
debug :: logical(1)
Whether to toggle debug mode. Default is FALSE. See nmf().
keep.all :: logical(1)
Whether all factorizations are to be saved and returned. Default is FALSE. Only has an effect if nrun > 1. See nmf().
parallel :: character(1) | integer(1) | logical(1)
Specification of parallel handling if nrun > 1. Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization. See nmf().
parallel.required :: character(1) | integer(1) | logical(1)
Same as parallel, but an error is thrown if the computation cannot be performed in parallel or with the specified number of processors. Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization. See nmf().
shared.memory :: logical(1)
Whether shared memory should be enabled. See nmf().
simplifyCB :: logical(1)
Whether callback results should be simplified. Default is TRUE. See nmf().
track :: logical(1)
Whether error tracking should be enabled. Default is FALSE. See nmf().
verbose :: integer(1) | logical(1)
Specification of verbosity. Default is FALSE. See nmf().
pbackend :: character(1) | integer(1) | NULL
Specification of the parallel backend. It is recommended to use mlr3's future-based parallelization. See nmf().
callback :: function()
Callback function that is called after each run (if nrun > 1). See nmf().
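With nrun > 1, a consensus matrix records, for each pair of samples, the fraction of runs that assign both samples to the same component. The base-R sketch below illustrates that connectivity idea by assigning each sample to its largest coefficient; consensus_sketch() is a hypothetical helper working on a list of rank-by-samples coefficient matrices, and whether nmf() computes its consensus in exactly this way is an assumption here.

```r
# Hypothetical helper, for illustration only: average connectivity over runs.
# Each element of H_list is a rank x samples coefficient matrix from one run.
consensus_sketch = function(H_list) {
  conn = lapply(H_list, function(H) {
    cl = apply(H, 2, which.max)   # dominant component for each sample
    outer(cl, cl, "==") * 1       # 1 if two samples share it, else 0
  })
  Reduce(`+`, conn) / length(conn)  # fraction of runs that agree
}

# Two toy runs on three samples
H1 = matrix(c(1, 0,  0.9, 0.1,  0, 1), nrow = 2)  # clusters 1, 1, 2
H2 = matrix(c(1, 0,  0, 1,      0, 1), nrow = 2)  # clusters 1, 2, 2
consensus_sketch(list(H1, H2))
```

Sample pairs (1, 2) and (2, 3) agree in one of the two runs and therefore get 0.5, while pair (1, 3) never agrees and gets 0.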

Internals

Uses the nmf() function as well as basis(), coef() and ginv().
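One plausible reading of these internals, stated here as an assumption rather than a description of the actual source, is that training returns the transposed coefficients from coef() as the new features, while new data is mapped onto the stored basis W from basis() through the Moore-Penrose pseudoinverse. The base-R sketch below illustrates such a projection; project_onto_basis() is hypothetical, and solve(crossprod(W), t(W)) stands in for MASS::ginv(W), to which it is equal when W has full column rank.

```r
# Hypothetical helper: map new observations onto a fixed NMF basis W
# (features x rank) via the pseudoinverse of W.
project_onto_basis = function(W, X_new) {
  V_new = t(as.matrix(X_new))         # features x observations
  W_pinv = solve(crossprod(W), t(W))  # rank x features
  H_new = W_pinv %*% V_new            # rank x observations
  out = t(H_new)                      # observations x rank
  colnames(out) = paste0("NMF", seq_len(ncol(out)))
  out
}

# Toy check: data generated exactly as W %*% H is mapped back to t(H)
W = matrix(c(1, 0, 2,
             0, 1, 1), nrow = 3)      # 3 features x 2 components
H = matrix(c(1, 2,
             3, 0.5), nrow = 2)       # 2 components x 2 observations
X_new = t(W %*% H)                    # 2 observations x 3 features
project_onto_basis(W, X_new)          # recovers t(H) up to rounding
```

Unlike the fitted coefficients, a least-squares projection of this kind is not constrained to be non-negative, so data that does not lie in the cone spanned by W can yield negative values.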

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

Examples

library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("nmf")

task$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---                                                            
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
pop$train(list(task))[[1]]$data()
#>        Species      NMF1       NMF2
#>         <fctr>     <num>      <num>
#>   1:    setosa 0.5808520 0.04741536
#>   2:    setosa 0.5179125 0.06572959
#>   3:    setosa 0.5312856 0.04639643
#>   4:    setosa 0.4971806 0.06988613
#>   5:    setosa 0.5832509 0.04280681
#>  ---                               
#> 146: virginica 0.2290743 0.55676271
#> 147: virginica 0.1866306 0.53550080
#> 148: virginica 0.2279444 0.54191697
#> 149: virginica 0.2142419 0.55788518
#> 150: virginica 0.2018033 0.51875141

pop$state
#> $nmf
#> <Object of class: NMFfit>
#>  # Model:
#>   <Object of class:NMFstd>
#>   features: 4 
#>   basis/rank: 2 
#>   samples: 150 
#>  # Details:
#>   algorithm:  brunet 
#>   seed:  random 
#>   RNG: 10403L, 223L, ..., 581505866L [c6a8911f7b61c7ab6db7422cde75b137]
#>   distance metric:  'KL' 
#>   residuals:  3.084418 
#>   Iterations: 440 
#>   Timing:
#>      user  system elapsed 
#>      0.06    0.00    0.06 
#> 
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $intasklayout
#> Key: <id>
#>              id    type
#>          <char>  <char>
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric
#> 
#> $outtasklayout
#> Key: <id>
#>        id    type
#>    <char>  <char>
#> 1:   NMF1 numeric
#> 2:   NMF2 numeric
#> 
#> $outtaskshell
#> Empty data.table (0 rows and 3 cols): Species,NMF1,NMF2
#> 
#> attr(,"class")
#> [1] "PipeOpNMFstate"
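After training, the PipeOp can transform new data using the factorization stored in its $state instead of fitting it again. A minimal sketch, assuming mlr3pipelines and NMF are installed (the guard skips the block otherwise); predicting on the first five iris rows is an arbitrary choice for illustration:

```r
if (requireNamespace("mlr3pipelines", quietly = TRUE) &&
    requireNamespace("NMF", quietly = TRUE)) {
  library("mlr3")
  library("mlr3pipelines")
  task = tsk("iris")
  pop = po("nmf")
  pop$train(list(task))
  # Predict on a subset: the factorization in $state is reused, not re-fitted
  new_task = task$clone()$filter(1:5)
  out = pop$predict(list(new_task))[[1]]
  print(out$data())
}
```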