Project Numeric Features onto a Randomly Sampled Subspace

Projects numeric features onto a randomly sampled subspace. All numeric features (or those selected by affect_columns) are replaced by the numeric features PR1, PR2, ..., one for each dimension of the subspace (see the rank parameter).

If any affected feature of a sample is missing, all projected features of that sample are NA. Impute missing values before the random projection if they can be expected.
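
The advice on missing values can be followed by placing an imputation step in front of the projection. The following is a minimal sketch, not a recommendation of a particular method: the "pima" task and median imputation via po("imputemedian") are assumptions chosen only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# "pima" is used here only because some of its numeric features contain NAs
task = tsk("pima")

# Impute first, then project; median imputation is one possible choice
graph = po("imputemedian") %>>% po("randomprojection", rank = 2)

out = graph$train(task)[[1]]

# Check whether any projected value is missing after imputation
anyNA(out$data(cols = c("PR1", "PR2")))
```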

Format

R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Construction

PipeOpRandomProjection$new(id = "randomprojection", param_vals = list())

id :: character(1)
Identifier of resulting object, default "randomprojection".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
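
As a short illustration of the constructor arguments described above, the sketch below sets rank through param_vals. The id "rp" is a made-up name used only for this example.

```r
library("mlr3pipelines")

# Construct directly from the R6 class; "rp" is an arbitrary example id
pop = PipeOpRandomProjection$new(id = "rp", param_vals = list(rank = 2))

pop$id
pop$param_set$values$rank
```

The po("randomprojection", rank = 2) shorthand used in the Examples section sets the same hyperparameter.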

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with affected numeric features projected onto a random subspace.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as an element $projection, a matrix.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

rank :: integer(1)
The dimension of the subspace to project onto. Initialized to 1.

Internals

If there are n (affected) numeric features in the input Task, then $state$projection is an n x rank matrix. The output is calculated as input %*% state$projection.

The random projection matrix is obtained by Gram-Schmidt orthogonalization of a matrix whose entries are drawn independently from a standard normal distribution. This yields a rotation-invariant distribution, as per Eaton: Multivariate Statistics, A Vector Space Approach, Pg. 234.
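
The construction described above can be sketched in base R. This is an illustration under stated assumptions, not the package's actual implementation: the variable names, the fixed seed, and the use of the built-in iris data frame as input are all chosen for this example.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

n = 4        # number of affected numeric features
rank = 2     # dimension of the subspace

# n x rank matrix with independent standard normal entries
projection = matrix(rnorm(n * rank), nrow = n, ncol = rank)

# Gram-Schmidt: make each column orthogonal to the previous ones,
# then scale it to unit length
for (j in seq_len(rank)) {
  for (k in seq_len(j - 1)) {
    projection[, j] = projection[, j] -
      sum(projection[, k] * projection[, j]) * projection[, k]
  }
  projection[, j] = projection[, j] / sqrt(sum(projection[, j]^2))
}
colnames(projection) = paste0("PR", seq_len(rank))

# Apply the projection: (samples x n) %*% (n x rank) gives (samples x rank)
input = as.matrix(iris[, 1:4])
projected = input %*% projection

# The columns are orthonormal, so this is (numerically) the identity matrix
crossprod(projection)
```

The matrix product only works when the projection has one row per input feature, which is why the stored matrix has n rows and rank columns.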

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Examples

library("mlr3")
library("mlr3pipelines")  # provides po()

task = tsk("iris")
pop = po("randomprojection", rank = 2)

task$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---                                                            
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
pop$train(list(task))[[1]]$data()
#>        Species       PR1       PR2
#>         <fctr>     <num>     <num>
#>   1:    setosa -3.940642 -4.540791
#>   2:    setosa -3.553407 -4.199705
#>   3:    setosa -3.610673 -4.179699
#>   4:    setosa -3.619223 -4.012498
#>   5:    setosa -3.993417 -4.516007
#>  ---                              
#> 146: virginica -4.676199 -5.418214
#> 147: virginica -4.353197 -4.785079
#> 148: virginica -4.787989 -5.121619
#> 149: virginica -4.966248 -5.193520
#> 150: virginica -4.731992 -4.643445

pop$state
#> $projection
#>                     PR1        PR2
#> Petal.Length -0.4828504  0.2956250
#> Petal.Width   0.4901167 -0.5457836
#> Sepal.Length -0.1762255 -0.6642988
#> Sepal.Width  -0.7039785 -0.4164531
#> 
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width" 
#> 
#> $intasklayout
#> Key: <id>
#>              id    type
#>          <char>  <char>
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric
#> 
#> $outtasklayout
#> Key: <id>
#>        id    type
#>    <char>  <char>
#> 1:    PR1 numeric
#> 2:    PR2 numeric
#> 
#> $outtaskshell
#> Empty data.table (0 rows and 3 cols): Species,PR1,PR2
#>