# Project Numeric Features onto a Randomly Sampled Subspace

Source:`R/PipeOpRandomProjection.R`

`mlr_pipeops_randomprojection.Rd`

Projects numeric features onto a randomly sampled subspace. All numeric features
(or the ones selected by `affect_columns`

) are replaced by numeric features
`PR1`

, `PR2`

, ... `PRn`

Samples with features that contain missing values result in all `PR1`

..`PRn`

being
NA for that sample, so it is advised to do imputation *before* random projections
if missing values can be expected.

## Format

`R6Class`

object inheriting from `PipeOpTaskPreprocSimple`

/`PipeOpTaskPreproc`

/`PipeOp`

.

## Construction

`id`

::`character(1)`

Identifier of resulting object, default`"randomprojection"`

.`param_vals`

:: named`list`

List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default`list()`

.

## Input and Output Channels

Input and output channels are inherited from `PipeOpTaskPreproc`

.

The output is the input `Task`

with affected numeric features
projected onto a random subspace.

## State

The `$state`

is a named `list`

with the `$state`

elements inherited from `PipeOpTaskPreproc`

,
as well as an element `$projection`

, a `matrix`

.

## Parameters

The parameters are the parameters inherited from `PipeOpTaskPreproc`

, as well as:

`rank`

::`integer(1)`

The dimension of the subspace to project onto. Initialized to 1.

## Internals

If there are `n`

(affected) numeric features in the input `Task`

,
then `$state$projection`

is a `rank`

x `m`

`matrix`

. The output is calculated as
`input %*% state$projection`

.

The random projection matrix is obtained through Gram-Schmidt orthogonalization from a matrix with values standard normally distributed, which gives a distribution that is rotation invariant, as per Eaton: Multivariate Statistics, A Vector Space Approach, Pg. 234.

## Methods

Only methods inherited from `PipeOpTaskPreprocSimple`

/`PipeOpTaskPreproc`

/`PipeOp`

.

## See also

https://mlr-org.com/pipeops.html

Other PipeOps:
`PipeOp`

,
`PipeOpEnsemble`

,
`PipeOpImpute`

,
`PipeOpTargetTrafo`

,
`PipeOpTaskPreproc`

,
`PipeOpTaskPreprocSimple`

,
`mlr_pipeops`

,
`mlr_pipeops_adas`

,
`mlr_pipeops_blsmote`

,
`mlr_pipeops_boxcox`

,
`mlr_pipeops_branch`

,
`mlr_pipeops_chunk`

,
`mlr_pipeops_classbalancing`

,
`mlr_pipeops_classifavg`

,
`mlr_pipeops_classweights`

,
`mlr_pipeops_colapply`

,
`mlr_pipeops_collapsefactors`

,
`mlr_pipeops_colroles`

,
`mlr_pipeops_copy`

,
`mlr_pipeops_datefeatures`

,
`mlr_pipeops_encode`

,
`mlr_pipeops_encodeimpact`

,
`mlr_pipeops_encodelmer`

,
`mlr_pipeops_featureunion`

,
`mlr_pipeops_filter`

,
`mlr_pipeops_fixfactors`

,
`mlr_pipeops_histbin`

,
`mlr_pipeops_ica`

,
`mlr_pipeops_imputeconstant`

,
`mlr_pipeops_imputehist`

,
`mlr_pipeops_imputelearner`

,
`mlr_pipeops_imputemean`

,
`mlr_pipeops_imputemedian`

,
`mlr_pipeops_imputemode`

,
`mlr_pipeops_imputeoor`

,
`mlr_pipeops_imputesample`

,
`mlr_pipeops_kernelpca`

,
`mlr_pipeops_learner`

,
`mlr_pipeops_missind`

,
`mlr_pipeops_modelmatrix`

,
`mlr_pipeops_multiplicityexply`

,
`mlr_pipeops_multiplicityimply`

,
`mlr_pipeops_mutate`

,
`mlr_pipeops_nmf`

,
`mlr_pipeops_nop`

,
`mlr_pipeops_ovrsplit`

,
`mlr_pipeops_ovrunite`

,
`mlr_pipeops_pca`

,
`mlr_pipeops_proxy`

,
`mlr_pipeops_quantilebin`

,
`mlr_pipeops_randomresponse`

,
`mlr_pipeops_regravg`

,
`mlr_pipeops_removeconstants`

,
`mlr_pipeops_renamecolumns`

,
`mlr_pipeops_replicate`

,
`mlr_pipeops_rowapply`

,
`mlr_pipeops_scale`

,
`mlr_pipeops_scalemaxabs`

,
`mlr_pipeops_scalerange`

,
`mlr_pipeops_select`

,
`mlr_pipeops_smote`

,
`mlr_pipeops_smotenc`

,
`mlr_pipeops_spatialsign`

,
`mlr_pipeops_subsample`

,
`mlr_pipeops_targetinvert`

,
`mlr_pipeops_targetmutate`

,
`mlr_pipeops_targettrafoscalerange`

,
`mlr_pipeops_textvectorizer`

,
`mlr_pipeops_threshold`

,
`mlr_pipeops_tunethreshold`

,
`mlr_pipeops_unbranch`

,
`mlr_pipeops_updatetarget`

,
`mlr_pipeops_vtreat`

,
`mlr_pipeops_yeojohnson`

## Examples

```
library("mlr3")
task = tsk("iris")
pop = po("randomprojection", rank = 2)
task$data()
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <num> <num> <num> <num>
#> 1: setosa 1.4 0.2 5.1 3.5
#> 2: setosa 1.4 0.2 4.9 3.0
#> 3: setosa 1.3 0.2 4.7 3.2
#> 4: setosa 1.5 0.2 4.6 3.1
#> 5: setosa 1.4 0.2 5.0 3.6
#> ---
#> 146: virginica 5.2 2.3 6.7 3.0
#> 147: virginica 5.0 1.9 6.3 2.5
#> 148: virginica 5.2 2.0 6.5 3.0
#> 149: virginica 5.4 2.3 6.2 3.4
#> 150: virginica 5.1 1.8 5.9 3.0
pop$train(list(task))[[1]]$data()
#> Species PR1 PR2
#> <fctr> <num> <num>
#> 1: setosa 4.534058 -4.409578
#> 2: setosa 4.242573 -4.065779
#> 3: setosa 4.163508 -4.060085
#> 4: setosa 3.978902 -4.120237
#> 5: setosa 4.475472 -4.442770
#> ---
#> 146: virginica 3.647466 -7.827740
#> 147: virginica 3.388987 -7.181222
#> 148: virginica 3.573315 -7.696115
#> 149: virginica 3.253416 -8.079257
#> 150: virginica 3.172378 -7.417726
pop$state
#> $projection
#> PR1 PR2
#> Petal.Length -0.3810837 -0.7207785
#> Petal.Width -0.3094164 -0.2693797
#> Sepal.Length 0.8348811 -0.2540586
#> Sepal.Width 0.2490186 -0.5859754
#>
#> $dt_columns
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"
#>
#> $intasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#>
#> $outtasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: PR1 numeric
#> 2: PR2 numeric
#>
#> $outtaskshell
#> Empty data.table (0 rows and 3 cols): Species,PR1,PR2
#>
```