Principal Component Analysis

Extracts principal components from data. Only affects numeric features. See stats::prcomp() for details.

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpPCA$new(id = "pca", param_vals = list())

id :: character(1)
Identifier of resulting object, default "pca".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with all affected numeric features replaced by their principal components.

State

The $state is a named list containing the $state elements inherited from PipeOpTaskPreproc, as well as the elements of the stats::prcomp object fitted during training, except for the $x slot (the rotated training data). These are in particular:

sdev :: numeric
The standard deviations of the principal components.
rotation :: matrix
The matrix of variable loadings.
center :: numeric | logical(1)
The centering used, or FALSE.
scale :: numeric | logical(1)
The scaling used, or FALSE.
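
The following base-R sketch illustrates why these four elements are enough to transform new data without keeping the $x slot. It calls stats::prcomp() directly rather than going through the PipeOp, and the train/test split (train_rows, test_rows) is hypothetical and chosen only for illustration. It is a minimal sketch of the idea, not necessarily the exact code path the PipeOp takes internally.

```r
# Numeric columns of the built-in iris data, as a matrix.
X <- as.matrix(iris[, c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")])

# Hypothetical split, for illustration only: odd rows train, even rows test.
train_rows <- seq(1, 150, by = 2)
test_rows  <- seq(2, 150, by = 2)

# Fit PCA on the training rows only.
fit <- stats::prcomp(X[train_rows, ], center = TRUE, scale. = FALSE)

# Keep only the elements listed above; the training scores in $x are dropped.
state <- fit[c("sdev", "rotation", "center", "scale")]

# Project new rows by hand: subtract the stored center, then rotate.
# (With scale. = FALSE, state$scale is FALSE, so no rescaling is needed.)
centered <- sweep(X[test_rows, ], 2, state$center)
scores_manual <- centered %*% state$rotation

# The predict() method for prcomp objects performs the same projection.
scores_predict <- predict(fit, newdata = X[test_rows, ])
stopifnot(isTRUE(all.equal(scores_manual, scores_predict, check.attributes = FALSE)))
```

Because the projection depends only on center, scale and rotation, the rotated training data does not need to be stored.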

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

center :: logical(1)
Whether to center features before analysis. Default is TRUE. See prcomp().
scale. :: logical(1)
Whether to scale features to unit variance before analysis. Default is FALSE, but scaling is advisable. See prcomp().
rank. :: integer(1)
Maximal number of principal components to be used. Default is NULL: use all components. See prcomp().
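
As a short illustration of these parameters, the sketch below passes scale. and rank. through the param_vals argument of the constructor shown under Construction. It assumes the mlr3 and mlr3pipelines packages are installed; the values (scaling on, at most two components) are arbitrary choices for the example.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Scale features to unit variance and keep at most two principal components.
pop = PipeOpPCA$new(param_vals = list(scale. = TRUE, rank. = 2L))

# Train on the task; the result is the transformed Task.
result = pop$train(list(task))[[1]]

# Inspect which features remain after the transformation.
result$feature_names

# The stored rotation has one column per retained component.
dim(pop$state$rotation)
```

Note that rank. is given as an integer (2L), matching the integer(1) type listed above.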

Internals

Uses the prcomp() function.

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

Examples

library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpPCA

task = tsk("iris")
pop = po("pca")

task$data()
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>         <fctr>        <num>       <num>        <num>       <num>
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---                                                            
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
pop$train(list(task))[[1]]$data()
#>        Species       PC1         PC2         PC3          PC4
#>         <fctr>     <num>       <num>       <num>        <num>
#>   1:    setosa -2.684126 -0.31939725  0.02791483 -0.002262437
#>   2:    setosa -2.714142  0.17700123  0.21046427 -0.099026550
#>   3:    setosa -2.888991  0.14494943 -0.01790026 -0.019968390
#>   4:    setosa -2.745343  0.31829898 -0.03155937  0.075575817
#>   5:    setosa -2.728717 -0.32675451 -0.09007924  0.061258593
#>  ---                                                         
#> 146: virginica  1.944110 -0.18753230 -0.17782509 -0.426195940
#> 147: virginica  1.527167  0.37531698  0.12189817 -0.254367442
#> 148: virginica  1.764346 -0.07885885 -0.13048163 -0.137001274
#> 149: virginica  1.900942 -0.11662796 -0.72325156 -0.044595305
#> 150: virginica  1.390189  0.28266094 -0.36290965  0.155038628

pop$state
#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#> 
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Petal.Length  0.85667061  0.17337266 -0.07623608  0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143 -0.7536574
#> Sepal.Length  0.36138659 -0.65658877  0.58202985 -0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083  0.3197231