Impute numerical features by histogram.
During training, a histogram is fitted on each column using R's hist() function.
The fitted histogram is then sampled from for imputation. Sampling is a two-step process:
first, a bin is sampled from the histogram; then, a value is sampled uniformly from that bin.
This approximates sampling from the empirical training data distribution (i.e. sampling
from the training data with replacement), but is much more memory efficient for large datasets, since the $state
does not need to store the training data.
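The two-step sampling described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `sample_from_hist` is a hypothetical helper, the toy column `x` is made up, and the sketch assumes that bins are drawn with probability proportional to their counts.

```r
# Hypothetical helper (not part of mlr3pipelines): draw n values from a
# histogram given its bin counts and break points.
sample_from_hist = function(counts, breaks, n) {
  # Step 1: pick bins; assumes probability proportional to the bin counts.
  bins = sample.int(length(counts), size = n, replace = TRUE, prob = counts)
  # Step 2: draw uniformly between the lower and upper break of each picked bin.
  stats::runif(n, min = breaks[bins], max = breaks[bins + 1])
}

set.seed(1)
x = c(rnorm(1000, mean = 50, sd = 10), rep(NA_real_, 50))  # toy column with 50 NAs

# "Training": only counts and breaks are kept, not the data itself.
h = graphics::hist(x[!is.na(x)], plot = FALSE)
fitted = list(counts = h$counts, breaks = h$breaks)

# "Imputation": replace each NA with a draw from the fitted histogram.
imputed = x
imputed[is.na(x)] = sample_from_hist(fitted$counts, fitted$breaks, sum(is.na(x)))
```

Only `fitted` would need to be stored between training and prediction, which is where the memory saving comes from.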
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "imputehist".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values in all affected numeric features imputed by (column-wise) histogram; see Description for details.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of lists containing elements $counts and $breaks.
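To make the shape of one $state$model entry concrete, the following sketch rebuilds it by hand in base R. The counts and breaks are copied from the glucose entry of the Examples output; the variable names `model`, `g` and `bin_prob` are illustrative only.

```r
# Values copied from the glucose entry of the Examples output; the list layout
# mirrors the described $state$model but is built by hand here.
model = list(
  glucose = list(
    counts = c(4, 38, 167, 205, 157, 91, 60, 41),
    breaks = c(40, 60, 80, 100, 120, 140, 160, 180, 200)
  )
)

g = model$glucose
# One more break than bins: bin i spans breaks[i] to breaks[i + 1].
stopifnot(length(g$breaks) == length(g$counts) + 1)

# Relative bin frequencies, labelled by their lower and upper break.
bin_prob = g$counts / sum(g$counts)
names(bin_prob) = paste0(head(g$breaks, -1), "-", tail(g$breaks, -1))
round(bin_prob, 3)
```

The counts sum to the number of non-missing training values of the feature, so the entry stays small regardless of how many rows were seen during training.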
Parameters
The parameters are the parameters inherited from PipeOpImpute.
Internals
Uses the graphics::hist() function. Features that are entirely NA are imputed as 0.
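The all-NA fallback can be illustrated with a small base R sketch. `impute_column` is a hypothetical helper, not the package's code, and it merges fitting and imputation into one step for brevity, whereas the PipeOp fits during training and imputes afterwards.

```r
# Hypothetical helper (not part of mlr3pipelines): impute one numeric column.
impute_column = function(x) {
  missing = is.na(x)
  if (!any(missing)) {
    return(x)  # nothing to impute
  }
  if (all(missing)) {
    # No observed values to fit a histogram on: fall back to the constant 0.
    x[missing] = 0
    return(x)
  }
  # Fit the histogram on the observed values only.
  h = graphics::hist(x[!missing], plot = FALSE)
  # Sample a bin per missing value, then a uniform value inside that bin.
  bins = sample.int(length(h$counts), size = sum(missing), replace = TRUE, prob = h$counts)
  x[missing] = stats::runif(sum(missing), min = h$breaks[bins], max = h$breaks[bins + 1])
  x
}

impute_column(c(NA_real_, NA_real_))  # both entries become 0
```

Without such a fallback, hist() would have no data to compute breaks from, so a constant is the simplest well-defined replacement.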
Methods
Only methods inherited from PipeOpImpute/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute, mlr_pipeops_imputeconstant, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample
Examples
library("mlr3")
library("mlr3pipelines")  # provides po()
task = tsk("pima")
task$missings()
#> diabetes      age  glucose  insulin     mass pedigree pregnant pressure
#>        0        0        5      374       11        0        0       35
#>  triceps
#>      227
po = po("imputehist")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure
#>        0        0        0        0        0        0        0        0
#>  triceps
#>        0
po$state$model
#> $age
#> $age$counts
#> [1] 267 150 81 76 76 37 31 23 14 11 1 0 1
#>
#> $age$breaks
#> [1] 20 25 30 35 40 45 50 55 60 65 70 75 80 85
#>
#>
#> $glucose
#> $glucose$counts
#> [1] 4 38 167 205 157 91 60 41
#>
#> $glucose$breaks
#> [1] 40 60 80 100 120 140 160 180 200
#>
#>
#> $insulin
#> $insulin$counts
#> [1] 151 158 48 17 11 6 1 1 1
#>
#> $insulin$breaks
#> [1] 0 100 200 300 400 500 600 700 800 900
#>
#>
#> $mass
#> $mass$counts
#> [1] 14 98 180 221 148 61 27 5 2 0 1
#>
#> $mass$breaks
#> [1] 15 20 25 30 35 40 45 50 55 60 65 70
#>
#>
#> $pedigree
#> $pedigree$counts
#> [1] 128 282 154 99 54 22 16 4 4 1 1 2 1
#>
#> $pedigree$breaks
#> [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6
#>
#>
#> $pregnant
#> $pregnant$counts
#> [1] 349 143 107 83 52 20 12 1 1
#>
#> $pregnant$breaks
#> [1] 0 2 4 6 8 10 12 14 16 18
#>
#>
#> $pressure
#> $pressure$counts
#> [1] 3 2 24 94 217 228 127 25 11 1 1
#>
#> $pressure$breaks
#> [1] 20 30 40 50 60 70 80 90 100 110 120 130
#>
#>
#> $triceps
#> $triceps$counts
#> [1] 9 115 179 164 65 7 1 0 0 1
#>
#> $triceps$breaks
#> [1] 0 10 20 30 40 50 60 70 80 90 100
#>
#>