Provides an interface to the vtreat package.
PipeOpVtreat naturally works for classification tasks and regression tasks.
Internally, PipeOpVtreat follows the fit/prepare interface of vtreat. It first creates a data treatment transform object via vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), or vtreat::MultinomialOutcomeTreatment(). It then calls vtreat::fit_prepare() on the training data and vtreat::prepare() during prediction.
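The fit/prepare pattern can be illustrated in base R without vtreat itself. This is a toy sketch. The helpers toy_fit_prepare() and toy_prepare() are hypothetical and are not part of vtreat or mlr3pipelines. "Fit" learns state from the training data only (here, a mean used for imputation). "Prepare" reuses that frozen state on new data. Unlike vtreat, the toy does not cross-validate the preparation of the training data. vtreat's fit_prepare() returns a cross frame for that purpose.

```r
# Toy illustration of the fit/prepare pattern (hypothetical helpers,
# NOT vtreat's implementation): state is learned on training data only
# and then reused unchanged on new data.

toy_prepare <- function(plan, dframe) {
  x <- dframe[[plan$var]]
  out <- data.frame(
    clean = ifelse(is.na(x), plan$fill, x),  # fill NAs with the stored training mean
    isBAD = as.numeric(is.na(x))             # 1 where the value was missing
  )
  names(out) <- paste(plan$var, names(out), sep = "_")
  out
}

toy_fit_prepare <- function(dframe, var) {
  # "fit": learn the treatment state from the training data
  plan <- list(var = var, fill = mean(dframe[[var]], na.rm = TRUE))
  # return the plan together with the prepared training data
  list(treatments = plan, prepared = toy_prepare(plan, dframe))
}

train <- data.frame(x = c(1, 2, NA, 5))
res <- toy_fit_prepare(train, "x")
res$prepared

newdata <- data.frame(x = c(NA, 10))
toy_prepare(res$treatments, newdata)  # NA is filled with the training mean 8/3
```

The design point is that toy_prepare() never looks at statistics of `newdata`. At prediction time, only the stored plan is used. This mirrors how the `$state` of a trained PipeOp is reused during prediction.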
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "vtreat".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected features "prepared" by vtreat.
If vtreat found "no usable vars", the input Task is returned unaltered.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
treatment_plan :: object of class vtreat_pipe_step | NULL
The treatment plan as constructed by vtreat based on the training data, i.e., an object of class treatment_plan. If vtreat found "no usable vars" and designing the treatment would have failed, this is NULL.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
recommended :: logical(1)
Whether only the "recommended" prepared features should be returned, i.e., non-constant variables with a significance value smaller than vtreat's threshold. Initialized to TRUE.
cols_to_copy :: function | Selector
Selector function, takes a Task as argument and returns a character() of features to copy.
See Selector for example functions. Initialized to selector_none().
minFraction :: numeric(1)
Minimum frequency a categorical level must have to be converted to an indicator column.
smFactor :: numeric(1)
Smoothing factor for impact coding models.
rareCount :: integer(1)
Allow levels with this count or below to be pooled into a shared rare level.
rareSig :: numeric(1)
Suppress levels from pooling at this significance value or greater.
collarProb :: numeric(1)
What fraction of the data (pseudo-probability) to collar data at if doCollar = TRUE.
doCollar :: logical(1)
If TRUE, collar numeric variables by cutting off after a tail probability specified by collarProb during treatment design.
codeRestriction :: character()
What types of variables to produce.
customCoders :: named list
Map from code names to custom categorical variable encoding functions.
splitFunction :: function
Function taking arguments nSplits, nRows, dframe, and y, and returning a user-desired split.
ncross :: integer(1)
Number of cross-validation rounds to design; an integer larger than one.
forceSplit :: logical(1)
If TRUE, force cross-validated significance calculations on all variables.
catScaling :: logical(1)
If TRUE, use stats::glm() link space for scaling; if FALSE, use stats::lm().
verbose :: logical(1)
If TRUE, print progress.
use_paralell :: logical(1)
If TRUE, use parallel methods.
missingness_imputation :: function
Function of signature f(values: numeric, weights: numeric), a simple missing value imputer.
Typically, an imputation via a PipeOp should be preferred, see PipeOpImpute.
pruneSig :: numeric(1)
Suppress variables with significance above this level. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
scale :: logical(1)
If TRUE, replace numeric variables with single-variable model regressions ("move to outcome scale"). These have mean zero and (for variables with significance less than 1) slope 1 when regressed against the outcome (lm for regression problems, glm for classification problems).
varRestriction :: list()
List of treated variable names to restrict to. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
trackedValues :: named list()
Named list mapping variables to known values; allows warnings upon appearance of novel levels (see vtreat::track_values()). Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
y_dependent_treatments :: character()
Which treatment types to build per outcome level. Only affects multiclass classification tasks.
imputation_map :: named list
Named list mapping column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers.
Typically, an imputation via a PipeOp is to be preferred, see PipeOpImpute.
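Several of the parameters above describe simple data transformations. The following base-R sketch illustrates doCollar/collarProb, scale, rareCount and minFraction under simplifying assumptions. All helper names are hypothetical, and the code is not vtreat's implementation.

Collaring is sketched as clipping at the collarProb and 1 - collarProb quantiles learned on training data. Outcome scaling is sketched as a centered single-variable linear fit. For a numeric outcome, this fit has mean zero and regresses onto y with slope 1. Rare-level pooling and indicator selection use plain level counts. They ignore rareSig and any significance filtering.

```r
# Hypothetical helpers sketching four of the treatments described above.
# Simplified illustrations only, NOT vtreat's implementation.

# doCollar / collarProb: learn clipping bounds (tail quantiles) on training data ...
fit_collar <- function(x, collarProb = 0.05) {
  quantile(x, probs = c(collarProb, 1 - collarProb), na.rm = TRUE, names = FALSE)
}
# ... and clip values (training or new data) to those bounds
apply_collar <- function(x, bounds) pmin(pmax(x, bounds[1]), bounds[2])

# scale: replace x by a centered single-variable linear fit, b * (x - mean(x)),
# where b is the slope of lm(y ~ x); regressing y on the result gives slope 1
fit_scale <- function(x, y) {
  list(center = mean(x), slope = unname(coef(stats::lm(y ~ x))[2]))
}
apply_scale <- function(x, s) s$slope * (x - s$center)

# rareCount: pool levels seen at most rareCount times into one shared level
pool_rare <- function(x, rareCount = 2) {
  counts <- table(x)
  rare <- names(counts)[counts <= rareCount]
  ifelse(x %in% rare, "rare", x)
}

# minFraction: only levels at least this frequent get an indicator column
indicator_levels <- function(x, minFraction = 0.1) {
  freq <- table(x) / length(x)
  names(freq)[freq >= minFraction]
}

set.seed(1)
x <- c(rnorm(98), -50, 50)  # two extreme outliers
y <- 2 * x + rnorm(100)
x_col <- apply_collar(x, fit_collar(x, collarProb = 0.05))
xs <- apply_scale(x_col, fit_scale(x_col, y))
c(mean = mean(xs), slope = unname(coef(lm(y ~ xs))[2]))  # approximately 0 and 1

xc <- c(rep("a", 6), rep("b", 3), "c")
pool_rare(xc, rareCount = 1)             # "c" is pooled into "rare"
indicator_levels(xc, minFraction = 0.2)  # "a" "b"
```

The slope-1 property follows from the algebra: if x' = b(x - mean(x)) with b = cov(x, y) / var(x), then cov(x', y) / var(x') = b cov(x, y) / (b^2 var(x)) = 1.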
For more information, see vtreat::regression_parameters(), vtreat::classification_parameters(), or vtreat::multinomial_parameters().
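Three parameters expect user-supplied functions: splitFunction, missingness_imputation, and the entries of imputation_map. The sketch below defines functions matching the signatures described in the parameter list. The names weighted_median_impute() and my_kway_split() are hypothetical, and so are the column names x and x2.

Two points are assumptions. First, an imputer is taken to return a single fill value for the missing entries. Second, the return shape of a split function is assumed to be a list of splits, each a list with `train` and `app` row indices, as produced by vtreat's own split functions such as vtreat::kWayCrossValidation(). Check the vtreat documentation before relying on either.

```r
# Hypothetical user-supplied functions matching the signatures described for
# missingness_imputation / imputation_map and splitFunction. Sketches only.

# f(values, weights): return one fill value for the missing entries,
# here the weighted median of the observed values
weighted_median_impute <- function(values, weights) {
  ok <- !is.na(values)
  v <- values[ok]
  w <- weights[ok]
  o <- order(v)
  cw <- cumsum(w[o]) / sum(w)  # cumulative weight share in sorted order
  v[o][which(cw >= 0.5)[1]]
}

# Per-column imputers for imputation_map (column names are hypothetical)
imputation_map <- list(x = weighted_median_impute, x2 = function(values, weights) 0)

# splitFunction: random k-way split. The return shape (a list of splits, each a
# list with `train` and `app` row indices) is assumed from vtreat's own
# split functions such as vtreat::kWayCrossValidation()
my_kway_split <- function(nRows, nSplits, dframe, y) {
  fold <- sample(rep_len(seq_len(nSplits), nRows))
  lapply(seq_len(nSplits), function(k) {
    list(train = which(fold != k), app = which(fold == k))
  })
}

weighted_median_impute(c(1, 2, NA, 10), c(1, 1, 1, 5))  # 10
set.seed(3)
str(my_kway_split(nRows = 10, nSplits = 3, dframe = NULL, y = NULL))

# With mlr3pipelines installed, such functions could be passed as
# hyperparameters, e.g. (not run):
# po("vtreat", splitFunction = my_kway_split,
#    missingness_imputation = weighted_median_impute)
```

Each row lands in exactly one `app` set, so every row is scored by a model that did not see it during training. This is the usual motivation for cross-validated treatment design.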
Internals
Follows vtreat's fit/prepare interface. See vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), vtreat::MultinomialOutcomeTreatment(), vtreat::fit_prepare() and vtreat::prepare().
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_yeojohnson
Examples
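library("mlr3")
library("mlr3pipelines") # provides PipeOpVtreat
set.seed(2020)
# Simulated regression data: numeric x with NAs, a derived categorical xc
# (including a NA level), and a pure-noise feature x2
make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA # introduce a NA level
  return(d)
}
task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")
pop = PipeOpVtreat$new()
# Design the treatment on the training task and return the prepared task
pop$train(list(task))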
#> $output
#> <TaskRegr:vtreat_regr> (100 x 8)
#> * Target: y
#> * Properties: -
#> * Features (7):
#> - dbl (7): xc_catD, xc_catN, xc_catP, xc_lev_NA, xc_lev_x_level_0_5,
#> xc_lev_x_level_1, xc_lev_x_level_minus_0_5
#>