Provides an interface to the vtreat package.
PipeOpVtreat naturally works for classification tasks and regression tasks.
Internally, PipeOpVtreat follows the fit/prepare interface of vtreat, i.e., first creating a data treatment transform object via vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), or vtreat::MultinomialOutcomeTreatment(), followed by calling vtreat::fit_prepare() on the training data and vtreat::prepare() during prediction.
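The fit/prepare sequence described above can be sketched directly against vtreat, without mlr3. This is a minimal illustration under stated assumptions, not the code PipeOpVtreat runs: the toy data and all variable names are hypothetical, and the element names `treatments` and `cross_frame` are taken from the vtreat documentation of fit_prepare().

```r
library("vtreat")

set.seed(1)
# Hypothetical toy data: one numeric feature, one categorical feature, numeric outcome.
d_train <- data.frame(
  x  = rnorm(200),
  xc = sample(c("a", "b", "c"), 200, replace = TRUE)
)
d_train$y <- d_train$x + (d_train$xc == "a") + rnorm(200, sd = 0.1)
d_new <- d_train[1:20, c("x", "xc")]

# 1. Create the data treatment transform object (numeric outcome, i.e. regression).
transform_design <- NumericOutcomeTreatment(
  var_list = c("x", "xc"),
  outcome_name = "y"
)

# 2. Training: fit the treatment plan and obtain the prepared training frame.
fit <- fit_prepare(transform_design, d_train)
treatment_plan <- fit$treatments
train_prepared <- fit$cross_frame

# 3. Prediction: apply the fitted plan to new data.
new_prepared <- prepare(treatment_plan, d_new)
```

For a binary or multiclass outcome, the first step would use BinomialOutcomeTreatment() or MultinomialOutcomeTreatment() instead; the remaining steps keep the same shape.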
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "vtreat".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
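As a brief sketch of the two constructor arguments, the calls below build the PipeOp with its defaults, with one hyperparameter overridden, and through the po() dictionary shortcut. The id "vtreat_all" is a hypothetical name chosen for illustration.

```r
library("mlr3pipelines")

# Direct construction with the documented defaults.
pop = PipeOpVtreat$new(id = "vtreat", param_vals = list())

# Override a hyperparameter at construction time.
pop_all = PipeOpVtreat$new(id = "vtreat_all", param_vals = list(recommended = FALSE))

# Short form via the PipeOp dictionary.
pop_short = po("vtreat")

pop_all$param_set$values$recommended
```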
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected features "prepared" by vtreat.
If vtreat found "no usable vars", the input Task is returned unaltered.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
treatment_plan :: object of class vtreat_pipe_step | NULL
The treatment plan as constructed by vtreat based on the training data, i.e., an object of class treatment_plan. If vtreat found "no usable vars" and designing the treatment would have failed, this is NULL.
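The following sketch shows one way to inspect the state after training and to guard against the NULL case. The toy data and the task id "toy" are hypothetical and only serve the illustration.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# Hypothetical toy regression data.
d = data.frame(x = rnorm(100), xc = sample(c("a", "b"), 100, replace = TRUE))
d$y = d$x + (d$xc == "a") + rnorm(100, sd = 0.1)
task = as_task_regr(d, target = "y", id = "toy")

pop = po("vtreat")
out = pop$train(list(task))$output  # a Task holding the prepared features

plan = pop$state$treatment_plan
if (is.null(plan)) {
  # vtreat found no usable variables; the task was passed through unaltered.
  message("no treatment plan stored")
} else {
  print(class(plan))
}
```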
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
recommended :: logical(1)
Whether only the "recommended" prepared features should be returned, i.e., non-constant variables with a significance value smaller than vtreat's threshold. Initialized to TRUE.
cols_to_copy :: function | Selector
Selector function, takes a Task as argument and returns a character() of features to copy. See Selector for example functions. Initialized to selector_none().
minFraction :: numeric(1)
Minimum frequency a categorical level must have to be converted to an indicator column.
smFactor :: numeric(1)
Smoothing factor for impact coding models.
rareCount :: integer(1)
Allow levels with this count or below to be pooled into a shared rare-level.
rareSig :: numeric(1)
Suppress levels from pooling at this significance value or greater.
collarProb :: numeric(1)
What fraction of the data (pseudo-probability) to collar data at if doCollar = TRUE.
doCollar :: logical(1)
If TRUE, collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.
codeRestriction :: character()
What types of variables to produce.
customCoders :: named list
Map from code names to custom categorical variable encoding functions.
splitFunction :: function
Function taking arguments nSplits, nRows, dframe, and y; returning a user desired split.
ncross :: integer(1)
Integer larger than one, number of cross-validation rounds to design.
forceSplit :: logical(1)
If TRUE, force cross-validated significance calculations on all variables.
catScaling :: logical(1)
If TRUE, use stats::glm() linkspace; if FALSE, use stats::lm() for scaling.
verbose :: logical(1)
If TRUE, print progress.
use_parallel :: logical(1)
If TRUE, use parallel methods.
missingness_imputation :: function
Function of signature f(values: numeric, weights: numeric), simple missing value imputer. Typically, an imputation via a PipeOp should be preferred, see PipeOpImpute.
pruneSig :: numeric(1)
Suppress variables with significance above this level. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
scale :: logical(1)
If TRUE, replace numeric variables with single variable model regressions ("move to outcome-scale"). These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems/glm for classification problems) against the outcome.
varRestriction :: list()
List of treated variable names to restrict to. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
trackedValues :: named list()
Named list mapping variables to known values, allows warnings upon novel level appearances (see vtreat::track_values()). Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
y_dependent_treatments :: character()
Character vector specifying which treatment types to build per outcome level. Only affects multiclass classification tasks.
imputation_map :: named list
Map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers. Typically, an imputation via a PipeOp is to be preferred, see PipeOpImpute.
For more information, see vtreat::regression_parameters(), vtreat::classification_parameters(), or vtreat::multinomial_parameters().
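A few of the parameters listed above can be set through the PipeOp's param_set, as in this sketch. The chosen values and the feature name "x2" are illustrative assumptions, not recommendations. doCollar is set before collarProb because collarProb only applies when doCollar = TRUE.

```r
library("mlr3pipelines")

pop = po("vtreat")

# Return all prepared features, not only those vtreat recommends.
pop$param_set$values$recommended = FALSE
# Copy the feature "x2" (hypothetical name) as selected by a Selector function.
pop$param_set$values$cols_to_copy = selector_name("x2")
# Require a categorical level to reach a frequency of 0.05 to get an indicator column.
pop$param_set$values$minFraction = 0.05
# Collar numeric variables, using a tail-probability of 0.01.
pop$param_set$values$doCollar = TRUE
pop$param_set$values$collarProb = 0.01

pop$param_set$values
```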
Internals
Follows vtreat's fit/prepare interface. See vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), vtreat::MultinomialOutcomeTreatment(), vtreat::fit_prepare() and vtreat::prepare().
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("mlr3pipelines")  # provides PipeOpVtreat

set.seed(2020)

# Simulate a regression data set with NAs in a numeric and a categorical feature
make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA  # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA  # introduce a NA level
  return(d)
}

task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")
pop = PipeOpVtreat$new()
pop$train(list(task))
#> $output
#> <TaskRegr:vtreat_regr> (100 x 8)
#> * Target: y
#> * Properties: -
#> * Features (7):
#> - dbl (7): xc_catD, xc_catN, xc_catP, xc_lev_NA, xc_lev_x_level_0_5,
#> xc_lev_x_level_1, xc_lev_x_level_minus_0_5
#>
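As a further sketch, the vtreat step can be chained with a learner so that the treatment plan is fit on the training rows only and applied with vtreat::prepare() at prediction time. The toy data, the train/test split, and the choice of lrn("regr.rpart") are assumptions made for illustration.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(2020)
# Hypothetical toy regression data.
d = data.frame(x = rnorm(150), xc = sample(c("a", "b", "c"), 150, replace = TRUE))
d$y = d$x + (d$xc == "a") + rnorm(150, sd = 0.1)
task = as_task_regr(d, target = "y", id = "toy")

# Chain the vtreat step with a learner; treatment design happens inside training.
glrn = as_learner(po("vtreat") %>>% lrn("regr.rpart"))

train_ids = 1:100
test_ids = 101:150
glrn$train(task, row_ids = train_ids)
pred = glrn$predict(task, row_ids = test_ids)
pred$score(msr("regr.mse"))
```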