Based on POSIXct/Date columns of the data, a set of date-related features is computed and added to the feature set of the output task. If no POSIXct or Date column is found, the original task is returned unaltered. This functionality is based on the add_datepart() and add_cyclic_datepart() functions from the fastai package. To operate on only particular POSIXct/Date columns, use the affect_columns parameter inherited from PipeOpTaskPreprocSimple.
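As a minimal sketch of restricting the operation to one column, the snippet below assumes a task with two POSIXct columns (the hypothetical names `start` and `end`) and uses `selector_name()` from mlr3pipelines as the value of `affect_columns`. The data is made up for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical data: iris plus two timestamp columns.
dat = iris
dat$start = as.POSIXct("2020-02-01 08:30:00", tz = "UTC") + 3600 * seq_len(150)
dat$end = dat$start + 86400
task = TaskClassif$new("iris_dates", backend = dat, target = "Species")

# Only "start" is selected; "end" is passed through unchanged.
pop = po("datefeatures", affect_columns = selector_name("start"))
out = pop$train(list(task))$output
out$feature_names
```

With the default `keep_date_var = FALSE`, `start` is replaced by its derived features, while `end` should remain an ordinary POSIXct feature.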
For Date columns, the features "hour", "minute", "second", and "is_day" are skipped.
If cyclic = TRUE, cyclic features are computed for the features "month", "week_of_year", "day_of_year", "day_of_month", "day_of_week", "hour", "minute" and "second". This means that for each feature x, two additional features are computed, namely the sine and cosine transformation of 2 * pi * x / max_x. Here max_x is the largest possible value the feature could take on, plus 1, assuming the lowest possible value is 0; e.g., for hours from 0 to 23, max_x is 24. This respects the cyclical nature of features such as seconds: second 21 and second 22 are one second apart, but so are second 59 and second 0 of the next minute.
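The sine/cosine transformation can be illustrated in base R. This is a sketch of the formula 2 * pi * x / max_x only, not the package's internal code; the helper names `cyclic_encode()` and `chord()` are hypothetical.

```r
# Map a periodic feature x (values 0, ..., max_x - 1) onto the unit circle.
cyclic_encode = function(x, max_x) {
  angle = 2 * pi * x / max_x
  cbind(sin = sin(angle), cos = cos(angle))
}

# Seconds 21, 22, 59 and 0, with max_x = 60.
sec = cyclic_encode(c(21, 22, 59, 0), max_x = 60)

# Euclidean distance between two encoded points (rows i and j).
chord = function(i, j) sqrt(sum((sec[i, ] - sec[j, ])^2))

chord(1, 2)  # second 21 vs. second 22
chord(3, 4)  # second 59 vs. second 0 of the next minute
```

Both distances are equal, whereas on the raw scale the second pair would be 59 units apart.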
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
id :: character(1)
Identifier of resulting object, default "datefeatures".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with date-related features computed and added to the feature set of the output task and the POSIXct columns of the data removed from the feature set (depending on the value of keep_date_var).
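The effect of `keep_date_var` on the output task can be checked with a small comparison. This is a sketch on made-up data (iris with a hypothetical `date` column of hourly timestamps), leaving all other parameters at their defaults.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical data: iris plus one timestamp column.
dat = iris
dat$date = as.POSIXct("2020-02-01 00:00:00", tz = "UTC") + 3600 * seq_len(150)
task = TaskClassif$new("iris_date", backend = dat, target = "Species")

# Default: the original column is dropped after feature extraction.
dropped = po("datefeatures")$train(list(task))$output
# keep_date_var = TRUE: the original column stays in the feature set.
kept = po("datefeatures", keep_date_var = TRUE)$train(list(task))$output

"date" %in% dropped$feature_names
"date" %in% kept$feature_names
```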
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
keep_date_var :: logical(1)
Should the POSIXct columns be kept as features? Default FALSE.
cyclic :: logical(1)
Should cyclic features be computed? See Internals. Default FALSE.
year :: logical(1)
Should the year be extracted as a feature? Default TRUE.
quarter :: logical(1)
Should the quarter be extracted as a feature? Default TRUE.
month :: logical(1)
Should the month be extracted as a feature? Default TRUE.
week_of_year :: logical(1)
Should the week of the year be extracted as a feature? Default TRUE.
day_of_year :: logical(1)
Should the day of the year be extracted as a feature? Default TRUE.
day_of_month :: logical(1)
Should the day of the month be extracted as a feature? Default TRUE.
day_of_week :: logical(1)
Should the day of the week (ISO 8601) be extracted as a feature? Default TRUE.
hour :: logical(1)
Should the hour be extracted as a feature? Default TRUE.
minute :: logical(1)
Should the minute be extracted as a feature? Default TRUE.
second :: logical(1)
Should the second be extracted as a feature? Default TRUE.
is_day :: logical(1)
Should a feature be extracted indicating whether it is day time (06:00 am to 08:00 pm)? Default TRUE.
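To make the meaning of the individual features concrete, the following base R sketch derives them for a single timestamp. It is not the package's implementation; in particular, treating "day time" as hours 6 through 20 inclusive is an assumption about how the boundaries are handled.

```r
x = as.POSIXct("2020-02-29 13:45:10", tz = "UTC")
lt = as.POSIXlt(x)

feats = list(
  year = lt$year + 1900L,                       # POSIXlt counts years since 1900
  quarter = lt$mon %/% 3L + 1L,                 # POSIXlt months are 0-based
  month = lt$mon + 1L,
  week_of_year = as.integer(format(x, "%V")),   # ISO 8601 week number
  day_of_year = lt$yday + 1L,                   # POSIXlt yday is 0-based
  day_of_month = lt$mday,
  day_of_week = as.integer(format(x, "%u")),    # ISO 8601: Monday = 1
  hour = lt$hour,
  minute = lt$min,
  second = as.integer(lt$sec),
  is_day = lt$hour >= 6L & lt$hour <= 20L       # assumed boundary handling
)
str(feats)
```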
Internals
The cyclic feature transformation always assumes that values range from 0, so some values (e.g. day of the month) are shifted before sine/cosine transform.
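The shift can be sketched in base R for the day of the month, which starts at 1 rather than 0. The fixed period `max_x = 31` is an assumption made for illustration; the package may determine the period differently, for example per month.

```r
day_of_month = c(1L, 16L, 31L)

shifted = day_of_month - 1L   # values now start at 0: 0, 15, 30
max_x = 31                    # assumed period, for illustration only
angle = 2 * pi * shifted / max_x

enc = cbind(sin = sin(angle), cos = cos(angle))
rownames(enc) = paste0("day_", day_of_month)
round(enc, 3)
```

After the shift, the first day of the month maps to angle 0, i.e. to the point (sin = 0, cos = 1), matching the convention used for features that already start at 0, such as hours.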
Fields
Only fields inherited from PipeOp.
Methods
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
See also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Examples
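library("mlr3")
library("mlr3pipelines")  # provides po()
dat = iris
set.seed(1)
# draw 150 random hourly timestamps from February 2020
dat$date = sample(
  seq(as.POSIXct("2020-02-01"), to = as.POSIXct("2020-02-29"), by = "hour"), size = 150L
)
task = TaskClassif$new("iris_date", backend = dat, target = "Species")
# extract date features, skipping cyclic encodings, minutes and seconds
pop = po("datefeatures", param_vals = list(cyclic = FALSE, minute = FALSE, second = FALSE))
pop$train(list(task))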
#> $output
#>
#> ── <TaskClassif> (150x14) ──────────────────────────────────────────────────────
#> • Target: Species
#> • Target classes: setosa (33%), versicolor (33%), virginica (33%)
#> • Properties: multiclass
#> • Features (13):
#> • int (8): date.day_of_month, date.day_of_week, date.day_of_year, date.hour,
#> date.month, date.quarter, date.week_of_year, date.year
#> • dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
#> • lgl (1): date.is_day
#>
pop$state
#> $dt_columns
#> [1] "date"
#>
#> $affected_cols
#> [1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width" "date"
#>
#> $intasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#> 5: date POSIXct
#>
#> $outtasklayout
#> Key: <id>
#> id type
#> <char> <char>
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
#> 5: date.day_of_month integer
#> 6: date.day_of_week integer
#> 7: date.day_of_year integer
#> 8: date.hour integer
#> 9: date.is_day logical
#> 10: date.month integer
#> 11: date.quarter integer
#> 12: date.week_of_year integer
#> 13: date.year integer
#>
#> $outtaskshell
#> Empty data.table (0 rows and 14 cols): Species,Petal.Length,Petal.Width,Sepal.Length,Sepal.Width,date.year...
#>