11# Sliding computations {#sec-sliding}
22
33A central tool in the ` {epiprocess} ` package is ` epi_slide() ` , which is based
4- on the powerful functionality provided in the
4+ on the powerful functionality provided in the
55[ ` slider ` ] ( https://cran.r-project.org/web/packages/slider ) package. In
66` {epiprocess} ` , to "slide" means to apply a computation---represented as a
77function or formula---over a sliding/rolling data window. Suitable
@@ -10,7 +10,7 @@ groupings can always be achieved by a preliminary call to `group_by()`.
1010By default, the meaning of one time step is inferred from the ` time_value `
1111column of the ` epi_df ` object under consideration, based on the way this column
1212understands addition and subtraction. For example, if the time values are coded
13- as ` Date ` objects, then one time step is one day, since
13+ as ` Date ` objects, then one time step is one day, since
1414` as.Date("2022-01-01") + 1 ` equals ` as.Date("2022-01-02") ` . Alternatively, the time step can be specified
1515manually in the call to ` epi_slide() ` ; you can read the documentation for more
1616details. Furthermore, the alignment of the running window used in ` epi_slide() `
@@ -51,10 +51,10 @@ order to smooth the signal, by passing in a formula for the first argument of
5151` epi_slide() ` . To do this computation per state, we first call ` group_by() ` .
5252
5353``` {r}
54- x %>%
55- group_by(geo_value) %>%
56- epi_slide(~ mean(.x$cases), before = 6 ) %>%
57- ungroup()
54+ x %>%
55+ group_by(geo_value) %>%
56+ epi_slide(~ mean(.x$cases), .window_size = 7 ) %>%
57+ ungroup()
5858```
5959
6060The formula specified has access to all non-grouping columns present in the
@@ -65,9 +65,9 @@ default. We can of course change this post hoc, or we can instead specify a new
6565name up front using the ` new_col_name ` argument:
6666
6767``` {r}
68- x %>%
68+ x %>%
6969 group_by(geo_value) %>%
70- epi_slide(~ mean(.x$cases), before = 6, new_col_name = "cases_7dav") %>%
70+ epi_slide(~ mean(.x$cases), .window_size = 7, . new_col_name = "cases_7dav") %>%
7171 ungroup()
7272```
7373
@@ -81,7 +81,7 @@ Like in `group_modify()`, there are alternative names for these variables as
8181well: ` . ` can be used instead of ` .x ` , ` .y ` instead of ` .group_key ` , and ` .z `
8282instead of ` .ref_time_value ` .
8383
84- ## Slide with a function
84+ ## Slide with a function
8585
8686We can also pass a function for the first argument in ` epi_slide() ` . In this
8787case, the passed function must accept the following arguments:
@@ -97,10 +97,10 @@ receives to `f`.
9797Recreating the last example of a 7-day trailing average:
9898
9999``` {r}
100- x %>%
101- group_by(geo_value) %>%
102- epi_slide(function(x, gk, rtv) mean(x$cases),
103- before = 6, new_col_name = "cases_7dav") %>%
100+ x %>%
101+ group_by(geo_value) %>%
102+ epi_slide(function(x, gk, rtv) mean(x$cases),
103+ .window_size = 7, . new_col_name = "cases_7dav") %>%
104104 ungroup()
105105```
106106
@@ -113,9 +113,9 @@ to a computation in which we can access any columns of `x` by name, just as we
113113would in a call to ` dplyr::mutate() ` , or any of the ` dplyr ` verbs. For example:
114114
115115``` {r}
116- x <- x %>%
117- group_by(geo_value) %>%
118- epi_slide(cases_7dav = mean(cases), before = 6 ) %>%
116+ x <- x %>%
117+ group_by(geo_value) %>%
118+ epi_slide(cases_7dav = mean(cases), .window_size = 7 ) %>%
119119 ungroup()
120120```
121121In addition to referring to individual columns by name, you can refer to the
@@ -128,7 +128,7 @@ top of the original counts.
128128#| code-fold: true
129129cols <- RColorBrewer::brewer.pal(7, "Set1")[-6]
130130ggplot(x, aes(x = time_value)) +
131- geom_col(aes(y = cases, fill = geo_value), alpha = 0.5,
131+ geom_col(aes(y = cases, fill = geo_value), alpha = 0.5,
132132 show.legend = FALSE) +
133133 scale_y_continuous(expand = expansion(c(0, 0.05))) +
134134 geom_line(aes(y = cases_7dav, col = geo_value), show.legend = FALSE) +
@@ -139,14 +139,14 @@ ggplot(x, aes(x = time_value)) +
139139 labs(x = "Date", y = "Reported COVID-19 cases")
140140```
141141
142- As we can see from the center top panel, it looks like Florida moved to weekly
142+ As we can see from the center top panel, it looks like Florida moved to weekly
143143reporting of COVID-19 cases in summer of 2021, while California occasionally reported negative cases counts!
144144
145145## Running a local forecaster {#sec-local-forecaster}
146146
147147As a more complex example, we preview some of the functionality of ` {epipredict} ` described in future chapters, and use a forecaster based on a
148148local (in time)
149- autoregression or "AR model". AR models can be fit in numerous ways
149+ autoregression or "AR model". AR models can be fit in numerous ways
150150(using base R
151151functions and various packages), but here we the ` arx_forecaster() ` , implemented in ` {epipredict} ` both
152152provides a more advanced example of sliding a function over an ` epi_df ` object,
@@ -165,46 +165,46 @@ considered in this vignette).
165165
166166``` {r eval=FALSE}
167167arx_forecaster <- function(
168- epi_df,
168+ epi_df,
169169 outcome, # the outcome column name in `epi_df`
170170 predictors, # a character vector, containing 1 or more predictors in `epi_df`
171- trainer = quantile_reg(),
171+ trainer = quantile_reg(),
172172 args_list = arx_args_list(
173- lags = c(0, 7, 14),
173+ lags = c(0, 7, 14),
174174 ahead = 7,
175175 quantile_levels = c(0.05, 0.95)
176176 )
177177)
178178
179179```
180180
181- We go ahead and slide this AR forecaster over the working ` epi_df ` of COVID-19
182- cases. Note that we actually model the ` cases_7dav ` column, to operate on the
181+ We go ahead and slide this AR forecaster over the working ` epi_df ` of COVID-19
182+ cases. Note that we actually model the ` cases_7dav ` column, to operate on the
183183scale of smoothed COVID-19 cases. This is clearly equivalent, up to a constant,
184184to modeling weekly sums of COVID-19 cases.
185185
186186``` {r, warning=FALSE}
187187fc_time_values <- seq(
188- from = as.Date("2020-06-01"),
189- to = as.Date("2021-12-01"),
188+ from = as.Date("2020-06-01"),
189+ to = as.Date("2021-12-01"),
190190 by = "1 months")
191191
192192fcasts <- epi_slide(
193- x,
194- ~ arx_forecaster(
195- epi_data = .x,
196- outcome = "cases_7dav",
197- predictors = "cases_7dav",
198- trainer = quantile_reg(),
199- args_list = arx_args_list(ahead = 7))$predictions,
200- before = 119,
201- ref_time_values = fc_time_values,
202- new_col_name = "fc")
193+ x,
194+ .f = ~ arx_forecaster(
195+ epi_data = .x,
196+ outcome = "cases_7dav",
197+ predictors = "cases_7dav",
198+ trainer = quantile_reg(),
199+ args_list = arx_args_list(ahead = 7))$predictions,
200+ .window_size = 120,
201+ . ref_time_values = fc_time_values,
202+ . new_col_name = "fc")
203203
204204# grab just the relevant columns, and make them easier to plot
205205fcasts <- fcasts %>%
206- select(geo_value, time_value, cases_7dav,
207- contains("_distn"), fc_target_date ) %>%
206+ unpack(fc, names_sep = "_") %>%
207+ select(geo_value, time_value, cases_7dav, starts_with("fc") ) %>%
208208 pivot_quantiles_wider(contains("_distn"))
209209fcasts
210210```
@@ -216,29 +216,29 @@ that correspond to the date the forecast is for (rather than the date it was mad
21621695\% prediction band.[ ^ 1 ]
217217
218218[ ^ 1 ] : If instead we had set ` as_list_col = TRUE `
219- in the call to ` epi_slide() ` , then we would have gotten a list column ` fc ` ,
219+ in the call to ` epi_slide() ` , then we would have gotten a list column ` fc ` ,
220220where each element of ` fc ` contains these results.
221221
222222To finish off, we plot the forecasts at some times (spaced out by a few months)
223- over the last year, at multiple horizons: 7, 14, 21, and 28 days ahead. To do
224- so, we encapsulate the process of generating forecasts into a simple function,
223+ over the last year, at multiple horizons: 7, 14, 21, and 28 days ahead. To do
224+ so, we encapsulate the process of generating forecasts into a simple function,
225225so that we can call it a few times.
226226
227227``` {r, message = FALSE, warning = FALSE}
228228k_week_ahead <- function(ahead = 7) {
229229 epi_slide(
230- x,
230+ x,
231231 ~ arx_forecaster(
232- epi_data = .x,
233- outcome = "cases_7dav",
234- predictors = "cases_7dav",
235- trainer = quantile_reg(),
236- args_list = arx_args_list(ahead = ahead))$predictions,
237- before = 119,
238- ref_time_values = fc_time_values,
239- new_col_name = "fc") %>%
240- select(geo_value, time_value, cases_7dav, contains("_distn"),
241- fc_target_date ) %>%
232+ epi_data = .x,
233+ outcome = "cases_7dav",
234+ predictors = "cases_7dav",
235+ trainer = quantile_reg(),
236+ args_list = arx_args_list(ahead = ahead))$predictions,
237+ .window_size = 120,
238+ . ref_time_values = fc_time_values,
239+ . new_col_name = "fc") %>%
240+ unpack(fc, names_sep = "_") %>%
241+ select(geo_value, time_value, cases_7dav, starts_with("fc") ) %>%
242242 pivot_quantiles_wider(contains("_distn"))
243243}
244244
@@ -247,15 +247,16 @@ z <- map(c(7, 14, 21, 28), k_week_ahead) %>% list_rbind()
247247```
248248
249249Then we can plot the on top of the observed data
250+
250251``` {r, fig.width=8, fig.height=9}
251252#| code-fold: true
252253ggplot(z) +
253- geom_line(data = x, aes(x = time_value, y = cases_7dav), color = "gray50") +
254+ geom_line(data = x, aes(x = time_value, y = cases_7dav), color = "gray50") +
254255 geom_ribbon(aes(x = fc_target_date, ymin = `0.05`, ymax = `0.95`,
255- group = time_value, fill = geo_value), alpha = 0.4) +
256- geom_line(aes(x = fc_target_date, y = `0.5`, group = time_value)) +
257- geom_point(aes(x = fc_target_date, y = `0.5`, group = time_value), size = 0.5) +
258- #geom_vline(data = tibble(x = fc_time_values), aes(xintercept = x),
256+ group = time_value, fill = geo_value), alpha = 0.4) +
257+ geom_line(aes(x = fc_target_date, y = `0.5`, group = time_value)) +
258+ geom_point(aes(x = fc_target_date, y = `0.5`, group = time_value), size = 0.5) +
259+ #geom_vline(data = tibble(x = fc_time_values), aes(xintercept = x),
259260 # linetype = 2, alpha = 0.5) +
260261 facet_wrap(vars(geo_value), scales = "free_y", nrow = 3) +
261262 scale_y_continuous(expand = expansion(c(0, 0.05))) +
@@ -269,22 +270,22 @@ spotty. At various points in time, we can see that its forecasts are volatile
269270(its point predictions are all over the place), or overconfident (its bands are
270271too narrow), or both at the same time. This is only meant as a simple demo and
271272not entirely unexpected given the way the AR model is set up. The
272- [ ` epipredict ` ] ( https://cmu-delphi.github.io/epipredict ) package,
273- offers a suite of predictive modeling tools
274- that improve on many of the shortcomings of the above simple AR model (simply
273+ [ ` epipredict ` ] ( https://cmu-delphi.github.io/epipredict ) package,
274+ offers a suite of predictive modeling tools
275+ that improve on many of the shortcomings of the above simple AR model (simply
275276using all states for training rather than 6 is a huge improvement).
276277
277278Second, the AR forecaster here is using finalized data, meaning, it uses the
278279latest versions of signal values (reported COVID-19 cases) available, for both
279280training models and making predictions historically. However, this is not
280281reflective of the provisional nature of the data that it must cope with in a
281282true forecast task. Training and making predictions on finalized data can lead
282- to an overly optimistic sense of accuracy; see, for example,
283+ to an overly optimistic sense of accuracy; see, for example,
283284[ @McDonaldBien2021 ] and references
284285therein. Fortunately, the ` epiprocess ` package provides a data structure called
285286` epi_archive ` that can be used to store all data revisions, and furthermore, an
286287` epi_archive ` object knows how to slide computations in the correct
287288version-aware sense (for the computation at each reference time $t$, it uses
288- only data that would have been available as of $t$). We will revisit this
289- example in the [ archive
289+ only data that would have been available as of $t$). We will revisit this
290+ example in the [ archive
290291vignette] ( https://cmu-delphi.github.io/epiprocess/articles/archive.html ) .
0 commit comments