Merged
76 changes: 58 additions & 18 deletions Report/create_reports.R
@@ -32,20 +32,30 @@ state_geos = locations %>%
   filter(nchar(.data$geo_value) == 2) %>%
   pull(.data$geo_value)
 signals = c("confirmed_incidence_num",
-            "deaths_incidence_num")
+            "deaths_incidence_num",
+            "confirmed_admissions_covid_1d")

 predictions_cards = get_covidhub_predictions(forecasters,
                                              signal = signals,
+                                             ahead = 1:28,
                                              geo_values = state_geos,
                                              verbose = TRUE,
-                                             use_disk = TRUE)
+                                             use_disk = TRUE) %>%
+  filter(!(incidence_period == "epiweek" & ahead > 4))

Contributor:

Just curious about this addition. It looks like this wasn't here before, yet we were still only saving aheads 1-4 for epiweek predictions (cases and deaths). I thought Jed made this cutoff elsewhere.

Collaborator Author (@nmdefries, Jun 9, 2021):

By default, get_covidhub_predictions uses ahead = 1:4 for both daily and epiweek forecasts. However, daily forecasts actually go out to 28 days ahead. To get those without also getting epiweek forecasts more than 4 weeks ahead, I switched the ahead setting to 1:28 and added the filter.

We could make two separate calls to get_covidhub_predictions here, one for cases + deaths and one for hospitalizations, each with its own ahead setting. However, the underlying get_forecaster_predictions_alt downloads all forecast files every time it's run (are you aware of any particular reason for this?), so the memory/speed tradeoff of two calls is poor at the moment.

Collaborator Author:

> are you aware of any particular reason for this?

Ah, perhaps it's because this was originally intended to run in GitHub Actions, where the files wouldn't persist between sessions anyway.

Contributor:

I see, that makes sense. And yes, I believe that is the case re: downloading files.
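
A minimal base-R sketch of the ahead handling described in this thread: request aheads 1:28 so daily forecasts are kept out to 28 days, then drop epiweek rows more than 4 weeks ahead. The helper `drop_long_epiweek_aheads()` and the toy `cards` data frame are hypothetical; the script itself applies the equivalent `dplyr::filter()` to the output of `get_covidhub_predictions()`.

```r
# Hypothetical helper mirroring the filter added in create_reports.R:
#   filter(!(incidence_period == "epiweek" & ahead > 4))
drop_long_epiweek_aheads <- function(cards, max_epiweek_ahead = 4) {
  too_far <- cards$incidence_period == "epiweek" & cards$ahead > max_epiweek_ahead
  # which() skips NA conditions, the same way dplyr::filter() drops NA rows
  cards[which(!too_far), , drop = FALSE]
}

# Toy prediction cards: daily aheads count days, epiweek aheads count weeks
cards <- data.frame(
  incidence_period = c("day", "day", "epiweek", "epiweek"),
  ahead            = c(1, 28, 4, 5),
  stringsAsFactors = FALSE
)

kept <- drop_long_epiweek_aheads(cards)
print(kept)
```

Only the 5-week-ahead epiweek row is removed; both daily rows, including the 28-day-ahead one, survive.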

+
 predictions_cards = predictions_cards %>%
-  filter(!is.na(predictions_cards$target_end_date))
-predictions_cards = predictions_cards %>% filter(target_end_date < today())
+  filter(!is.na(predictions_cards$target_end_date)) %>%
+  filter(target_end_date < today())

-# Only accept forecasts made Monday or earlier
+# For epiweek predictions, only accept forecasts made Monday or earlier.
+# target_end_date is the date of the last day (Saturday) in the epiweek
+# For daily predictions, accept any forecast where the target_end_date is later

Contributor:

Is there a reason we aren't using the "Monday or earlier" cutoff for hospitalization data?

Collaborator Author:

The hospitalization forecasts are produced for every day following the forecast date; the target is "N day ahead inc hosp". My understanding is that the "Monday or earlier" cutoff is only relevant for weekly forecasts, since we want to make sure that forecasts for a week aren't made with partial information about that week (e.g. it's easy to predict a week's cases if you already know the values for 6 of its 7 days). Will check with Dan.

Collaborator Author:

This approach matches Dan's understanding.

Contributor:

Got it, thanks.

+# than the forecast_date.
 predictions_cards = predictions_cards %>%
-  filter(target_end_date - (forecast_date + 7 * ahead) >= -2)
+  filter(
+    (incidence_period == "epiweek" & target_end_date - (forecast_date + 7 * ahead) >= -2) |
+    (incidence_period == "day" & target_end_date > forecast_date)
+  )

 # And only a forecaster's last forecast if multiple were made
 predictions_cards = predictions_cards %>%
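
To illustrate the forecast-date filter above, here is a base-R sketch on a toy data frame; the helper `keep_valid_forecasts()` and the example dates are hypothetical. For a 1-week-ahead epiweek target ending Saturday 2021-06-12, target_end_date - (forecast_date + 7 * ahead) is -2 days for a Monday forecast (2021-06-07), so it is kept, and -3 days for a Tuesday forecast (2021-06-08), so it is dropped. Daily rows only need a target after the forecast date.

```r
# Hypothetical base-R equivalent of the forecast-date filter in create_reports.R.
keep_valid_forecasts <- function(cards) {
  # Days between the epiweek's Saturday and (forecast_date + 7 * ahead);
  # >= -2 means the forecast was made on Monday or earlier.
  lag_days <- as.numeric(cards$target_end_date -
                           (cards$forecast_date + 7 * cards$ahead))
  ok_epiweek <- cards$incidence_period == "epiweek" & lag_days >= -2
  # Daily forecasts only need a target after the forecast date
  ok_day <- cards$incidence_period == "day" &
    cards$target_end_date > cards$forecast_date
  cards[which(ok_epiweek | ok_day), , drop = FALSE]
}

cards <- data.frame(
  incidence_period = c("epiweek", "epiweek", "epiweek", "day", "day"),
  # Epiweek rows: forecasts made Sunday, Monday, Tuesday (2021-06-06 to 06-08)
  forecast_date    = as.Date(c("2021-06-06", "2021-06-07", "2021-06-08",
                               "2021-06-07", "2021-06-07")),
  target_end_date  = as.Date(c("2021-06-12", "2021-06-12", "2021-06-12",
                               "2021-06-08", "2021-06-07")),
  ahead            = c(1, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

kept <- keep_valid_forecasts(cards)
print(kept)
```

The Sunday and Monday epiweek forecasts and the next-day daily forecast are kept; the Tuesday forecast and the same-day daily row are dropped.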
@@ -91,22 +101,52 @@ state_scores = evaluate_covid_predictions(state_predictions,
                                           geo_type = "state")

 source("score.R")
-print("Saving state confirmed incidence...")
-save_score_cards(state_scores, "state", signal_name = "confirmed_incidence_num",
-                 output_dir = opt$dir)
-print("Saving state deaths incidence...")
-save_score_cards(state_scores, "state", signal_name = "deaths_incidence_num",
-                 output_dir = opt$dir)
+if ( "confirmed_incidence_num" %in% unique(state_scores$signal)) {
+  print("Saving state confirmed incidence...")
+  save_score_cards(state_scores, "state", signal_name = "confirmed_incidence_num",
+                   output_dir = opt$dir)
+} else {
+  warning("State confirmed incidence should generally be available. Please
+          verify that you expect not to have any cases incidence forecasts")
+}
+if ( "deaths_incidence_num" %in% unique(state_scores$signal)) {
+  print("Saving state deaths incidence...")
+  save_score_cards(state_scores, "state", signal_name = "deaths_incidence_num",
+                   output_dir = opt$dir)
+} else {
+  warning("State deaths incidence should generally be available. Please
+          verify that you expect not to have any deaths incidence forecasts")
+}
+if ( "confirmed_admissions_covid_1d" %in% unique(state_scores$signal)) {
+  print("Saving state hospitalizations...")
+  save_score_cards(state_scores, "state", signal_name = "confirmed_admissions_covid_1d",
+                   output_dir = opt$dir)
+}

 print("Evaluating national forecasts")
 # COVIDcast does not return national level data, using CovidHubUtils instead
 nation_scores = evaluate_chu(nation_predictions, signals, err_measures)

-print("Saving nation confirmed incidence...")
-save_score_cards(nation_scores, "nation",
-                 signal_name = "confirmed_incidence_num", output_dir = opt$dir)
-print("Saving nation deaths incidence...")
-save_score_cards(nation_scores, "nation", signal_name = "deaths_incidence_num",
-                 output_dir = opt$dir)
+if ( "confirmed_incidence_num" %in% unique(state_scores$signal)) {
+  print("Saving nation confirmed incidence...")
+  save_score_cards(nation_scores, "nation",
+                   signal_name = "confirmed_incidence_num", output_dir = opt$dir)
+} else {
+  warning("Nation confirmed incidence should generally be available. Please
+          verify that you expect not to have any cases incidence forecasts")
+}
+if ( "deaths_incidence_num" %in% unique(state_scores$signal)) {
+  print("Saving nation deaths incidence...")
+  save_score_cards(nation_scores, "nation", signal_name = "deaths_incidence_num",
+                   output_dir = opt$dir)
+} else {
+  warning("Nation deaths incidence should generally be available. Please
+          verify that you expect not to have any deaths incidence forecasts")
+}
+if ( "confirmed_admissions_covid_1d" %in% unique(state_scores$signal)) {
+  print("Saving nation hospitalizations...")
+  save_score_cards(nation_scores, "nation", signal_name = "confirmed_admissions_covid_1d",
+                   output_dir = opt$dir)
+}

 print("Done")
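
The six if/else blocks above repeat one pattern per signal and geography. As a hypothetical sketch (not part of this PR), the same logic could be written as a loop. `save_available_score_cards()` and its `expected`/`optional` arguments are our own names, and `save_fn` stands in for `save_score_cards()` so the example runs standalone. Unlike the nation blocks above, which test `state_scores$signal`, the sketch checks the signals present in the score card it is about to save.

```r
# Hypothetical helper (not part of this PR): save one score card per signal
# that is present in `score_card`, and warn when an expected signal is missing.
# `save_fn` stands in for save_score_cards() from score.R.
save_available_score_cards <- function(score_card, geo_type, save_fn,
                                       output_dir = ".",
                                       expected = c("confirmed_incidence_num",
                                                    "deaths_incidence_num"),
                                       optional = "confirmed_admissions_covid_1d") {
  present <- unique(score_card$signal)
  saved <- character(0)
  for (signal_name in c(expected, optional)) {
    if (signal_name %in% present) {
      message("Saving ", geo_type, " ", signal_name, "...")
      save_fn(score_card, geo_type, signal_name = signal_name,
              output_dir = output_dir)
      saved <- c(saved, signal_name)
    } else if (signal_name %in% expected) {
      warning(geo_type, " ", signal_name, " should generally be available. ",
              "Please verify that you expect not to have any such forecasts.")
    }
  }
  invisible(saved)
}

# Usage with a stub that records calls instead of writing .rds files
calls <- character(0)
stub_save <- function(score_card, geo_type, signal_name, output_dir) {
  calls <<- c(calls, paste(geo_type, signal_name))
}
warnings_seen <- character(0)
nation_scores <- data.frame(signal = c("deaths_incidence_num",
                                       "confirmed_admissions_covid_1d"),
                            stringsAsFactors = FALSE)
saved <- withCallingHandlers(
  save_available_score_cards(nation_scores, "nation", stub_save),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Here the missing cases signal triggers one warning, and hospitalizations, being optional, are saved when present without any warning when absent.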
32 changes: 19 additions & 13 deletions Report/score.R
@@ -3,7 +3,8 @@ library("assertthat")

 save_score_cards = function(score_card, geo_type = c("state", "nation"),
                             signal_name = c("confirmed_incidence_num",
-                                            "deaths_incidence_num"),
+                                            "deaths_incidence_num",
+                                            "confirmed_admissions_covid_1d"),
                             output_dir = ".") {
   signal_name = match.arg(signal_name)
   geo_type = match.arg(geo_type)
@@ -13,11 +14,11 @@ save_score_cards = function(score_card, geo_type = c("state", "nation"),
   assert_that(signal_name %in% signals,
               msg = "signal is not in score_card")
   score_card = score_card %>% filter(signal == signal_name)
-  if (signal_name == "confirmed_incidence_num") {
-    sig_suffix = "cases"
-  } else {
-    sig_suffix = "deaths"
-  }
+
+  type_map <- list("confirmed_incidence_num" = "cases",
+                   "deaths_incidence_num" = "deaths",
+                   "confirmed_admissions_covid_1d" = "hospitalizations")
+  sig_suffix <- type_map[[signal_name]]
   output_file_name = file.path(output_dir,
                                paste0("score_cards_", geo_type, "_",
                                       sig_suffix, ".rds"))
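
A standalone sketch of the file-name lookup above; the helper name `score_card_path()` is hypothetical. Indexing a list with `[[` and an unknown name returns `NULL` rather than raising an error, so the sketch stops explicitly. Inside `save_score_cards()` itself, `match.arg()` already rejects unknown signals before the lookup runs.

```r
# Hypothetical helper reproducing the file-name logic of save_score_cards().
score_card_path <- function(geo_type, signal_name, output_dir = ".") {
  type_map <- list("confirmed_incidence_num" = "cases",
                   "deaths_incidence_num" = "deaths",
                   "confirmed_admissions_covid_1d" = "hospitalizations")
  sig_suffix <- type_map[[signal_name]]  # NULL if the name is not in the list
  if (is.null(sig_suffix)) {
    stop("No file suffix defined for signal: ", signal_name)
  }
  file.path(output_dir,
            paste0("score_cards_", geo_type, "_", sig_suffix, ".rds"))
}

print(score_card_path("state", "confirmed_admissions_covid_1d"))
# [1] "./score_cards_state_hospitalizations.rds"
```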
@@ -37,20 +38,25 @@ save_score_cards = function(score_card, geo_type = c("state", "nation"),

 evaluate_chu = function(predictions, signals, err_measures) {
   allowed_signals = c("confirmed_incidence_num",
-                      "deaths_incidence_num")
+                      "deaths_incidence_num",
+                      "confirmed_admissions_covid_1d")
   assert_that(all(signals %in% allowed_signals),
               msg = paste("Signal not allowed:",
                           setdiff(signals, allowed_signals)))
+
+  target_map <- list("confirmed_incidence_num" = "inc case",
+                     "deaths_incidence_num" = "inc death",
+                     "confirmed_admissions_covid_1d" = "inc hosp")
+  source_map <- list("confirmed_incidence_num" = "JHU",
+                     "deaths_incidence_num" = "JHU",
+                     "confirmed_admissions_covid_1d" = "HealthData")
   scores = c()
   for (signal_name in signals) {
     preds_signal = predictions %>%
       filter(signal == signal_name)
-    if (signal_name == "confirmed_incidence_num") {
-      jhu_signal = "inc case"
-    } else {
-      jhu_signal = "inc death"
-    }
-    chu_truth = covidHubUtils::load_truth("JHU", jhu_signal)
+    signal <- target_map[[signal_name]]
+    source <- source_map[[signal_name]]
+    chu_truth = covidHubUtils::load_truth(source, signal)
     chu_truth = chu_truth %>%
       rename(actual = value) %>%
       select(-c(model,
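
The `target_map` and `source_map` lists above are keyed by the same signal names, so they could also live in one table. A hedged sketch with hypothetical names (`signal_info`, `truth_args()`): it only looks up the arguments and does not call `covidHubUtils::load_truth()`, which `evaluate_chu()` invokes positionally as `load_truth(source, signal)`. The list element names are chosen for readability.

```r
# Hypothetical single lookup table combining target_map and source_map from
# evaluate_chu(): one row per signal, giving the truth source and target
# variable that are passed to covidHubUtils::load_truth().
signal_info <- data.frame(
  signal          = c("confirmed_incidence_num", "deaths_incidence_num",
                      "confirmed_admissions_covid_1d"),
  truth_source    = c("JHU", "JHU", "HealthData"),
  target_variable = c("inc case", "inc death", "inc hosp"),
  stringsAsFactors = FALSE
)

# Return the truth-data arguments for one signal, or stop if it is unknown
truth_args <- function(signal_name, info = signal_info) {
  row <- info[info$signal == signal_name, , drop = FALSE]
  if (nrow(row) != 1) {
    stop("Signal not allowed: ", signal_name)
  }
  list(truth_source = row$truth_source, target_variable = row$target_variable)
}

hosp_truth <- truth_args("confirmed_admissions_covid_1d")
str(hosp_truth)
# The sketch stops here; evaluate_chu() would then call
# covidHubUtils::load_truth(hosp_truth$truth_source, hosp_truth$target_variable)
```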