---
title: "Epidemic Curves"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Epidemic Curves}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
knitr::opts_chunk$set(
warning = FALSE,
message = FALSE,
collapse = TRUE,
comment = "#>",
fig.width = 8,
fig.height = 4.5,
fig.retina = 2
)
```

The `plot_epicurve` function can be used for plotting incidence over time, commonly referred to as an epidemic curve or epicurve. It is designed to work with un-aggregated data, i.e. patient level linelist data.

We'll use a simulated ebola outbreak dataset from the [`{outbreaks}`](http://www.repidemicsconsortium.org/outbreaks/) package for our examples.

```{r setup}
library(dplyr)
library(ggplot2)
library(epivis)
library(outbreaks) 

# set a ggplot2 theme of your preference
theme_set(theme_light(base_size = 12))
```

```{r load-data}
df_ebola <- as_tibble(outbreaks::ebola_sim_clean$linelist)

glimpse(df_ebola)
```

You can plot a simple curve by providing the dataset and the bare column name containing the dates you want to plot:

```{r}
plot_epicurve(
  df_ebola, 
  date_col = date_of_onset
)
```

### Working with dates and weeks

Commonly incidence is plotted by week, not day. Here we have daily data but we can ask `plot_epicurve` to re-calculate incidence by week instead by setting `floor_date_week = TRUE`. You can choose the first day of the week with `week_start` (defaults to 1 (Monday)). Finally we can set date axis labels to the week number rather than date.

```{r}
plot_epicurve(
  df_ebola, 
  date_col = date_of_onset, 
  floor_date_week = TRUE, 
  week_start = 1,
  label_weeks = TRUE,
  date_lab = "Week of onset", 
  y_lab = "Incidence"
)
```

A secondary date axis can be added on top with default ggplot labels via `sec_date_axis = TRUE`. When this is done, week labels will be reduced to only the week number, as year information will be displayed in the top labels. You can control the number of date labels on the bottom axis with the `date_breaks` argument.

```{r}
plot_epicurve(
  df_ebola, 
  date_col = date_of_onset, 
  floor_date_week = TRUE, 
  label_weeks = TRUE,
  date_breaks = "3 weeks",
  sec_date_axis = TRUE,
  date_lab = "Week of onset", 
  y_lab = "Incidence"
)
```


If you want to display more week labels and avoid overlapping, you can use the `dodge_x_labs` helper function:

```{r}
plot_epicurve(
  df_ebola, 
  date_col = date_of_onset, 
  floor_date_week = TRUE,
  label_weeks = TRUE,
  sec_date_axis = TRUE,
  date_breaks = "1 week", 
  date_lab = "Week of onset", 
  y_lab = "Incidence"
) +
  dodge_x_labs()
```

Sometimes during epidemics a week or two may pass with no cases reported. By default this would not appear on the epicurve as there is no data for the latest week. However you may want to explicitly show this on the graphic to effectively communicate that the data is up-to-date and there are 0 cases in the latest week(s). To do this add a `date_max` argument. This will force the date axis to extend to that point:

```{r}
plot_epicurve(
  df_ebola, 
  date_col = date_of_onset, 
  floor_date_week = TRUE,
  label_weeks = TRUE,
  sec_date_axis = TRUE,
  date_max = "2015-06-01", # extend axis to June 2015
  date_breaks = "1 week", 
  date_lab = "Week of onset", 
  y_lab = "Incidence"
) +
  dodge_x_labs()
```

### Grouping Data

You may want to visualise a third variable by filling the columns with varying colours. We can do this by adding a `group_col` argument. 

```{r}
plot_epicurve(
  df_ebola, 
  date_col = date_of_onset, 
  group_col = outcome,
  floor_date_week = TRUE,  
  label_weeks = TRUE,
  date_breaks = "2 weeks",
  sec_date_axis = TRUE,
  date_lab = "Week of onset", 
  y_lab = "Incidence",
  group_lab = "Outcome"
)
```

By default, `NA`s will be plotted with a grey colour. You can change this with the `group_na_colour` argument. Alternatively, you may want to recode `NA`s in your data to a more meaningful label. Because the `outcome` column in this dataset is a factor, we can recode `NA`s with `forcats::fct_explicit_na`.

```{r}
df_ebola %>% 
  mutate(outcome = forcats::fct_explicit_na(outcome, "Unknown")) %>% 
  plot_epicurve(
    date_col = date_of_onset, 
    group_col = outcome,
    floor_date_week = TRUE,  
    label_weeks = TRUE,
    sec_date_axis = FALSE,
    date_lab = "Week of onset", 
    y_lab = "Incidence",
    group_lab = "Outcome"
  )
```

Note: here we 'pipe' the modified dataset into the first argument of the `plot_epicurve` function.

### Adding proportion lines

It can be useful to visualise a proportion or ratio over time on top of the epicurve, case fatality ratio being a good example. This can be done with `plot_epicurve` by providing a proportion column along with the numerator and denominator values. To plot the CFR here we would use the `outcome` column with a numerator of `"Death"` and a denominator of `c("Death", "Recover")` (ignoring unknown outcomes in the calculation):

```{r}
df_ebola %>% 
  mutate(outcome = forcats::fct_explicit_na(outcome, "Unknown")) %>% 
  plot_epicurve(
    date_col = date_of_onset, 
    group_col = outcome,
    prop_col = outcome,
    prop_numer = "Death",
    prop_denom = c("Death", "Recover"),
    floor_date_week = TRUE,  
    label_weeks = TRUE,
    sec_date_axis = FALSE,
    date_lab = "Week of onset", 
    y_lab = "Incidence",
    group_lab = "Outcome",
    prop_lab = "CFR"
  )
```

See also: `prop_line_colour` and `prop_line_size` argument to modify the colour and line thickness, respectively.


### Faceting

Facets can be used to split the epicurves by a categorical variable, often a location. In this case we can facet by `hospital` simply by adding a `facet_col = hospital` argument. We also set the facet columns to 2 and reduce the CFR line width due to smaller plot sizes:

```{r fig.height=8, fig.width=8}
df_ebola %>% 
  mutate(outcome = forcats::fct_explicit_na(outcome, "Unknown")) %>% 
  plot_epicurve(
    date_col = date_of_onset, 
    group_col = outcome,
    facet_col = hospital,
    facet_ncol = 2,
    facet_labs = label_wrap_gen(width = 30),
    prop_col = outcome,
    prop_numer = "Death",
    prop_denom = c("Death", "Recover"),
    prop_line_size = .5,
    floor_date_week = TRUE,  
    label_weeks = TRUE,
    date_breaks = "4 weeks",
    sec_date_axis = TRUE,
    date_lab = "Week of onset", 
    y_lab = "Incidence",
    group_lab = "Outcome",
    prop_lab = "CFR"
  )
```

### Theming

Although `plot_epicurve` has built-in theme defaults, because the function returns a ggplot object, you can easily reset any default by adding your own themes, palettes etc to the object:

```{r}
library(hrbrthemes) # install.packages("hrbrthemes") for additional ggplot2 themes

df_ebola %>% 
  mutate(outcome = forcats::fct_explicit_na(outcome, "Unknown")) %>% 
  plot_epicurve(
    date_col = date_of_onset, 
    group_col = outcome,
    prop_col = outcome,
    prop_numer = "Death",
    prop_denom = c("Death", "Recover"),
    prop_line_colour = "orange",
    prop_line_size = 0.5,
    floor_date_week = TRUE, 
    label_weeks = TRUE,
    sec_date_axis = FALSE,
    date_lab = "Week of onset", 
    y_lab = "Incidence",
    group_lab = "Outcome",
    prop_lab = "CFR"
  ) +
  scale_fill_manual(values = c("#486090FF", "#6078A8FF", "#7890A8FF")) + 
  hrbrthemes::theme_ft_rc() + 
  theme(
    panel.grid.major.x = element_blank(), 
    panel.grid.minor.x = element_blank(), 
    axis.title.y = element_text(hjust = .5)
  )
```