Age/Sex Pyramids

The plot_pyramid function can be used for plotting age pyramids split by gender. It is designed to work with un-aggregated data, i.e. patient level linelist data.

We’ll use the influenza A H7N9 dataset from the {outbreaks} package for our examples.

library(dplyr)
library(ggplot2)
library(outbreaks) 
library(epivis)

# set a ggplot2 theme of your preference
theme_set(theme_light(base_size = 12))
df_flu <- as_tibble(outbreaks::fluH7N9_china_2013)

glimpse(df_flu)
#> Rows: 136
#> Columns: 8
#> $ case_id                 <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
#> $ date_of_onset           <date> 2013-02-19, 2013-02-27, 2013-03-09, 2013-03-1…
#> $ date_of_hospitalisation <date> NA, 2013-03-03, 2013-03-19, 2013-03-27, 2013-…
#> $ date_of_outcome         <date> 2013-03-04, 2013-03-10, 2013-04-09, NA, 2013-…
#> $ outcome                 <fct> Death, Death, Death, NA, Recover, Death, Death…
#> $ gender                  <fct> m, m, f, f, f, f, m, m, m, m, m, f, f, m, f, m…
#> $ age                     <fct> 87, 27, 35, 45, 48, 32, 83, 38, 67, 48, 64, 52…
#> $ province                <fct> Shanghai, Shanghai, Anhui, Jiangsu, Jiangsu, J…

We can see the data has both an age and gender column, with the ‘levels’ of the latter being "m" and "f" for male or female.

To plot the pyramid chart we must supply the age and gender column to the plot_pyramid function along with the 2 levels of the gender column that represent male and female:

plot_pyramid(
  df = df_flu,
  age_col = age,
  gender_col = gender,
  gender_levels = c("m", "f")
)

A note on missing data: by default a caption will be added detailing any missing data not represented on the plot. You can remove this with add_missing_cap = FALSE. Missing data is any age that cannot be parsed as a number (including NAs) and/or any gender that does not match a binary level provided (including NAs).

Modifying the output

Age binning

If make_age_groups = TRUE (default) the age variable will be forced into numeric format (if not already) and then binned into age groups. Change the age breaks with the age_breaks argument:

plot_pyramid(
  df = df_flu,
  age_col = age,
  gender_col = gender,
  gender_levels = c("m", "f"),
  age_breaks = c(seq(0, 100, 5), Inf) # 5 year intervals to 100 then 100+
)

Age group labels are formatted by default with epivis::label_breaks(age_breaks) but you can supply your own labels if required via the age_labels argument (labels must be same length as breaks).

Notice all age groups appear on the plot by default, whether there are any observations of this group or not. To remove groups with no observations use drop_age_levels = TRUE.

If you have data that has already been binned by age and you want to use these bins, you can pass this column and set make_age_groups = FALSE.

Facetting

You can facet the graphic by supplying a facet_col:

df_flu %>% 
  mutate(outcome = forcats::fct_explicit_na(outcome, "Unknown")) %>% 
  plot_pyramid(
  age_col = age,
  gender_col = gender,
  facet_col = outcome, # facet by patient outcome
  facet_ncol = 2,
  gender_levels = c("m", "f")
)

Labelling

There are arguments to format axis and legend labels, as well as add data labels to the plot:

plot_pyramid(
  df = df_flu,
  age_col = age,
  gender_col = gender,
  gender_levels = c("m", "f"),
  gender_labs = c("Hommes", "Femmes"), # must respect the same order as the breaks
  x_lab = "Tranche d'Âge",
  y_lab = "Nombre de Personnes",
  show_data_labs = TRUE,
  add_missing_cap = FALSE # remove caption detailing missing data
)

Notice smaller value labels are places outside the bar to improve legibility. You can change the threshold for moving a label with lab_nudge_factor. The default value is 5. Increasing the number increases the distance from the max value required to move a label outside the bar. So if you want to keep all labels inside the bars, increase to a higher value:

plot_pyramid(
  df = df_flu,
  age_col = age,
  gender_col = gender,
  gender_levels = c("m", "f"),
  gender_labs = c("Hommes", "Femmes"), # must respect the same order as the breaks
  x_lab = "Tranche d'Âge",
  y_lab = "Nombre de Personnes",
  show_data_labs = TRUE,
  lab_nudge_factor = 50 # change from 5 to 50
)

Theming

Although plot_pyramid has built-in theme defaults, because the function returns a ggplot object, you can easily reset any default by adding your own themes, palettes etc to the object:

plot_pyramid(
  df = df_flu,
  age_col = age,
  gender_col = gender,
  gender_levels = c("m", "f"),
  gender_labs = c("Hommes", "Femmes"), # must respect the same order as the breaks
  colours = c("darkcyan", "firebrick"), # change bar colours
  x_lab = "Tranche d'Âge",
  y_lab = "Nombre de Personnes",
  show_data_labs = TRUE,
  lab_nudge_factor = 20, 
  lab_out_col = "white"
) +
  hrbrthemes::theme_ft_rc() +
  labs(title = "Add a title", subtitle = "and a subtitle") +
  theme(
    legend.position = "top",
    axis.text.x = element_blank(), 
    axis.title.x = element_text(hjust = .5),
    axis.title.y = element_text(hjust = .5)
  )