In a recent opinion piece, the New York Times included this spiral graph showing the development of confirmed COVID-19 cases in the United States since the beginning of the pandemic. The visualization has stirred some debate about whether it is “the proper way” to display such data. Arguments have been exchanged about why it is particularly bad, or why it might actually be well suited to this use case.
Everyone is entitled to their own opinion; I, however, challenged myself to recreate this plot using R, and the ggplot2 package in particular. (Hint: I like it.)
Let’s start with loading the R packages we will use for creating this plot.
pacman::p_load("tidyverse", "ggtext", "here", "lubridate")
We use the COVID-19 Dataset by Our World in Data for the number of confirmed Coronavirus cases in the United States.
Since the original chart starts on January 1st, 2020, while the first cases in the U.S. were only registered towards the end of January 2020, we add all the days from January 1st up to the first date in the data. We manage this using tidyr::complete. (We could also insert all missing dates manually, but maybe we want to reproduce the chart for other countries?)
We calculate the day of the year as well as the year from the date variable. The day of the year will be our x values in the plot. We will use the year to group the data in order to display the information in a cyclic manner.
owid_url <- "https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.csv?raw=true"
country <- "United States"

covid <- read_csv(owid_url)

covid_cases <- covid %>%
  filter(location == country) %>%
  select(date, new_cases, new_cases_smoothed) %>%
  arrange(date) %>%
  # Add the dates before the 1st confirmed case
  add_row(date = as_date("2020-01-01"), new_cases = 0, new_cases_smoothed = 0, .before = 1) %>%
  complete(date = seq(min(.$date), max(.$date), by = 1),
           fill = list(new_cases = 0, new_cases_smoothed = 0)) %>%
  mutate(day_of_year = yday(date),
         year = year(date))
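To see what the date-filling step does without downloading the full dataset, here is a minimal sketch on a made-up three-row table. The `toy` data and its values are purely hypothetical and only serve to illustrate how `add_row`, `complete`, `yday` and `year` work together; it is not part of the original pipeline.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

# Hypothetical toy data: the first record is on January 4th,
# and January 6th is missing entirely
toy <- tibble(
  date = as_date(c("2020-01-04", "2020-01-05", "2020-01-07")),
  new_cases = c(2, 5, 3)
)

toy_filled <- toy %>%
  # Anchor the series at January 1st
  add_row(date = as_date("2020-01-01"), new_cases = 0, .before = 1) %>%
  # Create a row for every day in the range, filling new_cases with 0
  complete(
    date = seq(min(date), max(date), by = "day"),
    fill = list(new_cases = 0)
  ) %>%
  # Derive the plotting variables from the date
  mutate(day_of_year = yday(date), year = year(date))

toy_filled
```

The result should contain one row for each day from January 1st to January 7th, with zero cases on the days that were absent from the toy data.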
Since we want to display the data in a cyclic manner, we use a polar coordinate system with coord_polar. The line is built by connecting each day to the next via geom_segment, which takes values for x (current day) and xend (next day) as well as y and yend. The latter two are set to a numeric value (the UNIX timestamp of the date) with the function as.POSIXct, so the line moves outwards as time passes.
p <- covid_cases %>%
  ggplot() +
  geom_segment(aes(x = day_of_year, xend = day_of_year + 1,
                   y = as.POSIXct(date), yend = as.POSIXct(date))) +
  coord_polar()

p
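Why does this produce a spiral rather than a set of overlapping circles? A small sketch with three hand-picked dates (chosen here only for illustration) shows the idea: the day of the year, which becomes the angle, wraps around at each new year, while the timestamp, which becomes the radius, keeps growing.

```r
library(lubridate)

# Three illustrative dates around the turn of the year (2020 is a leap year)
dates <- as_date(c("2020-01-01", "2020-12-31", "2021-01-01"))

# Angle: the day of the year resets to 1 on January 1st
angle <- yday(dates)
angle
# 1 366 1

# Radius: seconds since 1970-01-01 00:00:00 UTC, strictly increasing
radius <- as.numeric(as.POSIXct(dates))
radius
# 1577836800 1609372800 1609459200
```

Because the radius on January 1st, 2021 is larger than on January 1st, 2020, the second year's line sits outside the first one instead of being drawn on top of it.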