The purpose of this repository is to show off the benefits of using knitr and R Markdown to make automated reports, complete with figures, tables and mathematical equations. The report we will be creating in this demo consists of some COVID-19 statistics (it is not meant to be a comprehensive review of COVID-19 statistics). Because these statistics change so quickly, we want to be able to update the report without having to copy and paste a bunch of tables and figures into a Word (or other) document every time we want to share it. Using knitr and R Markdown makes updating our report as simple as clicking a button.
Each R Markdown document has a short YAML header containing some metadata about the document. I'll only discuss the options we use in this document, but you can find many other document formats and options on the RStudio Formats page. The most important option is `output`, which sets the type of document to produce; if it is left out, rmarkdown falls back to `html_document`.
- `title`: Title of the document.
- `author`: Name(s) of the author(s).
- `date`: Whatever goes between the quote marks is displayed as the document date. In our case we put an inline R expression, `format(Sys.Date(), '%d %B, %Y')`, that inserts the current date.
- `output`: The type of document our report will generate. The outputs I generate most often are `md_document`, `pdf_document` and `word_document`.
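To check what the `date` field will show, you can run its inline expression on its own in an R console. The header in the comments is a minimal sketch: the title and author are placeholders, and only the `date` expression and the `md_document` output come from this report.

```r
# A minimal YAML header (title and author are placeholders):
# ---
# title: "COVID-19 Report"
# author: "Your Name"
# date: "`r format(Sys.Date(), '%d %B, %Y')`"
# output: md_document
# ---

# The inline `r ...` code in the `date` field is ordinary R, so evaluating
# it directly shows the string that will appear as the document date.
report_date <- format(Sys.Date(), "%d %B, %Y")
print(report_date)
```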
The setup code chunk is where you do any setup of your global environment that doesn't directly appear in the document's content. This is where I set knitr options, load packages, pre-process my data, etc.
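A setup chunk along these lines is one way to do it. This is a sketch rather than the exact chunk used in this report: the option values are common choices and the package list is illustrative. In the Rmd file the code sits in a chunk whose header is `{r setup, include=FALSE}`, so the chunk runs but neither its code nor its output appears in the rendered report.

```r
# Global chunk options applied to every later chunk in the document:
# hide the code itself and suppress package messages and warnings.
knitr::opts_chunk$set(
  echo = FALSE,
  message = FALSE,
  warning = FALSE,
  fig.align = "center"
)

# Load the packages the report needs (illustrative list; uncomment as needed)
# library(sars2pack)
# library(ggplot2)
# library(gganimate)
# library(magick)

# Data pre-processing that later chunks rely on would also go here.
```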
The rest of the document contains the formatted text and code blocks that generate the results, figures and tables that make up your report.
In this short report, we will be using the `sars2pack` package by Sean Davis and several other contributors. As I mentioned above, this isn't a very good report, but it does allow us to cover the technical basics of working with R Markdown. I'll also include a few off-topic comments just for fun.
Let's start with a simple figure showing the cumulative number of cases in the US between the end of January and today (Figure 1), and then we'll look at the cumulative number of cases by state as of one week ago (Figure 2).
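Cumulative counts are just a running total of daily new cases. The sketch below uses a handful of made-up daily counts rather than the sars2pack data, and base graphics rather than whatever plotting code the report itself uses. In the Rmd file, a chunk option such as `fig.cap = "Cumulative COVID-19 cases in the US"` (hypothetical caption text) would attach a caption to the figure.

```r
# Made-up daily new case counts (not real data)
daily <- data.frame(
  date      = seq(as.Date("2020-01-22"), by = "day", length.out = 7),
  new_cases = c(1, 0, 1, 0, 0, 3, 0)
)

# Cumulative cases are the running total of daily new cases
daily$cumulative <- cumsum(daily$new_cases)

# Line chart of the running total
plot(daily$date, daily$cumulative, type = "l",
     xlab = "Date", ylab = "Cumulative cases")
```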
Next, let's look at the state with the highest average number of daily cases each month. Tables are fairly easy to add to an Rmd file with the `kable()` function in the knitr package.
Year | Month | State | Incidence (average daily cases) |
---|---|---|---|
2020 | January | California | 0.3 |
2020 | February | California | 0.3 |
2020 | March | New York | 2446.9 |
2020 | April | New York | 7347.5 |
2020 | May | Illinois | 2076.3 |
2020 | June | California | 3889.5 |
2020 | July | Florida | 10031.8 |
2020 | August | California | 6640.7 |
2020 | September | Texas | 4509.9 |
2020 | October | Texas | 3852.3 |
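The table above boils down to a group-and-summarize step followed by `kable()`. Here is a base-R sketch using made-up daily counts for two states over two months (not the report's data or its exact code): average the daily cases for each state and month, keep the top state in each month, and hand the result to `kable()`.

```r
library(knitr)

# Made-up daily new cases for two states over March and April 2020
dates <- seq(as.Date("2020-03-01"), as.Date("2020-04-30"), by = "day")
daily <- data.frame(
  date  = rep(dates, times = 2),
  state = rep(c("New York", "California"), each = length(dates)),
  cases = c(rep(100, 31), rep(300, 30),   # New York: March, April
            rep(50, 31),  rep(400, 30)),  # California: March, April
  stringsAsFactors = FALSE
)
# Month names from the month number, so the result does not depend on locale
daily$month <- month.name[as.integer(format(daily$date, "%m"))]

# Average daily cases for every state/month combination
avg <- aggregate(cases ~ month + state, data = daily, FUN = mean)

# For each month keep the row with the highest average, then sort by calendar month
top <- do.call(rbind, lapply(split(avg, avg$month),
                             function(d) d[which.max(d$cases), ]))
top <- top[order(match(top$month, month.name)), ]

kable(top, row.names = FALSE, digits = 1,
      col.names = c("Month", "State", "Average daily cases"))
```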
Up until now we have shown raw counts and incidence, but in this section we will compare these statistics with incidence rates normalized by the population of each state. We have also smoothed out some of the day-to-day variability by calculating a running 7-day average of the incidence rate.
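Both steps are short in base R. The sketch below uses made-up daily counts and a made-up population. It assumes a rate per 100,000 residents and a trailing 7-day window (each day averaged with the six days before it), since neither choice is spelled out above; `sides = 2` would give a centered window instead.

```r
# Made-up daily new cases for one state and a made-up population
cases <- c(120, 80, 150, 100, 90, 200, 160, 110, 130, 170)
population <- 6e6

# Incidence rate per 100,000 residents (assumed scale)
rate <- cases / population * 1e5

# Trailing 7-day moving average. stats::filter is written out in full
# because dplyr::filter masks it when dplyr is attached.
rate_7day <- as.numeric(stats::filter(rate, rep(1 / 7, 7), sides = 1))

# The first six days have no complete window and come back as NA
round(rate_7day, 3)
```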
The total numbers make DC look pretty good in the previous figure. It is important to note, however, that DC's population is much smaller, and much more densely packed, than that of Maryland or Virginia. What does this look like if we normalize by population size?
In this subsection we will display an animation showing the incidence rate across the country over time. In the coding demo, we'll build it up in 4 steps:
- We'll start by modifying the code from Figure 2 to display incidence rates.
- Next, we'll use the `gganimate` package to create an animated gif showing how incidence rates in the US changed over the course of the pandemic.
- We'll create a date sidebar to show the progression of time throughout the animation.
- Lastly, we'll use the `magick` package to append the two animated gifs together into a single gif.
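The last step is the least obvious one, so here is a sketch of it. It assumes both gifs have the same number of frames and the same height, and it uses blank placeholder frames in place of the real map and date-sidebar animations, which in the report would come from gganimate (or be read back from disk with `image_read()`). The idea is to append frame *i* of one animation to frame *i* of the other and then reassemble the results.

```r
library(magick)

# Placeholder animations standing in for the map gif and the date-sidebar gif.
# In practice these would be read from disk, e.g. image_read("map.gif")
# (hypothetical file name). Both have n_frames frames of equal height.
n_frames <- 5
map_gif  <- image_join(lapply(seq_len(n_frames), function(i)
  image_blank(200, 100, color = "steelblue")))
date_gif <- image_join(lapply(seq_len(n_frames), function(i)
  image_blank(60, 100, color = "grey90")))

# Glue frame i of the map to frame i of the sidebar, side by side
frames <- lapply(seq_len(n_frames), function(i)
  image_append(c(map_gif[i], date_gif[i])))

# Reassemble the combined frames into a single animation
combined <- image_animate(image_join(frames), fps = 10)
# image_write(combined, "combined.gif")  # hypothetical output path
```

Appending frame by frame, rather than calling `image_append()` on the two whole animations, keeps the result animated; appending whole multi-frame images would stitch all of their frames into one wide still image.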
Now that we have the Rmd file set up, all we have to do is knit the document, and our code will fetch new results, generate new figures and tables, and put them all together in a new document. You can review the code in this repository. If you do, you'll also see the same report knitted as a markdown file. Knitting an Rmd file into a different format usually takes nothing more than changing a few lines in the YAML header. There are, however, some format-specific things you can do that won't translate gracefully to other formats.
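The same switch can also be made from the R console with `rmarkdown::render()`, which is what RStudio's Knit button calls. The sketch below writes a throwaway Rmd file (hypothetical name and content), knits it with the format named in its header, and then overrides that format through the `output_format` argument without editing the header. It assumes rmarkdown and pandoc are installed.

```r
# Write a tiny Rmd file to a temporary directory
rmd <- file.path(tempdir(), "demo_report.Rmd")
writeLines(c(
  "---",
  "title: \"Demo report\"",
  "date: \"`r format(Sys.Date(), '%d %B, %Y')`\"",
  "output: md_document",
  "---",
  "",
  "This report was knitted on `r Sys.Date()`."
), rmd)

# Knit with the format declared in the YAML header (md_document)
out_md <- rmarkdown::render(rmd, quiet = TRUE)

# Knit the same file to Word by overriding the header's output format
out_docx <- rmarkdown::render(rmd, output_format = "word_document", quiet = TRUE)

# render() returns the path of each output file
c(out_md, out_docx)
```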