Bioinformatics-User-Forum/Automated-Reports-Presentation

Introduction

The purpose of this repository is to show off the benefit of using knitr and Rmarkdown to make automated reports, complete with figures, tables and mathematical equations. The report we will be creating in this demo consists of some COVID19 statistics (this is not meant to be a comprehensive review of COVID19 statistics). Due to the quickly changing nature of these statistics, we want to be able to update the report without having to copy and paste a bunch of tables and figures into a Word (or other) document every time we want to share it. Using knitr and Rmarkdown will make updating our report as simple as clicking a button.

Rmarkdown document structure

Header

Each Rmarkdown document has a short YAML header containing some metadata about the document. I’ll only discuss the options we use in this document, but you can find a bunch of other document formats and options on the RStudio Formats page. The only required option is output.

  • title: Title of the document

  • author: Name(s) of the author(s)

  • date: Whatever goes between the quote marks is what will be displayed as the document date. In our case we put an R command, format(Sys.Date(), '%d %B, %Y'), that uses the current date.

  • output: This defines the type of document that will be generated by our report. The outputs I most often generate are md_document, pdf_document and word_document.
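To make the header options concrete, here is a minimal sketch of what such a header might look like, built as an R character vector so each line is easy to inspect. The title and author values are placeholders I made up for illustration; only the date expression comes from the description above.

```r
# Lines of a hypothetical YAML header (title and author are placeholders)
header <- c(
  "---",
  'title: "COVID19 update"',
  'author: "Your Name"',
  "date: \"`r format(Sys.Date(), '%d %B, %Y')`\"",
  "output: md_document",
  "---"
)

# Print the header as it would appear at the top of the Rmd file
cat(header, sep = "\n")

# The inline R command in the date field evaluates to today's date as text
formatted_date <- format(Sys.Date(), "%d %B, %Y")
print(formatted_date)
```

Swapping md_document for pdf_document or word_document in the output line is how you would ask for a different document type.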

Setup code block

The setup code block is where you want to do any setup of your global environment that doesn’t directly pertain to the document. This is where I set knitr options, load packages, pre-process my data, etc…
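As a rough sketch, the body of a setup block might look something like the following. The particular chunk options are my own assumption of sensible defaults, not necessarily the ones used in this repository's Rmd file.

```r
# Chunk defaults one might set once at the top of a report
# (an assumption for illustration; adjust to taste)
chunk_defaults <- list(
  echo = FALSE,     # hide the code, show only its output
  message = FALSE,  # suppress package startup messages
  warning = FALSE   # suppress warnings in the rendered document
)

# Apply the defaults only if knitr is installed
if (requireNamespace("knitr", quietly = TRUE)) {
  do.call(knitr::opts_chunk$set, chunk_defaults)
}
```

Package loading with library() and any data pre-processing would typically follow in the same block.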

Formatted text and code blocks

The rest of the document contains the formatted text and code blocks that generate the results, figures and tables that make up your report.

COVID19 update

In this short report, we will be using the sars2pack package by Sean Davis and several other contributors. As I mentioned above, this isn’t a very good report, but it does allow us to cover the technical basics of how to work with R markdown. I’ll also include a few other off-topic comments just for fun.

Let’s start with a simple figure showing the cumulative number of cases in the US between the end of January and today (Figure 1), and then we’ll take a look at the cumulative number of cases by state as of one week ago (Figure 2).

Figure 1. Cumulative number of COVID19 cases in the US between Jan 22 and the present date

Figure 2. Total number of confirmed COVID19 cases in the US by state as of October 09

Next, let’s look at the state with the highest average number of daily cases each month. Tables are fairly easy to add to an Rmd file with the kable() function in the knitr package.

Table 1. States with the highest average daily incidence (confirmed cases) for each month.
Year  Month      State       Incidence
2020  January    California        0.3
2020  February   California        0.3
2020  March      New York       2446.9
2020  April      New York       7347.5
2020  May        Illinois       2076.3
2020  June       California     3889.5
2020  July       Florida       10031.8
2020  August     California     6640.7
2020  September  Texas          4509.9
2020  October    Texas          3852.3
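A table like this can be sketched with base R plus kable(). The data below are toy values with made-up state names, not the sars2pack data used in the report, and the aggregation is only one possible way to pick the top state per month.

```r
# Toy daily incidence values (hypothetical, for illustration only)
daily <- data.frame(
  date = rep(seq(as.Date("2020-03-30"), as.Date("2020-04-02"), by = "day"), times = 2),
  state = rep(c("State A", "State B"), each = 4),
  incidence = c(10, 20, 30, 50, 40, 20, 10, 20)
)
daily$month <- format(daily$date, "%Y-%m")

# Average daily incidence per state and month
monthly <- aggregate(incidence ~ state + month, data = daily, FUN = mean)

# Keep the state with the highest average in each month
top <- do.call(rbind, lapply(split(monthly, monthly$month), function(m) {
  m[which.max(m$incidence), ]
}))
rownames(top) <- NULL

# Render the result as a table if knitr is installed
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(top, digits = 1, caption = "Top state per month (toy data)"))
}
```

Inside an Rmd code block you would normally call kable() directly as the last expression rather than wrapping it in print().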

Compare raw incidence with incidence rates

Up until now we have shown raw counts and incidence, but in this section we will compare these statistics with incidence rates normalized by the population of each state. We have also removed some of the day-to-day variability by calculating a running 7-day average of the incidence rate.
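The normalization and smoothing can be sketched in a few lines of base R. The case counts and population below are hypothetical numbers chosen for illustration, and stats::filter() is just one way to compute a trailing average; the report's own code may do it differently.

```r
# Hypothetical daily confirmed cases for one region
daily_cases <- c(100, 120, 90, 110, 130, 150, 140, 160, 170, 155)

# Hypothetical population of that region
population <- 700000

# Incidence rate per 100,000 individuals
rate_per_100k <- daily_cases / population * 1e5

# Trailing 7-day average: each value is the mean of that day and the six before it.
# The first six entries are NA because a full week of data is not yet available.
running_avg <- as.numeric(stats::filter(rate_per_100k, rep(1 / 7, 7), sides = 1))

print(round(running_avg, 2))
```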

Figure 3. Number of confirmed daily cases in Maryland, DC and Virginia between January 22 and October 09

The total numbers make DC look pretty good in the previous figure. It is important to note, however, that DC’s population is much smaller and denser than that of Maryland or Virginia. What does this look like if we normalize by population size?

Figure 4. Incidence rate (confirmed cases) per 100,000 individuals averaged over the past seven days in Maryland, DC and Virginia between January 22 and October 09

A cool video

In this subsection we will display a video showing the incidence rate across the country over time. In the coding demo, we’ll build this up in 4 steps as follows:

  • We’ll start by modifying the code from Figure 2 to display incidence rates.

  • Next we’ll use the gganimate package to create an animated gif showing the change in incidence rates in the US over the course of the pandemic.

  • We’ll create a date sidebar to show the progression of time throughout the animation.

  • Lastly, we’ll use the magick package to append the two animated gifs together into a single gif.
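The last step above can be sketched roughly as follows. The function name append_gifs and its arguments are hypothetical names I chose for illustration, and the sketch assumes both gifs already exist on disk with the same number of frames.

```r
# Hypothetical helper: place two animated gifs side by side, frame by frame.
# Assumes the magick package is installed and both gifs have equal frame counts.
append_gifs <- function(left_path, right_path, out_path) {
  left <- magick::image_read(left_path)
  right <- magick::image_read(right_path)

  # Frame-by-frame pairing only makes sense if the animations are the same length
  stopifnot(length(left) == length(right))

  # Append each pair of frames horizontally
  frames <- lapply(seq_along(left), function(i) {
    magick::image_append(c(left[i], right[i]))
  })

  # Join the combined frames back into one multi-frame image and write it out
  combined <- magick::image_join(frames)
  magick::image_write(combined, path = out_path, format = "gif")
  invisible(out_path)
}
```

A call might look like append_gifs("map.gif", "dates.gif", "combined.gif"), where the file names are placeholders for the two animations produced in the earlier steps.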

Figure 5. Seven-day average incidence rate (confirmed cases) per 100,000 individuals for each of the lower 48 states between January 22 and October 09

Now that we have the Rmd file set up, all we have to do is knit the document and our code will fetch new results, generate new figures and tables, and put them all together in a new document. You can check out the repository to review the code here. If you do, you’ll also see the same report knitted as a markdown file. All it usually takes to knit an Rmd file into a different format is modifying a few lines in the YAML header. There are, however, some format-specific things you can do that won’t translate gracefully to other formats.
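Knitting can also be done from the R console instead of the Knit button. Here is a minimal sketch, assuming the rmarkdown package and pandoc are available; the tiny Rmd file it writes is a stand-in I made up, not this repository's report.

```r
# Write a tiny stand-in Rmd file to a temporary location (hypothetical content)
rmd_path <- file.path(tempdir(), "knit_demo.Rmd")
writeLines(c(
  "---",
  'title: "Knit demo"',
  "output: md_document",
  "---",
  "",
  "Today is `r format(Sys.Date(), '%d %B, %Y')`."
), rmd_path)

# Rendering needs both the rmarkdown package and a pandoc installation
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # Use the format named in the YAML header
  md_file <- rmarkdown::render(rmd_path, quiet = TRUE)

  # Override the header and produce a Word document from the same source
  docx_file <- rmarkdown::render(rmd_path, output_format = "word_document", quiet = TRUE)
}
```

Passing output_format to render() is an alternative to editing the header by hand when you want the same report in more than one format.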
