-
Notifications
You must be signed in to change notification settings - Fork 0
/
01-chap1.Rmd
213 lines (138 loc) · 11.8 KB
/
01-chap1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
# INTRODUCTION
Welcome to the R Markdown CSAS template. This template is based on (and in many places copied directly from) the huskydown R package, the thesisdown R package, and the CSAS LaTeX template. Hopefully it will provide a nicer interface for those that have never used TeX or LaTeX before. Using R Markdown will also allow you to easily keep track of your analyses in R chunks of code, with the resulting plots and output included as well. The hope is this R Markdown template gets you in the habit of doing reproducible research, which benefits you long-term as a researcher, but also will greatly help anyone that is trying to reproduce or build onto your results down the road.
Hopefully, you won't have much of a learning period to go through and you will reap the benefits of a nicely formatted CSAS Res Doc. The use of LaTeX in combination with Markdown is more consistent than the output of a word processor, much less prone to corruption or crashing, and the resulting file is smaller than a Word file. After working with Markdown and R together for a few weeks, we are confident this will be your reporting style of choice going forward.
<!-- If you're still on the fence about using R Markdown, check out the resource for newbies available at <https://ismayc.github.io/rbasics-book/>.-->
## WHY USE IT?
R Markdown creates a simple and straightforward way to interface with the beauty of LaTeX. Packages have been written in R to work directly with LaTeX to produce nicely formatting tables and paragraphs. In addition to creating a user friendly interface to LaTeX, R Markdown also allows you to read in your data, to analyze it and to visualize it using R functions, and also to provide the documentation and commentary on the results of your project. Further, it allows for R results to be passed inline to the commentary of your results. You'll see more on this later.
<!-- Having your code and commentary all together in one place has a plethora of benefits! -->
## WHO SHOULD USE IT?
Anyone who needs to use data analysis, math, tables, a lot of figures, complex cross-references, or who just cares about the final appearance of their document should use R Markdown. Of particular use should be anyone in the sciences, but the user-friendly nature of Markdown and its ability to keep track of and easily include figures, automatically generate a table of contents, index, references, table of figures, etc. should make it of great benefit to nearly anyone writing a CSAS Res Doc.
<!--
The {#rmd-basics} text after the chapter declaration will allow us to link throughout the document back to the beginning of Chapter 1. These labels will automatically be generated (if not specified) by changing the spaces to hyphens and capital letters to lowercase. Look for the reference to this label at the beginning of Chapter 2.
-->
# R MARKDOWN BASICS {#rmd-basics}
Here is a brief introduction into using _R Markdown_. _Markdown_ is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. _R Markdown_ provides the flexibility of _Markdown_ with the implementation of **R** input and output. For more details on using _R Markdown_ see <http://rmarkdown.rstudio.com>.
Be careful with your spacing in _Markdown_ documents. While whitespace largely is ignored, it does at times give _Markdown_ signals as to how to proceed. As a habit, try to keep everything left aligned whenever possible, especially as you type a new paragraph. In other words, there is no need to indent basic text in the Rmd document (in fact, it might cause your text to do funny things if you do).
## LISTS
CSAS allows ordered lists:
1. Item 1
4. Item 2
and bulleted lists:
- Item 1
- Item 2
Notice in the ordered list that I intentionally mislabeled Item 2 as number 4. _Markdown_ automatically figures this out! You can put any numbers in the list and it will create the list. Check it out below.
To create sublists, just indent the values a bit (at least three spaces) and make sure there are no newlines between the items. (Here's one case where indentation is key!)
1. Item 1
1. Item 2
1. Item 3
a. Item 3a
a. Item 3b
i. Item 3bi
i. Item 3bii
i. Item 3biii
i. Item 3biv
i. Item 3bv
1. Item 3bv1
1. Item 3bv2
## LINE BREAKS
Make sure to add white space between lines if you'd like to start a new paragraph. Look at what happens below in the outputted document if you don't:
Here is the first sentence. Here is another sentence. Here is the last sentence to end the paragraph.
This should be a new paragraph.
*Now for the correct way:*
Here is the first sentence. Here is another sentence. Here is the last sentence to end the paragraph.
This should be a new paragraph.
## R CHUNKS
When you click the **Knit** button above a document will be generated that includes both content as well as the output of any embedded **R** code chunks within the document. You can embed an **R** code chunk like this (`cars` is a built-in **R** dataset):
```{r cars}
summary(cars)
```
## INLINE CODE
If you'd like to put the results of your analysis directly into your discussion, add inline code like this:
> The `cos` of $2 \pi$ is `r cos(2*pi)`.
Another example would be the direct calculation of the standard deviation:
> The standard deviation of `speed` in `cars` is `r sd(cars$speed)`.
One last neat feature is the use of the `ifelse` conditional statement which can be used to output text depending on the result of an **R** calculation:
> `r ifelse(sd(cars$speed) < 6, "The standard deviation is less than 6.", "The standard deviation is equal to or greater than 6.")`
Note the use of `>` here, which signifies a quotation environment that will be indented.
As you see with `$2 \pi$` above, mathematics can be added by surrounding the mathematical text with dollar signs. More examples of this are in [Mathematics and Science] if you uncomment the code in [Math].
## INCLUDING PLOTS
You can also embed plots. For example, here is a way to use the base **R** graphics package to produce a plot using the built-in `pressure` dataset:
```{r pressure, echo=FALSE, cache=TRUE}
plot(pressure)
```
Note that the `echo=FALSE` parameter was added to the code chunk to prevent printing of the **R** code that generated the plot. There are plenty of other ways to add chunk options. More information is available at <http://yihui.name/knitr/options/>.
Another useful chunk option is the setting of `cache=TRUE` as you see here. If document rendering becomes time consuming due to long computations or plots that are expensive to generate you can use knitr caching to improve performance. Later in this file, you'll see a way to reference plots created in **R** or external figures.
## LOADING AND EXPLORING DATA
Included in this template is a file called `flights.csv`. This file includes a subset of the larger dataset of information about all flights that departed from Seattle and Portland in 2014. More information about this dataset and its **R** package is available at <http://github.com/ismayc/pnwflights14>. This subset includes only Portland flights and only rows that were complete with no missing values. Merges were also done with the `airports` and `airlines` data sets in the `pnwflights14` package to get more descriptive airport and airline names.
\
We can load in this data set using the following command (`read_csv()` is from the package `readr`):
```{r load_data, echo=TRUE}
flights <- read_csv(file.path("data", "flights.csv"))
```
\
*Before we continue, note that to insert extra blank lines in text as we did above and below, we need to have a line with a backslash followed by two spaces. Look carefully at those lines in the code editor.*
\
All your tabular data should be in `.csv` files and placed in the `data` directory. The function `here::here()` should always be used when loading data into the project because it will prepend the full root directory of the project onto the directories and files inside the call. This gives consistent file structure access across multiple machines and platforms.
The data is now stored in the data frame called `flights` in **R**. To get a better feel for the variables included in this dataset we can use a variety of functions. Here we can see the dimensions (rows by columns) and also the names of the columns.
```{r str}
dim(flights)
names(flights)
```
\
Another good idea is to take a look at the dataset in table form. With this dataset having more than 50,000 rows, we won't explicitly show the results of the command here. I recommend you enter the command into the Console **_after_** you have run the **R** chunks above to load the data into **R**. To do that, you can use the command `View(flights)`.
A simple and effective way to see the entire structure in one call is to use the `glimpse()` command like this:
```{r glimpse_flights, echo = TRUE}
glimpse(flights)
```
```{r view_flights, eval=FALSE}
View(flights)
```
While not required, it is highly recommended you use the `dplyr` package to manipulate and summarize your data set as needed. It uses a syntax that is easy to understand using chaining operations. Below I've created a few examples of using `dplyr` to get information about the Portland flights in 2014. You will also see the use of the `ggplot2` package, which produces beautiful, high-quality academic visuals.
We begin by checking to ensure that needed packages are installed and then we load them into our current working environment:
```{r load_pkgs, message=FALSE, echo=TRUE}
# List of packages required for this analysis
pkg <- c("dplyr", "ggplot2", "knitr", "bookdown", "devtools")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg)) {
install.packages(new.pkg, repos = "http://cran.rstudio.com")
}
library(csasdown)
```
\clearpage
The example we show here does the following:
- Selects only the `carrier_name` and `arr_delay` from the `flights` dataset and then assigns this subset to a new variable called `flights2`.
- Using `flights2`, we determine the largest arrival delay for each of the carriers.
```{r max_delays}
library(dplyr)
flights2 <- flights %>%
select(carrier_name, arr_delay)
max_delays <- flights2 %>%
group_by(carrier_name) %>%
summarize(max_arr_delay = max(arr_delay, na.rm = TRUE))
```
A useful function in the `knitr` package for making nice tables in _R Markdown_ is called `kable`. It is much easier to use than manually entering values into a table by copying and pasting values into Excel or LaTeX. This again goes to show how nice reproducible documents can be! (Note the use of `results="asis"`, which will produce the table instead of the code to create the table.) The `caption.short` argument is used to include a shorter title to appear in the List of Tables.
Within CSAS you should use the function `csasdown::csas_table()` instead. It just calls the function `kable()`, but it adds some default argument values that will make the tables render correctly in both LaTeX and in Word documents.
```{r maxdelays, results="asis"}
library(knitr)
csas_table(max_delays,
col.names = c("Airline", "Max Arrival Delay"),
caption = "Maximum Delays by Airline"
)
```
The last two options make the table a little easier-to-read.
Let's make a plot of the data:
```{r march3plot, fig.height=3, fig.width=6, fig.cap="My caption."}
library(ggplot2)
flights %>%
ggplot(aes(x = dep_time, y = arr_delay)) +
geom_point()
```
## ADDITIONAL RESOURCES
- _Markdown_ Cheatsheet - <https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet>
- _R Markdown_ Reference Guide - <https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf>
- Introduction to `dplyr` - <https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html>
- `ggplot2` Documentation - <http://docs.ggplot2.org/current/>