# Graphics {#graphics}
One of the most powerful tools we have in Statistics and Data Science is graphics. This includes images/pictures, graphs/plots, and tables. You will want to make sure that all graphical elements are appropriately sized in the Body. If there is text in a static image/picture, you'll need to make sure that the text is legible on a variety of screen sizes.
We've already discussed both issues of color and text size in plots. For additional considerations, please refer to the following readings (ordered from most important to least):
- [Tufte-Fundamental Principles of Analytical Design](https://www.dropbox.com/s/hb52991v09p8q91/Tufte%20-%202006%20-%20The%20Fundamental%20principles%20of%20analytical%20design.pdf?dl=0)
- [Tufte-Chartjunk](https://www.dropbox.com/s/z8yrf4eqph6c2h4/Tufte%20-%202001%20-%20Chartjunk%20Vibrations%2C%20grids%2C%20and%20ducks.pdf?dl=0)
- [Kosslyn-Looking with the Eye and Mind](https://www.dropbox.com/s/62uegsribwdjtze/Kosslyn%20-%202006%20-%20Looking%20with%20the%20eye%20and%20mind.pdf?dl=0)
Remember, we always want to be modeling excellent graphing behaviors.
> All photographs can be fortified with words. --Dorothea Lange
> A picture is worth a thousand words...but which ones? --Unknown
Both of these quotations highlight that you need to include some text with your plots to help the user construct their understanding of what you're trying to show them.
## Titles and Labels {#graphLab}
Graph and Table titles should follow Title Case. Capitalize each word unless the word is "small" (e.g., of, an, etc.). The [Title Case website](https://titlecase.com/) can help you if you aren't sure. (This website also does other types of case such as camel case.)
Axis and Legend Labels will follow Sentence Case. That is to say only the first word of each axis label and the legend will be capitalized. The remaining words will be lower case with two exceptions (proper nouns and unit abbreviations).
When dealing with quantities on axes, you need to include the unit of measurement in the label. Typically, the unit will be placed in parentheses at the end of the label and use appropriate abbreviations. For example, "Height (in)", "Air pressure (mmHg)", "Resting heart rate (bpm)", etc. If the variable is a count, phrase the label along the following lines: "Number of siblings", "Number of bikes owned", etc.
Labels should be informative without getting in the way of reading the graph.
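As a minimal sketch of these conventions, the following sets a Title Case title and Sentence Case axis and legend labels (with units) through `labs`. The built-in `mtcars` data and the label wording are stand-ins chosen for illustration; they do not come from any BOAST app.

```{r labelSketch, echo=TRUE, eval=FALSE}
library(ggplot2)

# mtcars ships with R; the labels below are illustrative only
labelPlot <- ggplot(
  data = mtcars,
  mapping = aes(x = wt, y = mpg, color = factor(cyl))
) +
  geom_point() +
  labs(
    title = "Fuel Efficiency by Weight", # Title Case
    x = "Weight (1000 lb)",              # Sentence Case with unit
    y = "Fuel efficiency (mpg)",         # Sentence Case with unit
    color = "Number of cylinders"        # a count, so no unit
  )
labelPlot
```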
## Font Sizes
A common area for adjustment in R graphics is the font size. Many times, we need to adjust the size of the text used in a plot so that the axis labels and titles are easier to read. There are two approaches you can use to effect this: change the base font size or change the font size for particular elements.
The easiest approach is to change the base font size and then let `ggplot` do the necessary alterations. To go this route, you will need to do the following:
```{r fontSize1, echo=TRUE, eval=FALSE}
# Letting examplePlot be an already created ggplot object
examplePlot +
  theme(
    text = element_text(size = 18)
  )
```
The above code sets the base font size for the entire plot to 18 pt; elements whose sizes are defined relative to the base (such as the plot title and the axis text) will scale from there.
If instead you wish to be a bit more precise with the font size for different elements, you would need to do something like the following:
```{r fontSize2, echo=TRUE, eval=FALSE}
# Letting examplePlot be an already created ggplot object
examplePlot +
  theme(
    axis.title = element_text(size = 18),
    axis.text.x = element_text(size = 14),
    legend.title = element_text(size = 18),
    legend.text = element_text(size = 14),
    plot.title = element_text(size = 24)
  )
```
In this example code, we're instructing R to use size 18 font for the labels on both axes and the legend title. We're also using size 14 font for the text that occurs in the legend as well as along the horizontal axis. (The vertical axis's text will be the default size.) The plot's title will be 24 point.
There are many different aspects of size that you can adjust with the `theme` function; these are just a few. To see the full list, look at the Help documentation for theme (`?ggplot2::theme`). For each of the elements, you will need to use `element_text` with a `size` argument to dictate what font size should be used.
You will need to play around with the font sizes to find one that will work well in a wide variety of situations. Keep in mind that the font __will not dynamically scale__ when you stretch/shrink the window or when you move between using a computer and a mobile device.
## Axes and Scales
The default axes for `R`'s base graphics are, well, absolutely terrible. The algorithms for axes in `ggplot2`, while better than base `R`, are still in need of improvement. The default axes often do not fully cover the data, and they leave gaps between the axes and the data. All this impedes the user's construction of meaning. Thus, you'll want to take control and stipulate the axes and scales to optimize what users get out of the plot. If you are providing multiple plots that the user is supposed to compare, make sure that they all use the same scaling and axes.
To force `ggplot2` to place (0, 0) in the lower-left corner and to control the scales, you will need to include the following:
```{r axesControl1, echo=TRUE, eval=FALSE}
# Create the ggplot2 object
g1 <- ggplot2::ggplot(...)
# Add your layer, then control the axes and scales
## limits = c(0, NA) starts the axis at 0 and lets the data set the upper end
## expansion(mult = ) pads by a fraction of the axis range (multiplicative)
## expansion(add = ) pads by a fixed amount in data units (additive)
g1 +
  ggplot2::geom_point() +
  ggplot2::scale_x_continuous(
    limits = c(0, NA),
    expand = ggplot2::expansion(mult = c(0, 0.05), add = 0)
  ) +
  ggplot2::scale_y_continuous(
    limits = c(0, NA),
    expand = ggplot2::expansion(mult = 0, add = c(0, 0.05))
  )
```
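When several plots are meant to be compared, one way to keep their axes identical is to compute the limits once and reuse them in every plot. This is only a sketch: the `mtcars` data, the split by transmission type, and the helper name `makePanel` are assumptions made for illustration.

```{r sharedAxesSketch, echo=TRUE, eval=FALSE}
library(ggplot2)

# Compute the limits once from the full data set
xLimits <- c(0, max(mtcars$wt))
yLimits <- c(0, max(mtcars$mpg))

# Hypothetical helper: draws one group's scatter plot with the shared axes
makePanel <- function(data, panelTitle) {
  ggplot(data = data, mapping = aes(x = wt, y = mpg)) +
    geom_point() +
    scale_x_continuous(
      limits = xLimits,
      expand = expansion(mult = c(0, 0.05))
    ) +
    scale_y_continuous(
      limits = yLimits,
      expand = expansion(mult = c(0, 0.05))
    ) +
    labs(
      title = panelTitle,
      x = "Weight (1000 lb)",
      y = "Fuel efficiency (mpg)"
    )
}

automaticPlot <- makePanel(mtcars[mtcars$am == 0, ], "Automatic Transmissions")
manualPlot <- makePanel(mtcars[mtcars$am == 1, ], "Manual Transmissions")
```

Because both plots draw their limits from the same two vectors, a change to the data only needs to be handled in one place.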
## Legend Sizes
If you find yourself needing to enlarge the size of the legend key, you can use the following argument with any `ggplot` object:
```{r changeLengend, echo=TRUE, eval=FALSE}
# Change the size of legend elements in ggplot
g1 <- ggplot2::ggplot(...)
g1 +
  theme(
    legend.key.size = unit(2, "cm")
  )
```
The `unit` function takes two arguments: a numeric value that will be the measure and a unit of measure. In the above example, this will make each legend key 2 cm wide and 2 cm tall. You may need to experiment with this setting to find an optimal value across a wide variety of screen sizes. That is, be sure to test your settings on both computers and mobile devices.
## Color and Plots in `R` {#colorPlots}
In `R` you can set a color theme to use in `ggplot2`. Additionally, the `viridis` package provides several color palettes that improve upon the default color scheme.
We have developed two custom palettes for you to use with BOAST apps. These palettes are the `boastPalette` and the `psuPalette` and are part of the `boastUtils` package. We have worked hard to ensure that these palettes work for users with color blindness, meet web standards, and are consistent with our color themes.
```{r customPalettes, echo=TRUE, eval=FALSE}
# To call a color from the boastPalette use
boastUtils::boastPalette[1]
## Numbers go 1 through 9
# To call a color from the psuPalette use
boastUtils::psuPalette[1]
## Numbers go 1 through 8
# Both palettes get used in the order of what is listed.
```
```{r boastPalette, fig.cap="The Boast Palette", fig.align="center", fig.width=7, fig.height=4, echo=FALSE}
par(mar = c(0.25, 10, 0.75, 9.5))
boastUtils::showPalette(
palette = boastUtils::boastPalette
)
```
```{r psuPalette, fig.cap="The PSU Palette", fig.align='center', fig.width=7, fig.height=4, echo=FALSE}
par(mar = c(0.25, 10, 0.75, 9.5))
boastUtils::showPalette(
palette = boastUtils::psuPalette
)
```
To use these palettes (or ones from `viridis`) with a `ggplot2` object, you'll need to do the following:
```{r examplePalette, echo=TRUE, eval=FALSE}
# Create ggplot2 object
g1 <- ggplot(
  data = df,
  mapping = aes(x = x, y = y, color = grp, fill = grp)
) +
  geom_point() +
  # Tell R to use your chosen palette
  scale_color_manual( # If you use "color" in aes
    values = boastUtils::boastPalette
  ) +
  scale_fill_manual( # If you use "fill" in aes
    values = boastUtils::boastPalette
  )
# You can also call colors individually
ggplot(
  data = df,
  mapping = aes(x = x, y = y)
) +
  geom_point(
    # Default point shapes take color; use fill for bars or point shapes 21-25
    color = boastUtils::psuPalette[2]
  )
```
If you have more groups than the eight or nine colors listed in the two palettes, consider reworking your example, as you could overwhelm the user with too many colors. (This also applies to using different shapes to plot points.)
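For a `viridis` palette, one option is the discrete viridis scale that ships with `ggplot2` itself. The sketch below assumes the built-in `mtcars` data and treats the number of cylinders as the grouping variable; both choices are for illustration only.

```{r viridisSketch, echo=TRUE, eval=FALSE}
library(ggplot2)

viridisPlot <- ggplot(
  data = mtcars,
  mapping = aes(x = wt, y = mpg, color = factor(cyl))
) +
  geom_point(size = 3) +
  # Discrete viridis scale; use scale_fill_viridis_d() for the fill aesthetic
  scale_color_viridis_d(name = "Number of cylinders")
viridisPlot
```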
### Color and Accessibility
Color can be a great tool to help highlight different cases. While we have striven to create palettes which are friendly for color blindness, we can go beyond color to help all users. One key way to do this is to partner color with the shape of the points and/or the style of line used. This can be done easily in `ggplot` by using both the `color`/`fill` aesthetic as well as the `shape` and/or `linetype` aesthetics. For example,
```{r colorShapeLine, echo=TRUE, eval=FALSE}
ggplot(
  data = exampleData,
  mapping = aes(
    x = height,
    y = weight,
    color = class, # colors points and lines (only the border for shapes 21-25)
    fill = class, # fills the interior of shapes 21-25
    shape = class,
    linetype = class
  )
) +
  geom_point(size = 1) +
  geom_smooth(
    formula = y ~ x,
    method = "lm",
    se = FALSE
  ) +
  theme_bw() +
  theme(
    text = element_text(size = 18)
  )
```
The above code will produce a scatter plot. The color and shape of the points will depend upon the value of `class` for each observation. Additionally, the plot will contain a linear trend line for each level of `class`; each line repeats the color used for that level's points and uses a different style of line (solid, dashed, dotted, etc.).
As with color, shape and line type can become taxing quickly. Thus, if you find yourself reaching for many shapes/types (say 6 or more), you should reconsider the plot.
#### An Exception to Line Types
Currently, there is an exception to using different line types in conjunction with color. This is for the notion of creating multiple paths (e.g., see [Law of Large Numbers app](https://psu-eberly.shinyapps.io/Law_of_Large_Numbers/)). Here, it is not as important that a viewer can distinguish between the individual curves. What does matter is whether the individual sees how the various curves converge or diverge. Thus, for path type plots, we will not worry about using line type in conjunction with color.
### Common Color Usage
Throughout BOAST, we often have certain recurring elements in plots. These include elements such as estimates, null values, population parameters ("true value"), etc. To build consistency across the many apps, we use the following standardized colors for these elements:
+ Confidence Intervals
- Containing the true value: `psuPalette[1]`
- NOT containing the true value: `psuPalette[2]`
+ Population Parameter/true value: `psuPalette[3]`
+ Observed Estimate: `psuPalette[4]`
+ Null Value (Frequentist) or comparison value (Bayesian): "black"
+ Likelihood: "blue"
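One way to keep these assignments consistent within an app is to collect them in a single named vector. The helper below is a hypothetical sketch (the name `standardColors` and the element names are ours, not part of `boastUtils`); it takes the palette as an argument so that the mapping lives in one place.

```{r standardColorSketch, echo=TRUE, eval=FALSE}
# Hypothetical helper: pair each recurring element with its standard color
standardColors <- function(palette) {
  c(
    ciContains = palette[[1]], # interval contains the true value
    ciMisses = palette[[2]],   # interval does NOT contain the true value
    parameter = palette[[3]],  # population parameter/true value
    estimate = palette[[4]],   # observed estimate
    nullValue = "black",       # null value or comparison value
    likelihood = "blue"
  )
}

# In an app (assumes boastUtils is installed; trueMean is a placeholder):
# plotColors <- standardColors(boastUtils::psuPalette)
# ... + geom_vline(xintercept = trueMean, color = plotColors[["parameter"]])
```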
## Transparency
In different situations, we may want to set the transparency of a graph object to be something other than opaque. We can do this with the `alpha` aesthetic. The values for `alpha` go from 0 (transparent) to 1 (opaque). You may need to play with several different values to find the optimal setting for your plot.
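As a small sketch, `alpha` can be set directly in a layer so that overlapping points remain visible. The `mtcars` data and the value 0.4 are assumptions chosen for illustration; treat the value as a starting point to adjust.

```{r alphaSketch, echo=TRUE, eval=FALSE}
library(ggplot2)

# alpha = 0.4 lets overlapping points show through each other
alphaPlot <- ggplot(
  data = mtcars,
  mapping = aes(x = wt, y = mpg)
) +
  geom_point(size = 4, alpha = 0.4)
alphaPlot
```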
## Histograms
When dealing with histograms, especially when using `ggplot`, we need to think about what to use as the width of the bins (or alternatively, the number of bins). The `ggplot2` approach has a default of 30 bins, but it also complains when you use this setting:
<div style = "color: red;">
>`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
</div>
To keep this message out of your log and away from your users, you can either find an optimal number of bins (or bin width) OR you can invoke the Freedman-Diaconis Rule to set the value of `binwidth`:
```{r fdRule, echo=TRUE, eval=FALSE}
ggplot(
  data = data,
  mapping = aes(x = x)
) +
  geom_histogram(
    # Freedman-Diaconis Rule; falls back to a width of 0.1 when the IQR is 0
    binwidth = function(x) {
      ifelse(IQR(x) == 0, 0.1, 2 * IQR(x) / (length(x)^(1/3)))
    }
  )
```
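To see what the rule produces before handing it to `geom_histogram`, you can compute the width directly. The sketch below uses the formula 2 * IQR / n^(1/3) with the same 0.1 fallback as the chunk above; the function name `fdBinWidth`, the removal of missing values, and the use of `mtcars$mpg` are our own choices for illustration.

```{r fdRuleSketch, echo=TRUE, eval=FALSE}
# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
fdBinWidth <- function(x) {
  x <- x[!is.na(x)] # drop missing values before computing
  if (IQR(x) == 0) {
    return(0.1) # fallback width when the IQR is 0
  }
  2 * IQR(x) / length(x)^(1/3)
}

mpgWidth <- fdBinWidth(mtcars$mpg)
mpgWidth
# Approximate number of bins that this width produces
ceiling(diff(range(mtcars$mpg)) / mpgWidth)
```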
## Tables {#tables}
Data tables can pose a challenge for individuals to comprehend. Just as a wall of text isn't conducive to helping a person understand what's going on, neither is a wall of data values. Thus, we need to be extremely judicious (picky) about incorporating data tables into any of our apps.
In web development there are two main types of tables: layout tables and data tables.
+ Layout tables help control where different elements appear on the page.
+ We need an additional distinction for data tables:
- Summary Data Tables are tables that have summary information; typified by two-way tables (a.k.a. contingency tables or crosstabs) but might also include other things such as values of descriptive statistics stratified by groups.
- Data Sets are an entire data object, presented in tabular format
__Layout Tables should never be used in a BOAST App.__
Data Sets should be displayed __as sparingly as possible__. In order to include a Data Set display, you will need to have identified an explicit learning goal/objective that necessitates the user digging through a data frame. If you can't identify such a learning goal, you should NOT include a data frame.
If the goal is to allow the user to look through the data set OR to have access to the data, then __give a link__ to either the original source of the data (__preferred__) or for them to download the file.
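A minimal sketch of both options in a Shiny app follows. The URL, the `outputId`, and the file name are placeholders, and `mtcars` stands in for your app's data; adapt them to your own app.

```{r dataLinkSketch, echo=TRUE, eval=FALSE}
library(shiny)

ui <- fluidPage(
  # Preferred: link to the original source (placeholder URL)
  tags$a(
    href = "https://example.com/original-data",
    "Original data source"
  ),
  # Alternative: let the user download a copy of the file
  downloadButton(
    outputId = "downloadData",
    label = "Download the data"
  )
)

server <- function(input, output, session) {
  output$downloadData <- downloadHandler(
    filename = "mtcars.csv",
    content = function(file) {
      write.csv(mtcars, file = file, row.names = TRUE)
    }
  )
}

# Launch with: shinyApp(ui = ui, server = server)
```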
Summary Data Tables can be used more often and can enrich the user's experience with your app. However, these must still be constructed in an appropriate manner.
Neither Data Sets nor Summary Data Tables should be inserted into your App as a picture. This is a big Accessibility violation. Use the directions below to create the appropriate type of data table.
### Displaying Data Tables
Your first step is to create a data frame object in your `R` code.
+ If you are displaying a data set (rare), then you will either need to read in the data or call that data frame. For this example, we'll be using the `mtcars` data frame that is part of `R`.
+ If you are making a Summary Data Table, you'll need to either use `R` to calculate the values and store in a data frame or create a data frame yourself.
In either case, be sure you identify what columns you're going to use. If your original data file has 50 columns, but your App only makes use of 5, drop the other 45. Only display the columns that you actually use.
Your next step is to decide on where to put this display (e.g., inside an Exploration Tab or as a separate page). This will help you identify where in your App's UI section you need to put the appropriate code.
To ensure that your data table is accessible and responsive (i.e., mobile friendly), you will need to use the `DT` package.
```{r installDT, echo=TRUE, eval=FALSE}
install.packages("DT")
# Be sure to include this in your library call
library(DT)
```
In your UI section, you'll need to use the following code, placed in the appropriate area:
```{r dtUICode1, echo=TRUE, eval=FALSE}
# [code omitted]
DT::DTOutput(outputId = "mtCars")
# [code omitted]
```
Then, in your Server section, you'll need to use the following code:
```{r dtServerCode1, echo=TRUE, eval=FALSE}
# [code omitted]
# Prepare your data set with only the columns needed
carData <- mtcars[,c("mpg", "cyl", "hp", "gear", "wt")]
## Use Short but Meaningful Column Names
names(carData) <- c("MPG", "# of Cylinders", "Horsepower", "# of Gears", "Weight")
# Create the output data table
# Be sure to use the same name as you did in the UI
output$mtCars <- DT::renderDT(
  expr = carData,
  caption = "Motor Trend US Data, 1973-1974 Models", # Add a caption to your table
  style = "bootstrap4", # You must use this style
  rownames = TRUE,
  options = list( # You must use these options
    responsive = TRUE, # allows the data table to be mobile friendly
    scrollX = TRUE, # allows the user to scroll through a wide table
    columnDefs = list( # These will set alignment of data values
      # Notice the use of ncol on your data frame; leave the 1 as is.
      list(className = 'dt-center', targets = 1:ncol(carData))
    )
  )
)
# [code omitted]
```
If you are making a Summary Data Table, you will need to follow the same process. If your data frame does not have row names, but instead a column with values acting as row names, you may replace the `rownames = TRUE` with `rownames = FALSE`; there should not be a column of sequential numbers on the left.
Column names __MUST__ be simple and *meaningful* to the user. To this end, you should rename any columns that might have poor choices for names, just as we have done with the `mtcars` data. This includes using Greek characters in isolation. You should not have any columns labeled \(\mu\) or \(\sigma\). Rather you need to use English words.
*Note: getting mathematical expressions to render properly in graphical environments in `R` is not as easy as in the paragraphs or headers of an app. Only certain graphing packages support limited mathematical expressions. The same is true for table generation packages.*
Again, try to use tables as infrequently as possible. Poorly constructed tables can create accessibility issues causing screen readers to poorly communicate tables to your users. If you run into problems and/or have questions, __talk to Neil and Bob__.
### Additional Table Examples
Neil built a [Data Table Examples app](https://rstudio-connect.tlt.psu.edu:3939/content/249) that you should reference when you're building data tables for display. (Note: you will need to connect to PSU's VPN in order to access this app.)
We're including some additional Summary Data Table examples. For these examples, I'm going to make use of the `palmerpenguins` package of data sets.
#### Summary Data Table of Descriptive Statistics
```{r dtSummary1Setup, echo=TRUE, eval=FALSE}
library(palmerpenguins)
library(psych)
library(DT)
library(tibble)
penStats <- psych::describeBy(
  x = penguins$body_mass_g,
  group = penguins$species,
  mat = TRUE, # Formats output appropriate for DT
  digits = 3 # sets the number of digits retained
)
# Picking which columns to keep
penStats <- penStats[, c("group1", "n", "mean", "sd", "median", "mad", "min", "max", "skew",
                         "kurtosis")]
# Make the group1 column the row names
penStats <- tibble::remove_rownames(penStats)
penStats <- tibble::column_to_rownames(penStats, var = "group1")
# Improve column names
# (skewness and kurtosis are standardized, so they carry no units)
names(penStats) <- c("Count", "SAM (g/penguin)", "SASD (g)", "Median (g)", "MAD (g)", "Min (g)",
                     "Max (g)", "Sample Skewness", "Sample Excess Kurtosis")
```
```{r dtSummary1Shiny, echo=TRUE, eval=FALSE}
# Make the Table
output$penguinSummary <- DT::renderDT(
  expr = penStats,
  caption = "Descriptive Stats for Palmer Penguins",
  style = "bootstrap4",
  rownames = TRUE,
  autoHideNavigation = TRUE,
  options = list(
    responsive = TRUE,
    scrollX = TRUE,
    paging = FALSE, # Set to False for small tables
    columnDefs = list(
      list(className = 'dt-center', targets = 1:ncol(penStats))
    )
  )
)
```
```{r dtSummary1Target, echo=FALSE, eval=FALSE}
DT::datatable(
data = penStats,
caption = "Descriptive Stats for Palmer Penguins",
style = "bootstrap4",
rownames = TRUE,
autoHideNavigation = TRUE,
options = list(
responsive = TRUE,
scrollX = TRUE,
paging = FALSE,
columnDefs = list(
list(className = 'dt-center',
targets = 1:ncol(penStats))
)
)
)
```
#### Summary Data Table for Output Table
While this example is for an ANOVA table, you can build from this for other output tables. If you store the output of any call as an object, you can then use the structure function, `str`, to investigate the output. Ultimately, you need something that is either a matrix or a data frame.
```{r dtSummary2Setup, echo=TRUE, eval=FALSE}
library(palmerpenguins)
library(psych)
library(DT)
library(tibble)
library(rstatix)
# This is bad practice, but I'm going to pretend that all assumptions are met
penModel <- aov(body_mass_g ~ species*sex, data = penguins)
# Rounding to truncate decimals
anovaPen <- round(anova(penModel), 3)
```
```{r dtSummary2Shiny, echo=TRUE, eval=FALSE}
# Make the Table
output$penguinAnova <- DT::renderDT(
  expr = anovaPen,
  caption = "(Classical) ANOVA Table for Palmer Penguins",
  style = "bootstrap4",
  rownames = TRUE,
  options = list(
    responsive = TRUE,
    scrollX = TRUE,
    paging = FALSE, # Set to False for small tables
    columnDefs = list(
      list(className = 'dt-center', targets = 1:ncol(anovaPen))
    )
  )
)
```
```{r dtSummary2Target, echo=FALSE, eval=FALSE}
DT::datatable(
data = anovaPen,
caption = "(Classical) ANOVA Table for Palmer Penguins",
style = "bootstrap4",
rownames = TRUE,
options = list(
responsive = TRUE,
scrollX = TRUE,
paging = FALSE,
columnDefs = list(
list(className = 'dt-center',
targets = 1:ncol(anovaPen))
)
)
)
```
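The same approach extends to other output objects. As a sketch, the following uses `str` to look inside the summary of a simple linear model and pulls out the coefficient matrix as a data frame ready for `DT`. The model, the built-in `mtcars` data, and the column names are assumptions chosen for illustration.

```{r strSketch, echo=TRUE, eval=FALSE}
# Fit a simple model using the built-in mtcars data (illustrative only)
carModel <- lm(mpg ~ wt, data = mtcars)
carSummary <- summary(carModel)

# Inspect the pieces stored in the output
str(carSummary, max.level = 1)

# The coefficients element is a matrix, so it converts cleanly to a data frame
coefTable <- round(as.data.frame(carSummary$coefficients), 3)
names(coefTable) <- c("Estimate", "Standard Error", "t Statistic", "p-Value")
coefTable
```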
## Alt Text, Again {#altText2}
Any graphical element you include in your App __MUST__ have an alternative (assistive) text description ("alt text"). This provides a short description of what is in the image or plot for users who are visually impaired. (Tables, when properly formatted, will handle this automatically.)
Here are several resources worth checking out:
+ [WebAIM Alternative Text Guide](https://webaim.org/techniques/alttext/#basics)
+ [Penn State's Image ALT Text Page](https://accessibility.psu.edu/images/alttext/)
+ [W3C's ALT Text Decision Tree](https://www.w3.org/WAI/tutorials/images/decision-tree/)
### Adding Alt Text to Graphs--`alt` Argument
With the release of `Shiny` version 1.6.0, the `renderPlot` function got a vital upgrade: there is now an `alt` argument that will allow you to put alt text on your plots. The `alt` argument functions just like the matching argument in the `tags$img` call for static images.
__Use this approach as your first choice method.__ The `alt` argument of `renderPlot` can be dynamically updated using user inputs, if you use `renderPlot` inside a reactive environment such as `observeEvent`.
Generally speaking, keep your alt text to __no more than 140 characters in length__. Be critical with your word choice and don't use waste words. For example, rather than setting alt text to "a picture of Neil Hatfield wearing a shirt and tie", use "headshot of Neil Hatfield". Examples of waste words include phrases such as "a plot of", "a histogram of", "a pie chart of", etc. Focus your alt text on describing the vital aspects of the plot.
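The following is a minimal sketch of the `alt` argument in a complete (if tiny) app. It relies on the documented behavior that `alt` accepts a reactive expression as well as a plain character string, which is one way to keep the alt text in step with user inputs. The input `nBins`, the output `mpgHist`, and the `mtcars` data are placeholders for illustration.

```{r altSketch, echo=TRUE, eval=FALSE}
library(shiny)
library(ggplot2)

ui <- fluidPage(
  sliderInput(
    inputId = "nBins", # placeholder input
    label = "Number of bins",
    min = 5,
    max = 30,
    value = 10
  ),
  plotOutput(outputId = "mpgHist")
)

server <- function(input, output, session) {
  output$mpgHist <- renderPlot(
    expr = {
      ggplot(data = mtcars, mapping = aes(x = mpg)) +
        geom_histogram(bins = input$nBins, color = "black", fill = "white") +
        labs(
          title = "Fuel Efficiency of 32 Cars",
          x = "Fuel efficiency (mpg)",
          y = "Number of cars"
        )
    },
    # The alt text updates whenever the slider changes
    alt = reactive({
      paste("Fuel efficiency in mpg for 32 cars, split into", input$nBins, "bins")
    })
  )
}

# Launch with: shinyApp(ui = ui, server = server)
```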
### Alt Text via ARIA
If a graph is particularly complicated and/or you need more than 140 characters, you might want to consider using [Accessible Rich Internet Applications (ARIA)](https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA) to assist us in writing some labels that will stand in place of formal alt text. To do this, you will need to make use of the following code:
```{r plotsAltTextc, echo=TRUE, eval=FALSE}
# [code omitted]
# In the UI section, in the appropriate tabItem
plotOutput(outputId = "plotId"), # Look for lines like this
# Code for adding the aria label
tags$script(HTML(
  "$(document).ready(function() {
    document.getElementById('plotId').setAttribute('aria-label',
    `General description of the plot`)
  })"
))
# [code omitted]
```
Important things to note:

1. Place the `tags$script(HTML(...))` code right after each instance of `plotOutput`.
2. Copy the above code as formatted.
3. Change the two (2) pieces for each particular plot:
    a. Replace `plotId` with the `outputId` of your plot (keep the single quotation marks in the code).
    b. Replace `General description of the plot` with your description (keep the backticks in the code).

ARIA labels can be used in conjunction with alt text. Additionally, if you have a complex plot, you might want to type up a description of the plot that you'll place in your app for all users to see. We can use ARIA commands to direct screen readers to connect the plots to any paragraph elements as well. Keep in mind that thinking about accessibility improves your app for *all* users.
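As a rough sketch of connecting a plot to a visible paragraph, the standard `aria-describedby` attribute can point from the plot's container to the `id` of the paragraph. This follows the same scripting pattern as the ARIA label code above; the ids `scatterPlot` and `scatterDescription` and the description text are placeholders, and you should confirm the behavior with a screen reader before relying on it.

```{r ariaDescribedSketch, echo=TRUE, eval=FALSE}
library(shiny)

ui <- fluidPage(
  plotOutput(outputId = "scatterPlot"),
  # Visible description that all users can read (placeholder text)
  p(
    id = "scatterDescription",
    "Heavier cars tend to have lower fuel efficiency; the trend is roughly linear."
  ),
  # Point screen readers from the plot to the paragraph
  tags$script(HTML(
    "$(document).ready(function() {
      document.getElementById('scatterPlot').setAttribute('aria-describedby',
      'scatterDescription')
    })"
  ))
)
```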