-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
167 lines (118 loc) · 7.29 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
---
title: "BIFX552 Midterm Exam"
author: "Randy Johnson"
date: "10/12/2017"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(transpose)
library(tidyverse)
library(cowplot)
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
```
# Instructions
This is the same package we put together as a class in early September, with a few additions. The package currently resides on the server at */home/johnson/transpose*. Clone the repository to your local machine and modify as instructed below.
For specifics on how this exam will be graded, see the rubric below.
The exam consists of three tasks worth 50 points each. Check your progress often using the `R CMD CHECK transpose` command, and commit chnages when appropriate (note: some things will be picked up by `R CMD CHECK` even though they are excluded in *.Rbuildignore* - these warnings are OK to ignore). You won't need to push any commits to the source repository, since you don't have write permissions to my directory. When you are finished, use `scp` to upload your modified git repository to */home/johnson/midterm/transpose_username*, where *username* is your username. If you would prefer, I can verify that the upload was successful, since you don't have permissions to view what is in the *midterm* directory.
## transpose_vector() (50 pts)
Your first task is to create a new function called `transpose_vector()`. This should take the same input and return the same output as `transpose_loop()`, but use vectorization instead of a nested `for` loop.
Remember:
* Your R package should contain both functions and all appropriate documentation and NAMESPACE entries.
* Make sure your function works before commiting.
## Testing (50 pts)
The second task is to add some testing checks to the package to help ensure that users get reasonable errors when they misuse your functions and to make future changes to your package less buggy.
### User checks
Our code currently assumes that `x` is a matrix, but there is no validation of this assumption. Add a function to the package called `validate_x` that checks if the user submitted a matrix. This should be called at the beginning of both `transpose_loop` and `transpose_vector`.
`validate_x`
* Input: a matrix, `x`
* Use `class(x)` to verify that `x` is a matrix.
* If `class(x)` is not a matrix, use `stop()` to throw an error with a meaningful error message.
* Output: none
Once you complete this function move to the package checks (don't forget to commit this change to the repository).
### Package checks
When running `R CMD CHECK transpose`, the *testing/testthat.R* script checks that everything is in order. There are currently two checks in the *testing/testthat* directory:
* Verify that `transpose_loop(x)` works when `x` is a matrix (done).
* Verify that an error is thrown when no input is given to `transpose_loop()` (done).
Add a new testthat file with two more to checks:
* Verify that `transpose_vector(x)` works when `x` is a matrix.
* Verify that an error is thrown when `x` is a `data.frame`.
## Documentation (50 pts)
Finish filling in the next section on how to use this package. Make your edits in `README.Rmd` and knit the file to create an updated Word document.
# Using this package
This package was developed by the BIFX 552 class at Hood college to practice R package building and to demonstrate the potential code speedup when using comparing nested loops, vectorized code, and compiled code.
## Installation
To install this package, you currently need to navigate to the directory containing either the source code or the compiled R package and install using `R CMD INSTALL`. You can download the package from */home/johnson/transpose* on the BIFX server at Hood college.
## Comparison of transpose with base R
```{r, echo=FALSE}
# create a list of various sized matricies
sizes <- c(100000, 500000, 1000000, 1500000, 2000000)
x <- list(matrix(1:sizes[1], ncol = 10),
matrix(1:sizes[2], ncol = 10),
matrix(1:sizes[3], ncol = 10),
matrix(1:sizes[4], ncol = 10),
matrix(1:sizes[5], ncol = 10))
# this will make for a nicer figure below
sizeInMillions <- sizes / 1e6
# time how long they each take to run using each function
loopTimes <- sapply(x, function(x) system.time(transpose_loop(x))[3])
vectorTimes <- sapply(x, function(x) system.time(transpose_vector(x))[3])
tTimes <- sapply(x, function(x) system.time(t(x))[3])
# data_frame for displaying times below
times <- data_frame(sizeInMillions = rep(sizeInMillions, 3),
seconds = c(loopTimes, vectorTimes, tTimes),
label = rep(c('for', 'sapply', 't'), each = length(sizes)))
```
R is optimized for vector operations. For loops, especially nested for loops, are known for being inefficient. To demonstrate this inefficiency, we have created this package with two functions replecating the functionality of the transpose function, `t()`:
* `transpose_loop()` transposes a matrix using a nested for loop.
* `transpose_vector()` uses vector operations to transpose the matrix.
```{r, echo = FALSE}
# comparig fold change between transpose_loop and transpose_vector
for_sapply <- filter(times, label == 'for')$seconds /
filter(times, label == 'sapply')$seconds
for_sapply <- mean(for_sapply) %>% round(digits = 1)
# comparing fold change between transpose_loop and t
for_t <- filter(times, label == 'for')$seconds /
filter(times, label == 't')$seconds
for_t <- mean(for_t[-1]) %>% round(digits = 1)
# comparing fold change between transpose_vector and t
sapply_t <- filter(times, label == 'sapply')$seconds /
filter(times, label == 't')$seconds
sapply_t <- mean(sapply_t[-1]) %>% round(digits = 1)
```
The average speedup across the 5 sizes tested in this example is:
| Method | Speedup over `for` | Speedup over `sapply` |
|---------:|:------------------:|:---------------------:|
| `sapply` | ```r for_sapply``` | |
| `t` | ```r for_t``` | ```r sapply_t``` |
This figure shows the observed compute times for each method as the matrix size increases by 500,000 elements.
```{r, echo = FALSE}
ggplot(times, aes(sizeInMillions, seconds, color = label)) +
geom_point() +
scale_color_manual(values = cbbPalette)
```
# Grading
This exam will be graded on a curve as this is the first time this exam has been administered. I anticipate that there will be about 2 A's and 3 B's, but if you all score similarly, I'm happy to give everyone A's. You may use any resources you have available to you, but you are expected to complete the exam on your own.
## Rubric
| Tasks | Skill level | Points |
|:------|:------------|------:|
| Code | Appropriate comments | 10% |
| | Appropriate indentation | 10% |
| | Use of white space | 6% |
| | Appropriate vectorization | 10% |
| | File header | 4% |
| | Runtime errors in code | -16% |
| | Task incomplete | -20% |
| | | **40%** |
| Git repository | Proper structure | 10% |
| | Commit message formatting | 6% |
| | Commit only complete work | 6% |
| | Clean repository history | 4% |
| | Commit only related changes | 4% |
| | Small commits | 4% |
| | Appropriate use of *.gitignore* | 6% |
| | | **40%** |
| R Package | Installs without errors or warnings | 10% |
| | Documentation complete | 6% |
| | Appropriate NAMESPACE entries | 4% |
| | | **20%** |