forked from lgatto/2016-02-25-adv-programming-EMBL
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path03-debug.Rmd
845 lines (643 loc) · 19.4 KB
/
03-debug.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
---
title: "Part III: Debugging"
author: "Laurent Gatto"
---
# Overview
- Defensive programming
- Debbugging: techniques and tools
- Condition handling
- Testing
# Defensive programming
Before we begin with debugging, let's look at ways to prevent bugs
(more at the end of this part).
*Defensive programming*:
- making the code work in a predicable manner
- writing code that fails in a well-defined manner
- if something *weird* happens, either properly deal with it, of fail
quickly and loudly
The level of defensiveness will depend whether you write a function
for interactive of programmatic usage.
## Talk to your users
### Diagnostic messages
```{r, eval=FALSE}
message("This is a message for our dear users.")
```
```{r, eval=FALSE}
message("This is a message for our dear users. ",
paste("Thank you for using our software",
sw, "version", packageVersion(sw)))
```
Do not use `print` or `cat`:
```{r, eval=FALSE}
f1 <- function() {
cat("I AM LOUD AND YOU CAN'T HELP IT.\n")
## do stuff
invisible(TRUE)
}
f1()
```
```{r, eval=FALSE}
f2 <- function() {
message("Sorry to interup, but...")
## do stuff
invisible(TRUE)
}
f2()
suppressMessages(f2())
```
Of course, it is also possible to manually define verbosity. This
makes you write more code for a feature readily available. But still
better to use `message`.
```{r, eval=FALSE}
f3 <- function(verbose = TRUE) {
if (verbose)
message("I am being verbose because you let me.")
## do stuff
invisible(TRUE)
}
f3()
f3(verbose = FALSE)
```
### Warning
> There is a problem with warnings. No one reads them. Pat Burns, in
> *R inferno*.
```{r, eval=FALSE}
warning("Do not ignore me. Somthing bad might have happened.")
warning("Do not ignore me. Somthing bad might be happening.", immediate. = TRUE)
```
```{r, eval=FALSE}
f <- function(...)
warning("Attention, attention, ...!", ...)
f()
f(call. = FALSE)
```
Print warnings after they have been thrown.
```{r, eval=FALSE}
warnings()
last.warning
```
See also to `warn` option in `?options` .
```{r, eval=FALSE}
option("warn")
```
### Error
```{r, eval=FALSE}
stop("This is the end, my friend.")
```
```{r, eval=FALSE}
log(c(2, 1, 0, -1, 2)); print('end') ## warning
xor(c(TRUE, FALSE)); print ('end') ## error
```
Stop also has a `call.` parameter.
```{r, eval=FALSE}
geterrmessage()
```
### Logging
See for example the [`log4r`](https://github.com/johnmyleswhite/log4r)
package:
```{r, eval=FALSE}
## Import the log4r package.
library('log4r')
## Create a new logger object with create.logger().
logger <- create.logger()
## Set the logger's file output: currently only allows flat files.
logfile(logger) <- file.path('base.log')
## Set the current level of the logger.
level(logger) <- "INFO"
## Try logging messages at different priority levels.
debug(logger, 'A Debugging Message') ## Won't print anything
info(logger, 'An Info Message')
warn(logger, 'A Warning Message')
error(logger, 'An Error Message')
fatal(logger, 'A Fatal Error Message')
```
### Progress bars
- `utils::txtProgressBar` function
```{r, eval=FALSE}
n <- 10
pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in 1:n) {
setTxtProgressBar(pb, i)
Sys.sleep(0.5)
}
close(pb)
```
- [`progress`](https://github.com/gaborcsardi/progress) package
```{r, eval=FALSE}
library("progress")
pb <- progress_bar$new(total = n)
for (i in 1:n) {
pb$tick()
Sys.sleep(0.5)
}
```
Tip: do not over use progress bars. Ideally, a user should be
confident that everything is under control and progress is made while
waiting for a function to return. In my experience, a progress bar is
usefull when there is a specific and/or user-defined number of
iterations, such a *iterating over n files*, or *running a simulation
n times*.
**Question**: What about mixing progress bars and verbosity.
## KISS
Keep your functions simple and stupid (and short).
## Failing fast and well
> Bounds errors are ugly, nasty things that should be stamped out
> whenever possible. One solution to this problem is to use the
> `assert` statement. The `assert` statement tells C++, "This can
> never happen, but if it does, abort the program in a nice way." One
> thing you find out as you gain programming experience is that things
> that can "never happen" happen with alarming frequency. So just to
> make sure that things work as they are supposed to, it’s a good idea
> to put lots of self checks in your program. -- Practical C++
> Programming, Steve Oualline, O'Reilly.
```{r, eval=FALSE}
if (!condition) stop(...)
```
```{r, eval=FALSE}
stopifnot(TRUE)
stopifnot(TRUE, FALSE)
```
For example to test input classes, lengths, ...
```{r, eval=FALSE}
f <- function(x) {
stopifnot(is.numeric(x), length(x) == 1)
invisible(TRUE)
}
f(1)
f("1")
f(1:2)
f(letters)
```
The [`assertthat`](https://github.com/hadley/assertthat) package:
```{r, eval=FALSE}
x <- "1"
library("assertthat")
stopifnot(is.numeric(x))
assert_that(is.numeric(x))
assert_that(length(x) == 2)
```
* `assert_that()` signal an error.
* `see_if()` returns a logical value, with the error message as an attribute.
* `validate_that()` returns `TRUE` on success, otherwise returns the error as
a string.
* `is.flag(x)`: is x `TRUE` or `FALSE`? (a boolean flag)
* `is.string(x)`: is x a length 1 character vector?
* `has_name(x, nm)`, `x %has_name% nm`: does `x` have component `nm`?
* `has_attr(x, attr)`, `x %has_attr% attr`: does `x` have attribute `attr`?
* `is.count(x)`: is x a single positive integer?
* `are_equal(x, y)`: are `x` and `y` equal?
* `not_empty(x)`: are all dimensions of `x` greater than 0?
* `noNA(x)`: is `x` free from missing values?
* `is.dir(path)`: is `path` a directory?
* `is.writeable(path)`/`is.readable(path)`: is `path` writeable/readable?
* `has_extension(path, extension)`: does `file` have given `extension`?
## Consistency and predictability
Reminder of the interactive use vs programming examples:
- `[` and `drop`
- `sapply`, `lapply`, `vapply`
Remember also the concept of *tidy data*.
## Comparisons
### Floating point issues to be aware of
R FAQ [7.31](http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f)?
```{r, eval=FALSE}
a <- sqrt(2)
a * a == 2
a * a - 2
```
```{r, eval=FALSE}
1L + 2L == 3L
1.0 + 2.0 == 3.0
0.1 + 0.2 == 0.3
```
### Floating point: how to compare
- `all.equal` compares R objects for *near equality*. Takes into
account whether object attributes and names ought the taken into
consideration (`check.attributes` and `check.names` parameters) and
tolerance, which is machine dependent.
```{r, eval=FALSE}
all.equal(0.1 + 0.2, 0.3)
all.equal(0.1 + 0.2, 3.0)
isTRUE(all.equal(0.1 + 0.2, 3)) ## when you just want TRUE/FALSE
```
### Exact identity
`identical`: test objects for exact equality
```{r, eval=FALSE}
1 == NULL
all.equal(1, NULL)
identical(1, NULL)
identical(1, 1.) ## TRUE in R (both are stored as doubles)
all.equal(1, 1L)
identical(1, 1L) ## stored as different types
```
Appropriate within `if`, `while` condition statements. (not
`all.equal`, unless wrapped in `isTRUE`).
## Exercise
(From [adv-r](http://adv-r.had.co.nz/Exceptions-Debugging.html#defensive-programming).)
The `col_means` function computes the means of all numeric columns in
a data frame.
```{r, eval=FALSE}
col_means <- function(df) {
numeric <- sapply(df, is.numeric)
numeric_cols <- df[, numeric]
data.frame(lapply(numeric_cols, mean))
}
```
Is it a robust function? What happens if there are unusual inputs.
```{r, eval=FALSE}
col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = FALSE])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
```
# Debugging: techniques and tools
### Shit happens
> Funding your bug is a process of confirming the many things that you
> believe are true - until you find one which is not true. -- Norm Matloff
#### 1. Identify the bug (the difficult part)
- Something went wrong!
- Where in the code does it happen?
- Does it happen every time?
- What input triggered it?
- Report it (even if it is in your code - use github issues, for
example).
**Tip**: Beware of your intuition. As a scientist, do what you are
used to: generate a hypotheses, *design an experiment* to test them,
and record the results.
#### 2. Fix it (the less difficult part)
- Correct the bug.
- Make sure that bug will not repeat itself!
- How can we be confident that we haven't introduced new bugs?
## Tools
- `print`/`cat`
- `traceback()`
- `browser()`
- `options(error = )`, `options(warn = )`
- `trace`
- IDE: RStudio, StatET, emacs' ess tracebug.
### Manually
Inserting `print` and `cat` statements in the code. Works, but time
consuming.
### Finding the bug
> Many bugs are subtle and hard to find. -- Hadley Wickham
Bugs are shy, and are generally hidden, deep down in your code, to
make it as difficult as possible for you to find them.
```{r, echo=TRUE}
e <- function(i) {
x <- 1:4
if (i < 5) x[1:2]
else x[-1:2]
}
f <- function() sapply(1:10, e)
g <- function() f()
```
`traceback`: lists the sequence of calls that lead to the error
```{r, eval=FALSE}
g()
traceback()
```
If the source code is available (for example for `source()`d code),
then traceback will display the exact location in the function, in the
form `filename.R#linenum`.
### Browsing the error
- Register the function for debugging: `debug(g)`. This adds a call to
the `browser()` function (see also below) and the very beginning of
the function `g`.
- Every call to `g()` will not be run interactively.
- To finish debugging: `undebug(g)`.
```{r, eval=FALSE}
debug(g)
g()
```
How to debug:
- `n` executes the next step of the function. Use `print(n)` or
`get(n)` to print/access the variable `n`.
- `s` to step into the next function. If it is not a function, same as
`n`.
- `f` to finish execution of the current loop of function.
- `c` to leave interactive debugging and continue regular execution of
the function.
- `Q` to stop debugging, terminate the function and return to the
global workspace.
- `where` print a stack trace of all active function calls.
- `Enter` same as `n` (or `s`, if it was used most recently), unless
`options(browserNLdisabled = TRUE)` is set.
To fix a function when the source code is not directly available, use
`fix(fun)`. This will open the function's source code for editing and,
after saving and closing, store the updated function in the global
workspace.
### Breakpoints
- Add a call to `browser()` anywhere in the source code to execute the
rest of the code interactively.
- To run breakpoints conditionally, wrap the call to `browser()` in a
condition.
### Setting options
- `options(error = recover)`: display the call stack and choose where
to step in.
- `options(error = stop)` or `options(error = NULL)`: reset to default
behaviour.
- `options(warn = 'numeric')`: sets the handling of warning
messages. If `warn` is negative all warnings are ignored. If `warn`
is zero (the default) warnings are stored until the top-level
function returns. If 10 or fewer warnings were signalled they will
be printed otherwise a message saying how many were signalled. An
object called `last.warning` is created and can be printed through
the function `warnings`. If `warn` is one, warnings are printed as
they occur. If `warn` is two or larger all warnings are turned into
errors.
- `options(error = dump.frames)`: like `recover` but for
non-interactive use. Will create a `last.dump.rda` file in the
current working directory, which can then be reloaded in an
interactive session to re-inter interactive debugging (using
`debugger()`).
### Debugging with IDEs
- RSudio: `Show Traceback`, `Rerun with Debug` and interactive debugging.
![RStudio debugging 1](./figs/debugRStudio1.png)
![RStudio debugging 2](./figs/debugRStudio2.png)
- StatET (Eclipse plugin)
- [emacs ESS and tracebug](http://ess.r-project.org/Manual/ess.html#Developing-with-ESS)
### Exercise
1. Your turn - play with `traceback`, `recover` and `debug`:
(Example originally by Martin Morgan and Robert Gentleman.)
```{r, echo=TRUE}
e <- function(i) {
x <- 1:4
if (i < 5) x[1:2]
else x[-1:2] # oops! x[-(1:2)]
}
f <- function() sapply(1:10, e)
g <- function() f()
```
2. Fix `readFasta2`.
```{r, eval=FALSE}
## make sure you have the 'sequences' package.
## Get readFasta2, the function to debug
library(devtools)
install_github("lgatto/sequences") ## from github
## or
install.packages("sequences") ## from CRAN
library("sequences")
sequences:::debugme()
## Get an example file
f <- dir(system.file("extdata", package = "sequences"),
full.names=TRUE, pattern = "moreDnaSeqs.fasta")
## BANG!
readFasta2(f)
```
## Condition handling
### `try` and `tryCatch`
The function `f` will never terminate.
```{r, eval=FALSE}
f <- function() {
x <- "1"
log(x)
message("x was the ", class(x), " ", x)
}
f()
```
Use `try` to proceed with the execution even when an error occurs.
```{r, eval=FALSE}
f <- function() {
x <- "1"
try(log(x))
message("x was the ", class(x), " ", x)
}
f()
```
```{r, eval=FALSE}
try({
a <- 1
b <- "2"
a + b
})
```
In case of error, `try` returns a object of class `try-error`:
```{r, eval=FALSE}
success <- try(1 + 2)
failure <- try(1 + "2", silent = TRUE)
class(success)
class(failure)
```
```{r, eval=FALSE}
inherits(failure, "try-error")
if (inherits(failure, "try-error"))
message("There was an error here.")
```
Handling errors is particularly useful to iterate over all elements of
an input, despite errors (and inspecting/handling/fixing the errors
afterwards).
```{r, eval=FALSE}
el <- list(1:10, c(-1, 1), TRUE, "1")
res <- lapply(el, log)
res
res <- lapply(el, function(x) try(log(x)))
res
```
#### Hadley's tip
> Another useful `try()` idiom is using a default value if an
> expression fails. Simply assign the default outside the `try` block,
> and then run the risky code:
```{r, eval=FALSE}
default <- NULL
try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE)
```
> There is also `plyr::failwith()`, which makes this strategy even
> easier to implement.
```{r, eval=FALSE}
f <- function(x)
if (x == 1) stop("Error!") else 1
f(1)
f(2)
safef <- failwith(NULL, f)
safef(1)
safef(2)
```
Use `tryCatch` to specify a behaviour (handler function) in case of
error, warning or message.
```{r, eval=FALSE}
f <- function() {
x <- "1"
tryCatch(log(x),
error = function(e) cat("There was an error!\n"))
message("x was the ", class(x), " ", x)
}
f()
```
More example from Hadleys' *Advanced R* book.
```{r, eval=FALSE}
show_condition <- function(code) {
tryCatch(code,
error = function(c) "error",
warning = function(c) "warning",
message = function(c) "message"
)
}
show_condition(stop("!"))
show_condition(warning("?!"))
show_condition(message("?"))
show_condition(0)
```
A more informative `read.csv` version:
```{r, eval=FALSE}
read.csv2 <- function(file, ...) {
tryCatch(read.csv(file, ...), error = function(c) {
c$message <- paste0(c$message, " (in ", file, ")")
stop(c)
})
}
read.csv("code/dummy.csv")
read.csv2("code/dummy.csv")
```
`tryCatch` has a `finally` argument that specifies a code block to be
executed regardless of whether the initial expression succeeds or
fails. Usefull, for example, to clean up (deleting files, closing
connections, ...).
### `withCallingHandlers`
The `withCallingHandlers` function allows to defined special behaviour
in case of *unusual conditions*, including warnings and errors. In the
example below, we start a browser in case of (obscure) warnings.
```{r, eval=FALSE}
f <- function(x = 10) {
lapply(seq_len(x), function(i) {
## make an example 2x2 contingency table
d <- matrix(sample(4:10, 4), nrow = 2, ncol = 2)
## will produce warning if there is a 5 or less
## in the contingency table
chisq.test(d)
})
}
```
```{r, eval=FALSE}
set.seed(1)
f()
set.seed(1)
withCallingHandlers(f(), warning=function(e) recover())
```
### Difference between `tryCatch` and `withCallingHandlers`
(From [*Advanced R*](http://adv-r.had.co.nz/Exceptions-Debugging.html#condition-handling))
The handlers in `withCallingHandlers()` are called in the context of
the call that generated the condition whereas the handlers in
`tryCatch()` are called in the context of `tryCatch()`. This is shown
here with `sys.calls()`, which is the run-time equivalent of
`traceback()` -- it lists all calls leading to the current function.
```{r, eval=FALSE}
f <- function() g()
g <- function() h()
h <- function() stop("!")
tryCatch(f(), error = function(e) print(sys.calls()))
withCallingHandlers(f(), error = function(e) print(sys.calls()))
```
### Debugging at the C level with `ddd` or `gdb`
Demo
### Exercise
```{r, eval=TRUE, echo=FALSE}
safelog <- function(x) {
tryCatch(log(x),
error = function(e) paste("an error with input", x),
warning = function(e) paste("a warning with input", x))
}
```
Write a new `safelog` function that catches and handles errors and
warnings to emulate the following behaviour.
```{r, echo=TRUE, eval=TRUE}
log(1)
safelog(1)
log(-1)
safelog(-1)
log("a")
safelog("a")
```
**Answer**
```{r, echo=TRUE}
safelog <- function(x) {
tryCatch(log(x),
error = function(e) paste("an error with input", x),
warning = function(e) paste("a warning with input", x))
}
```
## Tracing code
From `?trace`:
> A call to `trace` allows you to insert debugging code (e.g., a call
> to `browser` or `recover`) at chosen places in any function. A call
> to `untrace` cancels the tracing.
```{r, eval=FALSE}
## Report whenever e invoked
trace(sum)
hist(rnorm(100))
untrace(sum)
```
```{r, eval=FALSE}
## Evaluate arbitrary code whenever e invoked
trace(e, quote(cat("i am", i, "\n")))
## Another way to enter browser whenver e invoked
trace(e, browser)
## stop tracing
untrace(e)
```
### Inserting code dynamically
> The `trace` function operates by constructing a revised version of
> the function (or of the method, if ‘signature’ is supplied), and
> assigning the new object back where the original was found.
```{r, eval=FALSE}
f <- function() {
## make an example 2x2 contingency table
d <- matrix(sample(4:10, 4), nrow=2, ncol=2)
chisq.test(d)
}
set.seed(1)
f() ## no warning
set.seed(11)
f() ## warning
```
We want to conditionally enter brower mode, when an element of `d` is
smaller than 5.
```{r, eval=FALSE}
if (any(d < 5))
browser()
```
This expression must be executed at a specific location in our function `f`:
```{r, eval=FALSE}
as.list(body(f))
```
```{r, eval=FALSE}
trace("f", quote(if (any(d < 5)) browser()), at = 3)
```
We can now run our updated function `f`
```{r, eval=FALSE}
f
body(f)
```
```{r, eval=FALSE}
set.seed(1)
f() ## normal execution
set.seed(11)
f() ## enters browser mode
```
### Debugging S4 methods
> The `trace` function operates by constructing a revised version of
> the function (or of the method, if ‘signature’ is supplied), and
> assigning the new object back where the original was found.
```{r, eval=FALSE}
library("MSnbase")
data(itraqdata)
x <- itraqdata[[1]]
plot(x, full=TRUE)
```
Not helpful:
```{r, eval=FALSE}
debug(plot)
plot(x, full=TRUE)
```
Try again:
```{r, eval=FALSE}
trace("plot", browser,
signature = c("Spectrum", "missing"))
plot(x, full=TRUE)
```
## Unit testing
See [here](https://github.com/lgatto/2016-02-25-adv-programming-EMBL/blob/master/unittesting.md).