Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cover: consider tracking pairs of coverage instrumentation points #249

Open
thepudds opened this issue Jun 1, 2019 · 8 comments
Open

cover: consider tracking pairs of coverage instrumentation points #249

thepudds opened this issue Jun 1, 2019 · 8 comments

Comments

@thepudds
Copy link
Collaborator

thepudds commented Jun 1, 2019

This is not something I would personally suggest doing right now or in the near term, but wanted to at least record the suggestion. (Also, perhaps this is already tracked somewhere else, or maybe something like this is already being done within go-fuzz, in which case I will learn something ;-).

To my knowledge, if we ignore other things like sonar for a moment, go-fuzz coverage tracks individual coverage points like so:

_go_fuzz_dep_.CoverTab[5262]++

where CoverTab is defined as:

// CoverTab holds code coverage.
// It is initialized to a new array so that instrumentation
// executed during process initialization has somewhere to write to.
// It is replaced by a newly initialized array when it is
// time for actual instrumentation to commence.
var CoverTab = new([CoverSize]byte)

Other fuzzers such as afl-fuzz take a different approach. For example, from http://lcamtuf.coredump.cx/afl/technical_details.txt:

The code injected at branch points is essentially equivalent to:

  cur_location = <COMPILE_TIME_RANDOM>;
  shared_mem[cur_location ^ prev_location]++; 
  prev_location = cur_location >> 1;
  
  [...]
  
This form of coverage provides considerably more insight into the execution
path of the program than simple block coverage. In particular, it trivially
distinguishes between the following execution traces:

  A -> B -> C -> D -> E (tuples: AB, BC, CD, DE)
  A -> B -> D -> C -> E (tuples: AB, BD, DC, CE)

This aids the discovery of subtle fault conditions in the underlying code,
because security vulnerabilities are more often associated with unexpected
or incorrect state transitions than with merely reaching a new basic block.

The reason for the shift operation in the last line of the pseudocode shown
earlier in this section is to preserve the directionality of tuples (without
this, A ^ B would be indistinguishable from B ^ A) and to retain the identity
of tight loops (otherwise, A ^ A would be obviously equal to B ^ B).

It seems go-fuzz (or a future go test -fuzz=. ./...) could at least consider that approach

On the other hand, go-fuzz has some greater sophistication in other areas, which very well might mean this is not as useful as it might appear for other fuzzers.

@dvyukov
Copy link
Owner

dvyukov commented Jun 2, 2019

As far as I understand AFLs use of XOR is an attempt to get edge coverage on top of the weak binary instrumentation. LibFuzzer uses compiler instrumentation which does instrument edges, and it does not use XOR. In go-fuzz I tried to get edge coverage by adding empty else branches and default switch cases. It should cover most of the cases. I don't remember if we add special instrumentation to short-circuited && and ||, but we could. Proper edge coverage should be simpler with compiler instrumentation.

@thepudds
Copy link
Collaborator Author

thepudds commented Aug 13, 2019

I constructed a simple synthetic example (at the bottom) to try to illustrate a potential difference here.

I then did a quick modification of go-fuzz-build/cover.go, and saw a reasonably good speed-up on that synthetic example:

  • With the modifications, it usually takes something like 5-20 seconds or so to get to "bingo".
  • Without the modifications, it can take something like 10-15 minutes to get to "bingo" (though there is reasonably high variance, and I usually killed it after a few minutes when testing, so I'm not 100% sure of the real average).

That is with 4 workers. YMMV.

A (heavily) simplified version of the code inserted by the normal instrumentation I think is something like:

func foo() {
  if a {
    _go_fuzz_dep_.CoverTab[11111]++
  }
  if b {
    _go_fuzz_dep_.CoverTab[22222]++
  }
}

In contrast, the modified code instead (in some cases) inserts code roughly similar to:

func foo() {
  var _go_fuzz_prev_loc int
  
  if a {
    _go_fuzz_dep_.CoverTab[11111 ^ _go_fuzz_prev_loc]++
    _go_fuzz_prev_loc = 11111 >> 1
  }
  if b {
    _go_fuzz_dep_.CoverTab[22222 ^ _go_fuzz_prev_loc]++
    _go_fuzz_prev_loc = 22222 >> 1
  }
}

That again is a heavy simplification of the actual generated code, and doesn't show the automatically generated empty else clauses, etc. (The modified code also treats an empty block slightly differently than a block with code and a couple other small nuances, but that pseudo-code is at least an attempt to convey the gist of the change).

Here is the synthetic example, which tries for force go-fuzz to find three interesting strings:

func FuzzEdge(input []byte) int {
	found := 0
	list := stringSlice(input)
	if len(list) < 3 {
		return 0
	}
	if list[0] == "Z1" {
		found++
	}
	if list[1] == "Z2" {
		found++
	}
	if list[2] == "Z3" {
		found++
	}

	if found == 3 {
		panic("bingo")
	}
	return 0
}

func stringSlice(input []byte) []string {
	var list []string
	for len(input) > 0 {
		size := int(input[0])
		input = input[1:]
		if size > len(input) || size == 0 {
			break
		}
		s := string(input[:size])
		input = input[size:]
		list = append(list, s)
	}
	return list
}

I suspect part of the difference in speed is due to the modifications treating getting 2 out of 3 correct as a stronger signal than the vanilla go-fuzz-build. In other words, even if Z1 and Z2 have already been seen independently, if a new input contains a Z1 and Z2 together in the proper slots in the input data for the first time, with the modifications, that updates a new location in the CoverTab (vs. without the modifications, if Z1 and Z2 have already been seen independently, then Z1 and Z2 being seen together is not a new location in CoverTab, I think).

One heavy caveat is this is just a quick experiment (and probably not the right way to do it in terms where to modify the code, etc.).

This initial result might be useful, or might not be.

Finally, the current version of the modifications linked above have a darwin-specific problem on the tip version of Go that currently causes it to fail to build (an issue with internal/syscall). I haven't looked at that carefully, but it passes travis on linux/darwin/windows with Go 1.11, while failing on Go 1.12 and tip on darwin.

edit: sorry, the darwin-specific problem also affects Go 1.12 as well as tip. (I originally wrote just tip seemed affected). Presumably the problem is solvable, but that is the current state.

@thepudds
Copy link
Collaborator Author

One additional set of quick data points.

A different branch can now use two previous locations to xor (as well as there is a constant at the top of go-fuzz-build/cover.go to change it to use one previous location to xor, which is closer to the approach above in #249 (comment)).

In theory, it might be that xor'ing two previous locations increases the amount of "look back" examined in terms of locations hit in a particular invocation of a function, and the CoverTab location gets progressively closer to representing the actual path taken through a function.

I tested against the example below, which has five strings of interest to find before "bingo", and hence this is intentionally a bit harder than the three string example above in #249 (comment).

The summary is that using two previous locations seems to hit bingo faster in this test than using one previous location, and one previous location hits bingo seemingly faster than the current go-fuzz master. That said, these are of course just quick experiments, and not conclusive.

Each experiment was run 10x times, with a max allowed time for a given fuzz run of 3 minutes prior to automatically killing the run and restarting go-fuzz for the next round (for a total of 30x runs across 90 minutes of wall clock time in total).

Using two previous locations:

  • 10x: hit bingo
    • Avg bingo time: 28 sec
    • Max bingo time: 51 sec

Using one previous locations:

  • 7x: did not hit bingo within 3 minutes
  • 3x: hit bingo
    • Avg bingo time: 24 sec
    • Max bingo time: 27 sec

Using go-fuzz master:

  • 10x: did not hit bingo within 3 minutes

The example used is similar to the one above, but extends it to also look for "Z4" and "Z5".

Hunt for 5 strings example (click to expand)
func FuzzEdgeDetectionMedium5(input []byte) int {
	var found int
	list := stringSlice(input)
	if len(list) < 5 {
		return 0
	}
	if list[0] == "Z1" {
		found++
	}
	if list[1] == "Z2" {
		found++
	}
	if list[2] == "Z3" {
		found++
	}
	if list[3] == "Z4" {
		found++
	}
	if list[4] == "Z5" {
		found++
	}

	if found == 5 {
		panic("bingo")
	}
	return 0
}

func stringSlice(input []byte) []string {
	var list []string
	for len(input) > 0 {
		size := int(input[0])
		input = input[1:]
		if size > len(input) || size == 0 {
			break
		}
		s := string(input[:size])
		input = input[size:]
		list = append(list, s)
	}
	return list
}

One potential drawback of increasing the lookback count might be inflating the corpus unnecessarily, but I haven't looked at that really. Also, one could consider increasing the count of previous locations tracked even further than the two above (to three or four or more), though there are likely diminishing returns at some point. For example, maybe in practice two is not materially better than one, even though two outperformed one in this simple test.

@dvyukov
Copy link
Owner

dvyukov commented Sep 18, 2019

  1. Sensibly benchmarking fuzzing changes is hard. As you noted this can inflate corpus. What is the effect on very large programs? If we inflate corpus from 1000 to 5000 inputs, that may potentially reduce overall efficiency.

  2. _go_fuzz_previous_location is a global. This won't work well with multiple goroutines. The current approach may be fine for evaluation (although we need to be super careful to avoid multiple goroutines, otherwise we will get corpus explosion and wrong results). But for production we would need something else. Not sure if passing the var as argument to all functions can fly. Another alternative would be to restrict this logic only to a single function and use a function local var.

@thepudds
Copy link
Collaborator Author

@dvyukov, thanks for taking a look at this, and thanks for the comments.

While this is fresh, I am going to make a quick reply now (without waiting a few days to double-check everything, though I'll try to double-check within a few days and comment back here if I find I said something wrong).

_go_fuzz_previous_location is a global.

There is a global, but hopefully the global is not used within a function or method (though I could have made a mistake). There is a local declaration of _go_fuzz_previous_location at the top of each function that should shadow the global declaration, so when _go_fuzz_previous_location is used to update coverage information within a function, it should be the local variable that is used.

At first, I was just using the local declaration of _go_fuzz_previous_location and there was no global _go_fuzz_previous_location, but I later hit a problem on larger code bases where some top-level declarations use boolean operators such as '&&', which meant the top-level declaration would get instrumented by the normal go-fuzz-build instrumentation, even though it is outside of a function. For example, math/exp_asm.go has this as a top-level declaration (outside of a function), and the '&&' means it gets instrumented by the normal go-fuzz-build with updates to _go_fuzz_dep_.CoverTab:

var useFMA = cpu.X86.HasAVX && cpu.X86.HasFMA

That's not a problem for normal go-build, but it caused a compilation error for my experiment, so one of the last things I did was add in the global as a simple workaround to let things compile... but hopefully that doesn't affect the local usage within a function (as long as I didn't make a silly mistake ;-). If this was to be pursued for real, a better change might be to not instrument those types of top-level declarations that are not useful for fuzzing? But as you said, given this was just an experiment, I was just looking for a simple change to make things work.

Sensibly benchmarking fuzzing changes is hard.

One other aspect that makes benchmarking this particular change more challenging is that it is harder to use coverage as a metric to compare before and after the change, because the change itself alters the coverage counts.

As you noted this can inflate corpus.

Agreed that is a danger. Also, just to be explicit, one other danger is saturating the CoverTab sooner for a given program, though I suspect that is a smaller danger compared to possibly over inflating the corpus. (Also, I'm not 100% sure, but I think right now, go-fuzz-build might insert some updates to CoverTab with locations that are effectively redundant in some case, so there might also be an opportunity to recoup some increases in CoverTab by eliminating some currently redundant locations updating CoverTab, though I'm not suggesting anyone tackle that now).

Three other related comments:

  1. Based on some testing (plus some questionable instincts ;-), I would guess using one previous location would be a net win, including there is some evidence with afl-fuzz that it is a reasonable choice, but comparing to afl-fuzz is not an apples-to-apples comparison, and I definitely don't have an exhaustive data set to prove anything. I can try a bit of longer scale testing, but it will still not be exhaustive.

  2. Using two previous locations I suspect in many cases would be a net win, but compared to one previous location, two locations might be more susceptible to hitting pathological cases... and a net win in many cases that is a significant drawback in other common cases is not very desirable, of course.

  3. One approach to mitigate the downside for either 1. or 2. might be to have a threshold that governs when the deeper instrumentation applies (e.g., maybe based on number of branches or instrumented locations, or something like that. A threshold could be applied on a per function basis, and/or possibly global thresholds, though global thresholds might not be as useful given go-fuzz-build I think doesn't really know what instrumented code will be executed vs. not).

In any event, thanks again for the comments. I guess this can be considered an experiment with "potentially interesting but potentially uninteresting" results.... and as I said in the first comment here, there are likely other things in the world of Go fuzzing that are more pressing compared to this...

@dvyukov
Copy link
Owner

dvyukov commented Sep 18, 2019

There is a local declaration of _go_fuzz_previous_location at the top of each function

Ah, interesting. I missed that. I guess this can work then.
The uses of the global var cannot possibly happen during fuzzing, only during init, right?

@dvyukov
Copy link
Owner

dvyukov commented Sep 18, 2019

One other aspect that makes benchmarking this particular change more challenging is that it is harder to use coverage as a metric to compare before and after the change, because the change itself alters the coverage counts.

FWIW in syzkaller I split "coverage" and "signal" into 2 completely independent things. Coverage is still plain PCs that can be used e.g. for coverage reports, and signal is some abstract numbers, can be edges/paths/something else, generic logic does not care what exactly it is. This is useful in some respects.
However I am not sure if there is a nice way to do this in go-fuzz. In syzkaller we collect a trace of all PCs and than that can be turned into edges/paths/etc (with function entry/exit annotations it could be turned into many more things b/c it would be possible to restore full stack trace for each PC). But using traces has own problems too, e.g. loops can produce insanely long traces. It also conflicts with goroutines.

@dvyukov
Copy link
Owner

dvyukov commented Sep 18, 2019

Three other related comments:

I don't know what to say. Maybe.
FWIW in libfuzzer we have lots of tunables for such things and then one can always experiment and try. I don't mind having some tunables in go-fuzz. But then again there is a question of what should happen when/if we move go-fuzz into std lib. The default definitely should be good. Both gc compiler and runtime package have some debug knobs, so maybe we could have too.

Re CoverTab, it always was supposed to be of tunable size. It's just that I did fixed 64K initially as the simplest option and then never had time/need to improve that. E.g. we could have instrumentation that does CoverTab[LOCATION_HASH & GlobalSizeMask]++.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants