
Support for parallel reduction #12

Open
sethfowler opened this issue Apr 21, 2017 · 6 comments
@sethfowler

It seems like it'd be nice to have support for running reductions in parallel.

There are some disadvantages, so this should definitely be opt-in. One issue is that there may be dependencies between chunks which need to be handled in some fashion. When you have a file with chunks like this:

A
B
C
D

It may be that you can successfully remove C only if you have previously removed A, or that you will succeed in removing D only if you did not already remove B. If you run the reductions where you remove A, B, C, and D in parallel, this will produce nondeterministic results.
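To make the dependency problem concrete, here is a minimal sketch (hypothetical chunk names and interestingness rules, not Lithium's actual code or API) showing that the order in which removals are committed, which is exactly what parallel workers would race on, changes the final result:

```python
# Hypothetical example: four chunks whose removability depends on
# which other chunks have already been removed.
def interesting(removed):
    """The file stays 'interesting' unless a dependency is violated."""
    if "C" in removed and "A" not in removed:
        return False  # C can only be removed once A is gone
    if "D" in removed and "B" in removed:
        return False  # D can only be removed if B was kept
    return True

def reduce_serial(chunks, order):
    """Try removing chunks one at a time, committing each success."""
    removed = set()
    for chunk in order:
        if interesting(removed | {chunk}):
            removed.add(chunk)
    return removed
```

Committing in the order A, B, C, D removes {A, B, C}, while committing in the order D, C, B, A removes only {A, D}; a parallel run could land on either outcome depending on which worker finishes first.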

Nondeterministic results would be acceptable for a lot of use cases, though, so I think there'd be value in supporting the feature.

Would the other users of lithium find this useful? Has anyone taken a look at doing this already?

@sethfowler (Author)

A naive implementation would also assume that the chunks are independent enough that if you can remove C by itself and D by itself, you can remove both C and D. That happens to be true for my use case, but isn't true in general. If the user knows that assumption is safe, that's great, and perhaps that's the only case worth supporting.

When that assumption isn't true, what can happen is that you decide to remove C and D in parallel, "commit" those changes, and then go on to test removing E with both C and D removed. Since C and D combined result in an uninteresting file, we'll decide we can't remove E, and indeed we won't be able to remove anything else unless we get lucky and find a compensating removal that makes the file interesting again.

One could imagine dealing with this with something a bit hacky - for example, remember the last interesting version of the file, and if you see a long streak of uninteresting versions or hit EOF, go back and see if that streak began by combining parallel reductions. If so, the parallel reductions can be retested in isolation (for the example in the previous paragraph, that'd involve testing with only C and D removed) to make sure that combining them still resulted in an interesting file. If we detect that we're retesting often enough that we're not gaining anything from the parallelism, we switch to serial mode.
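The streak-and-retest fallback described above could be sketched roughly like this (all names and thresholds are made up for illustration; this is not Lithium code):

```python
STREAK_LIMIT = 5   # assumed: uninteresting attempts before suspecting a bad commit
RETEST_LIMIT = 3   # assumed: failed retests before abandoning parallel mode

def run_with_fallback(attempts, retest_batch):
    """attempts: iterable of bools (was each removal attempt interesting?).
    retest_batch: callable that re-tests the last parallel batch in
    isolation, returning True if the combined removals are still interesting.
    Returns True if we should switch to serial mode."""
    streak, failed_retests = 0, 0
    for was_interesting in attempts:
        streak = 0 if was_interesting else streak + 1
        if streak >= STREAK_LIMIT:
            if not retest_batch():
                failed_retests += 1  # the parallel commit was bad
            streak = 0
            if failed_retests >= RETEST_LIMIT:
                return True          # parallelism isn't paying off
    return False
```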

That's all pretty complicated, though. =) For my own use case, and I expect for many use cases, it'd be fine to combine the reductions naively.

@nth10sd (Contributor) commented May 1, 2017

While parallelism brings a lot of advantages in computing, I'm not sure it's a good idea to introduce it here without a comprehensive heuristics strategy.

Some of the testcases we reduce via Lithium are intermittent by nature, e.g. testcases involving garbage collection, OOM issues, or simply timing events. They have a lot of potential to race and to produce inconsistent results. In fact, even if a testcase is consistently reproducible initially, it may turn intermittent as Lithium reduces it (and the other way around is possible too).

We have some strategies to mitigate this: at the OS level (disabling ASLR), via special functions available in certain binaries (e.g. in the SpiderMonkey js engine, we have OOM-specific functions [oomAfterAllocations] to specifically reduce intermittence), or within Lithium itself, where one uses the range.py interestingness test to run the testcase multiple times before deciding whether it is "interesting".
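The run-it-multiple-times mitigation can be sketched as a small wrapper (in the spirit of range.py, but not its actual code; names here are made up):

```python
def repeat_interesting(test_once, runs=3):
    """Wrap a flaky interestingness test: only report 'interesting'
    if the testcase reproduces on every one of `runs` attempts."""
    def wrapped(testcase):
        return all(test_once(testcase) for _ in range(runs))
    return wrapped
```

This trades reduction speed for confidence, which is exactly the tension with parallelism: each extra run multiplies the cost of every reduction attempt.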

It remains unclear to me how adding parallelism helps with these issues. In the ideal case, where the testcase stays consistently reproducible as it gets reduced, I do see a potential benefit, but Lithium would have to be coded specifically to avoid racing against itself. We also have to account for the different storage speeds of media: on a slow hard disk, or with OS issues (Windows?), a parallel reduction could even take longer than a serial one.

> Nondeterministic results would be acceptable for a lot of use cases, though, so I think there'd be value in supporting the feature.

Why do you think that this is the case? Do you have some examples?

> If we detect that we're retesting often enough that we're not gaining anything from the parallelism, we switch to serial mode.

What would the heuristic for this look like? Will we have to set an arbitrary number of reduction attempts that fail in parallel mode before switching to serial mode?

Ultimately, I'm not sure the added complexity is worth a potential boost in reduction speed in a small number of situations.

@sethfowler (Author)

Thanks for the detailed response! Let me start by addressing your questions:

> Why do you think that this is the case? Do you have some examples?

All I can really say is that this approach would work well for my use case, given my specific needs: getting a reduced test case quickly is very important, but it's not important that Lithium produce identical output every time it's run on the same test case.

To give you a little bit of context, I'm using Lithium for reducing test cases written in the P4 language. These are test cases for the P4 toolchain itself, and "interestingness" in this context generally means that the test case triggers an internal compiler error. P4 programs can be very large; compilation often takes on the order of ten minutes, and reducing a test case with Lithium can take days. That's why I'm interested in speeding up the process. =)

> What would the heuristic for this look like? Will we have to set an arbitrary number of reduction attempts that fail in parallel mode before switching to serial mode?

That's one option, though I think it would be better to calculate some rough measure of speedup and switch to serial mode if the speedup isn't sufficient to justify running in parallel.
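That rough speedup measure might look something like this (a hypothetical sketch, not a proposed Lithium API; the 1.5x threshold is an arbitrary assumption):

```python
def estimated_speedup(batch_wall_times, per_test_times):
    """Rough speedup estimate: what serial mode would have cost
    (the sum of individual test runtimes) divided by what the
    parallel batches actually cost in wall-clock time."""
    serial_cost = sum(per_test_times)
    parallel_cost = sum(batch_wall_times)
    return serial_cost / parallel_cost if parallel_cost else 1.0

def should_go_serial(batch_wall_times, per_test_times, threshold=1.5):
    # Assumed policy: keep parallel mode only while it is at least
    # `threshold` times faster than serial mode would have been.
    return estimated_speedup(batch_wall_times, per_test_times) < threshold
```

For example, if a batch of four 10-minute tests finishes in 10 minutes of wall-clock time, the estimated speedup is 4x and parallel mode is clearly paying for itself.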

> Ultimately, I'm not sure that the value of adding the complexity of this for a potential boost in reduction speed, for a small number of situations is worth it.

Mozilla faces particularly difficult challenges in terms of test case reduction due to the sheer size and complexity of Gecko. I'd argue (without evidence =) that many potential users of Lithium have situations similar to mine: reduction takes a very long time precisely because each individual test run takes so long. In these cases, reduction speed is very important; the goal is to improve developers' debugging productivity and make more efficient use of resources, and a tool that takes days to finish is unfortunately more or less in opposition to those goals.

Hopefully that explains why I see value in supporting parallelism in Lithium. If it doesn't feel like a good fit for where you want to take the project, though, that's OK. =) I opened this issue mostly to get an idea of how the folks working on Lithium would feel about these kinds of changes, and whether there'd be any interest in merging patches to implement them.

@sethfowler (Author)

FWIW, ideally what I'd like to do is get #13 or something similar implemented (or maybe #11 is already enough! I'll try it out soon) and then experiment with adding a naive approach to parallelism (a work queue of chunks, assume that chunks are more or less independent with respect to the effects of their removal on interestingness). I'll report back as to whether I see any success with that approach. More complicated heuristics should definitely wait until the basic approach is validated.
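That naive work-queue pass could be sketched like this (a hypothetical illustration, not Lithium code; it assumes chunk removals are independent and that the interestingness test is safe to run concurrently):

```python
from concurrent.futures import ThreadPoolExecutor

def naive_parallel_pass(chunks, interesting, workers=4):
    """One naive parallel pass: test each chunk's removal on its own,
    then commit every individually-safe removal at once."""
    def removable(chunk):
        kept = [c for c in chunks if c != chunk]
        return chunk if interesting(kept) else None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        removed = {r for r in pool.map(removable, chunks) if r is not None}
    # Commit all individually-safe removals together: this is exactly
    # the independence assumption discussed earlier in the thread.
    return [c for c in chunks if c not in removed]
```

Note the pitfall from earlier in the thread is visible here: if two chunks are each removable alone but not together, this pass still commits both, possibly leaving an uninteresting file.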

@nth10sd (Contributor) commented May 3, 2017

> To give you a little bit of context, I'm using Lithium for reducing test cases written in the P4 language.

Thanks for the context, Seth. It really helps to see how other folks use Lithium in their work, and I can definitely see how this would work for you.

> I opened this issue mostly to get an idea of how the folks working on Lithium would feel about these kinds of changes, and whether there'd be any interest in merging patches to implement them.

I am more than happy to add hooks in Lithium to help you achieve your goal of parallelism. One of the ways we deal with the needs of multiple projects on GitHub is to host them in different repositories; that way we can compartmentalise projects without any one becoming too giant a monolith. (We've made that mistake before and are trying to avoid it going forward.)

If you look at the funfuzz repository, you'll see it relies on the Lithium repository (among others) being checked out "next" to it. You could create a separate repository and check out Lithium alongside it. That way, your repository can host the parallelism code that calls into Lithium, while Lithium contains the serial code.

We have plans to make Lithium more embeddable, such as pushing it up to PyPI, but that is still some ways off. Note that @jschwartzentruber has a refactor of Lithium coming up in the short term that eradicates globals and moves things to a more class-oriented design.

In all, thanks for explaining your points in a detailed way. We are committed to having our code be useful for as many folks as possible (where reasonable and aligned to our mission), so we definitely value your ideas as to how we can achieve even better co-operation going forward. :)

@nth10sd (Contributor) commented May 5, 2017

> Note that @jschwartzentruber has a refactor of Lithium coming up

This is PR #14.

> such as pushing it up to PyPI among others

We don't know if we're actually doing this (or when, for that matter), but #16 should help in this regard.
