
Migrate most shrinker functions to the ir #3962

Open · wants to merge 23 commits into master
Conversation

tybug (Member) commented May 1, 2024:

part of #3921.

This is a big one, apologies! Hard to avoid; the remaining functions had cascading dependencies on each other.

I've benchmarked these changes by hooking minimal with derandomize=False and 5 trials (conftest.py, viz script + full results):

[benchmark plots: latest (d9a077d) vs. old (f7d0d29), old (d81d50e), old (901d636)]

Some pretty big dips, unfortunately. I think there are a few contributors here:

  • doubling up on Integer.shrink from the negative and positive direction (see review comment)
  • to know whether an ir tree A is better than B ahead of time, we have to convert both to a buffer and sort_key the buffers, which costs a call (see the sketch after this list). This will be improved when we move our ordering function to the ir tree.
  • I probably messed up migrating some of these shrink passes such that they are less effective now, but it's hard to tell amongst all the other noise (or signal, arguably). (edit: exhibit A 😅 901d636)
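
A rough sketch of the buffer-based comparison from the second bullet; ir_to_buffer here is a hypothetical stand-in for the real conversion, which is what costs the extra call:

    def sort_key(buffer: bytes) -> tuple:
        # Shortlex order: shorter buffers sort first, ties broken
        # lexicographically.
        return (len(buffer), buffer)

    def ir_tree_is_better(tree_a, tree_b, ir_to_buffer) -> bool:
        # Comparing two ir trees today means realizing both as buffers,
        # because the shrinker's ordering is defined on buffers.
        return sort_key(ir_to_buffer(tree_a)) < sort_key(ir_to_buffer(tree_b))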

Conversations of note:

tybug requested a review from Zac-HD as a code owner · May 1, 2024
@@ -2292,12 +2327,16 @@ def _pop_ir_tree_node(self, ir_type: IRTypeName, kwargs: IRKWargsType) -> IRNode

        # (in fact, it is possible that giving up early here results in more time
        # for useful shrinks to run).
        if node.ir_type != ir_type:
            # needed for try_shrinking_nodes to see what node was *attempted*
            # to be drawn.
            self.invalid_at = (ir_type, kwargs)
            self.mark_invalid(f"(internal) want a {ir_type} but have a {node.ir_type}")
tybug (Member Author) commented May 1, 2024:

Now that we've moved to a more strongly typed representation in the ir, we do pretty badly on a class of problems: crossing typed boundaries. In particular, shrinking one_of(A, B) where A has a different type/kwargs than B. Before, we would shrink the integer for one_of, causing it to choose A instead of B. A interprets the bits differently than B, but we would get lucky most of the time and find a counterexample, and once we were in A we could do more robust shrinking. In the ir we're much more strict and mark_invalid as soon as we have a misalignment, so the initial coincidental slip from B to A never happens.

One thing that works here is drawing randomly instead of giving up when we hit a misalignment, which I would say is basically equivalent to master behavior.

Concretely:

    # condition avoids trivial counterexamples
    assert minimal(st.integers(101, 200) | st.integers(0, 100), lambda n: n not in [101, 0]) == 102

fails on this branch and passes on master (I'll add this as a test case as well).
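
A minimal sketch of that "draw randomly on misalignment" idea; illustrative only — the ir_tree_nodes/index bookkeeping and the random_draw helper are hypothetical, not Hypothesis internals:

    def _pop_ir_tree_node(self, ir_type, kwargs):
        node = self.ir_tree_nodes[self.index]  # hypothetical bookkeeping
        self.index += 1
        if node.ir_type != ir_type:
            # Misaligned: instead of mark_invalid, draw a fresh random value
            # of the requested type, roughly mimicking how master would
            # reinterpret leftover bytes after slipping from B to A.
            return self.random_draw(ir_type, kwargs)  # hypothetical helper
        return node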

tybug (Member Author) replied:

idle thoughts: I wonder if a short "randomly generate ir trees smaller than our current value" phase when the shrinker would otherwise exit as stuck would get us out of many of these coincidental slips. Possibly not, since if we didn't have to exploit some fixed structure while shrinking, we likely would have found this counterexample during the generation phase.

A contributor commented:

I haven’t touched Hypothesis in a few years, so take this with a grain of salt, but I vaguely recall that having the shrinker introduce randomness into the data stream has tended to be a bad idea.

Normally, the shrinker tends to make its input both shorter and simpler over time, despite the fact that (999 random bytes) is technically “smaller” than (1000 zero bytes).

But if the shrinker starts reintroducing randomness, this falls apart, and the shrinker will get obsessed with some high-entropy input because it happens to be shorter than the current low-entropy input.

A project member replied:

Seconded on this. Specifically the problem that tends to result is that the performance becomes terrible, because you end up repeatedly reintroducing lots of expensive-to-shrink regions in order to make minor size gains.

My guess is that you could possibly fix this with some sort of low-entropy generation mode (e.g. always generate one of the two smallest values for a given type) but I'm not sure how robust that would be and it potentially still has the same problem.
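
As a very rough sketch, that low-entropy mode might look something like this; all names and the per-type choices are illustrative, not Hypothesis internals:

    import random

    # The two shrink-wise smallest values per ir type, so any region the
    # shrinker reintroduces stays cheap to shrink afterwards.
    LOW_ENTROPY_CHOICES = {
        "integer": (0, 1),
        "boolean": (False, True),
        "float": (0.0, 1.0),
        "string": ("", "0"),
        "bytes": (b"", b"\x00"),
    }

    def low_entropy_draw(ir_type, rng: random.Random):
        # Always pick one of the shrink-wise smallest values for the type.
        return rng.choice(LOW_ENTROPY_CHOICES[ir_type])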

tybug (Member Author) replied:

That intuition makes sense to me. I think performance may still be ok if we only introduce randomness as a last-resort phase in the shrinker when we are otherwise at a fixpoint. iirc the shrinker used to have some dynamic enabling logic similar to this. Of course we're now complicating the shrink pass logic and have to justify the tradeoff in complexity as worthwhile. Alternatively low-entropy generation also seems reasonable.

It just feels quite bad to not be able to shrink st.integers() | st.floats() because we happened to hit an error in the floats region first, and we are seeing a few regressions in the shrink quality test suite due to this (though how much that maps to reality is unclear to me).

Comment on lines 1345 to 1354

    # try shrinking from both sides towards shrink_towards
    shrink_towards = kwargs["shrink_towards"]
    Integer.shrink(
        abs(shrink_towards - value),
        lambda n: self.try_shrinking_nodes(nodes, shrink_towards + n),
    )
    Integer.shrink(
        abs(shrink_towards - value),
        lambda n: self.try_shrinking_nodes(nodes, shrink_towards - n),
    )
tybug (Member Author) commented May 1, 2024:

This is pretty brute force and does ~double the shrinks necessary in the worst case.

I think master behavior for integer shrinking is something like "shrink towards 0 on the current sign (only positive or only negative) and occasionally try and flip the sign bit". I think master gets a bit luckier/smarter than this in practice, because a straight-line translation of that here shrinks minimal(st.integers(), lambda n: not (-1 < n < 2)) to 2 instead of -1; to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1.

A complicating factor here is shrinking behavior is different for bounded and unbounded integers:

    assert minimal(st.integers(), lambda n: not (-1 < n < 2)) == -1
    assert minimal(st.integers(-5, 5), lambda n: not (-1 < n < 2)) == 2
    assert minimal(st.integers(max_value=5), lambda n: not (-1 < n < 2)) == -1
    assert minimal(st.integers(min_value=-5), lambda n: not (-1 < n < 2)) == -1

which we can't change until we move the ordering to the ir.
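
For contrast, a sketch of the master-like alternative described above: shrink the magnitude on the current side only, then make one attempt at crossing to the other side. Illustrative only; assume the consider callback returns True when the candidate value still reproduces the failure, as try_shrinking_nodes does above.

    from hypothesis.internal.conjecture.shrinking import Integer

    def shrink_integer_one_sided(value, shrink_towards, consider):
        sign = 1 if value >= shrink_towards else -1
        # Shrink the magnitude on the current side of shrink_towards only.
        Integer.shrink(
            abs(value - shrink_towards),
            lambda n: consider(shrink_towards + sign * n),
        )
        # A single cheap attempt at crossing shrink_towards, standing in
        # for master's occasional sign-bit flip.
        consider(2 * shrink_towards - value)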

A project member replied:

> to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1

I suspect it's getting lucky with some sort of punning here, as the integers strategy is a mixture under the hood. I don't think it can reliably do this strategy - there's no backtracking in the shrinker - but it might be doing something that causes previous bytes for a larger integer width to be reinterpreted as a negative integer.

Zac-HD (Member) commented May 3, 2024:

Can we get a secondary (log) y-axis on the plot showing proportional change, in addition to the absolute number? +800 calls is quite different on a base of 50 than on a base of 2,000!
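
One way to do that with matplotlib, for reference; the data arrays here are placeholders, not the actual benchmark numbers:

    import matplotlib.pyplot as plt
    import numpy as np

    baseline = np.array([50, 400, 2000])  # placeholder call counts (old)
    latest = np.array([850, 620, 2400])   # placeholder call counts (new)
    x = np.arange(len(baseline))

    fig, ax = plt.subplots()
    ax.bar(x, latest - baseline)
    ax.set_ylabel("absolute change in shrink calls")

    ax2 = ax.twinx()  # secondary y-axis sharing the same x-axis
    ax2.plot(x, latest / baseline, "ro")
    ax2.set_yscale("log")
    ax2.set_ylabel("proportional change (log scale)")
    plt.show()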

Comment on lines +79 to +80

    # floats close to math.inf can overflow in this intermediate step.
    # probably something we should fix?
A contributor commented:

This part (like the fraction shrink below) only matters when the fractional part is nonzero, i.e. for floats less than 2^53. So the test for inf is in practice fine, but it may be clearer/better to make the loop conditional, or move the integer delegation inside the loop so that it triggers before it has a chance to overflow.
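
A sketch of the suggested restructure; the shape of the pass is assumed here, not the actual Float shrinker:

    def try_fractional_shrinks(f, consider):
        # The fraction-based shrinks can only help when f has a nonzero
        # fractional part, which requires 0 < f < 2**53 -- so making the
        # whole block conditional also removes any chance of the
        # intermediate scaling below overflowing toward math.inf.
        if not (0 < f < 2**53) or f == int(f):
            return
        # Scale f up by powers of two until it is integral; every finite
        # float is a dyadic rational, so this terminates below 2**53.
        scale = 1
        while (f * scale) != int(f * scale):
            scale *= 2
        # Try a candidate with a smaller numerator over the same power of
        # two -- one plausible notion of a "simpler" fraction.
        consider(int(f * scale) // 2 / scale)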

tybug (Member Author) commented May 3, 2024:

I've updated the graph to include an n✕ change axis. I'm looking into the worst discrepancies there next.

Zac-HD (Member) commented May 3, 2024:

> I've updated the graph to include an n✕ change axis. I'm looking into the worst discrepancies there next.

Fwiw this looks fine to me - the large-negative multipliers are relative to a small baseline, and the tails with large absolute changes aren't that large in relative terms. Might as well look into it, but don't stress too much + consider doing so as a follow-up later.

tybug (Member Author) commented May 4, 2024:

agree it doesn't look hugely out of line, but I won't feel confident in this pull until I look into it, for my own sake more than anything 😄

I updated the benchmark for 50b2b13 + d81d50e. It's a bit misleading because we're basically getting a free constant shrink boost by disabling the dfa, so I would mentally downweight the improvements a bit. That said, it's looking quite a bit better. More investigation of where we do badly to come.

Zac-HD (Member) commented May 4, 2024:

> I updated the benchmark for 50b2b13 + d81d50e. It's a bit misleading because we're basically getting a free constant shrink boost by disabling the dfa, so I would mentally downweight the improvements a bit. That said, it's looking quite a bit better. More investigation of where we do badly to come.

Nice! You could get a buffer-without-dfas baseline if you wanted to, but I don't think this is really necessary - you're already measuring the change as we're going to ship it, and it's looking good. I guess in principle the downside might be some loss of shrink quality, but I don't expect anyone to notice that.

tybug (Member Author) commented May 22, 2024:

ok, I think that's all the performance we're going to get out of the shrinker for now. I'm really happy with what we were able to squeeze out (updated graphs in description). b69cf17 turned out to be critical for good performance.

A few coverage failures remain, but the implementation should be settled here. The majority of the delay was in fixing some subtle bugs (aeccbc5, 172dbd9).
