
Migrate most shrinker functions to the ir #3962

Open · wants to merge 23 commits into master
Conversation

tybug (Member) commented May 1, 2024:

part of #3921.

This is a big one, apologies! Hard to avoid; the remaining functions had cascading dependencies on each other.

I've benchmarked these changes by hooking minimal with derandomize=False and 5 trials (conftest.py, viz script + full results):

[benchmark plots: latest (d9a077d) vs. old (f7d0d29), old (d81d50e), old (901d636)]

Some pretty big dips, unfortunately. I think there are a few contributors here:

  • doubling up on Integer.shrink from the negative and positive direction (see review comment)
  • to know whether an ir tree A is better than B ahead of time, we have to convert both to a buffer and sort_key the buffers, which costs a call (see the sketch after this list). This will be improved when we move our ordering function to the ir tree.
  • I probably messed up migrating some of these shrink passes such that they are less effective now, but it's hard to tell amongst all the other noise (or signal, arguably). (edit: exhibit A 😅 901d636)
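
A rough sketch of the buffer-based comparison from the second bullet; ir_to_buffer here is a hypothetical stand-in for the real conversion, which is what costs the extra call:

    def sort_key(buffer: bytes) -> tuple:
        # Shortlex order: shorter buffers sort first, ties broken
        # lexicographically.
        return (len(buffer), buffer)

    def ir_tree_is_better(tree_a, tree_b, ir_to_buffer) -> bool:
        # Comparing two ir trees today means realizing both as buffers,
        # because the shrinker's ordering is defined on buffers.
        return sort_key(ir_to_buffer(tree_a)) < sort_key(ir_to_buffer(tree_b))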

Conversations of note:

tybug requested a review from Zac-HD as a code owner · May 1, 2024
@@ -2292,12 +2327,16 @@ def _pop_ir_tree_node(self, ir_type: IRTypeName, kwargs: IRKWargsType) -> IRNode

        # (in fact, it is possible that giving up early here results in more time
        # for useful shrinks to run).
        if node.ir_type != ir_type:
            # needed for try_shrinking_nodes to see what node was *attempted*
            # to be drawn.
            self.invalid_at = (ir_type, kwargs)
            self.mark_invalid(f"(internal) want a {ir_type} but have a {node.ir_type}")
tybug (Member Author) commented May 1, 2024:

Now that we've moved to a more strongly typed representation in the ir, we do pretty badly on a class of problems: crossing typed boundaries. In particular, shrinking one_of(A, B) where A has a different type/kwargs than B. Before, we would shrink the integer for one_of, causing it to choose A instead of B. A interprets the bits differently than B, but we would get lucky most of the time and find a counterexample, and once we were in A we could do more robust shrinking. In the ir we're much more strict and mark_invalid as soon as we have a misalignment, so the initial coincidental slip from B to A never happens.

One thing that works here is drawing randomly instead of giving up when we hit a misalignment, which I would say is basically equivalent to master behavior.

Concretely:

    # condition avoids trivial counterexamples
    assert minimal(st.integers(101, 200) | st.integers(0, 100), lambda n: n not in [101, 0]) == 102

fails on this branch and passes on master (I'll add this as a test case as well).
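
A minimal sketch of that "draw randomly on misalignment" idea; illustrative only — the ir_tree_nodes/index bookkeeping and the random_draw helper are hypothetical, not Hypothesis internals:

    def _pop_ir_tree_node(self, ir_type, kwargs):
        node = self.ir_tree_nodes[self.index]  # hypothetical bookkeeping
        self.index += 1
        if node.ir_type != ir_type:
            # Misaligned: instead of mark_invalid, draw a fresh random value
            # of the requested type, roughly mimicking how master would
            # reinterpret leftover bytes after slipping from B to A.
            return self.random_draw(ir_type, kwargs)  # hypothetical helper
        return node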

tybug (Member Author) replied:

idle thoughts: I wonder if a short "randomly generate ir trees smaller than our current value" phase when the shrinker would otherwise exit as stuck would get us out of many of these coincidental slips. Possibly not, since if we didn't have to exploit some fixed structure while shrinking, we likely would have found this counterexample during the generation phase.

A contributor commented:

I haven’t touched Hypothesis in a few years, so take this with a grain of salt, but I vaguely recall that having the shrinker introduce randomness into the data stream has tended to be a bad idea.

Normally, the shrinker tends to make its input both shorter and simpler over time, despite the fact that (999 random bytes) is technically “smaller” than (1000 zero bytes).

But if the shrinker starts reintroducing randomness, this falls apart, and the shrinker will get obsessed with some high-entropy input because it happens to be shorter than the current low-entropy input.

A project member replied:

Seconded on this. Specifically the problem that tends to result is that the performance becomes terrible, because you end up repeatedly reintroducing lots of expensive-to-shrink regions in order to make minor size gains.

My guess is that you could possibly fix this with some sort of low-entropy generation mode (e.g. always generate one of the two smallest values for a given type) but I'm not sure how robust that would be and it potentially still has the same problem.
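
As a very rough sketch, that low-entropy mode might look something like this; all names and the per-type choices are illustrative, not Hypothesis internals:

    import random

    # The two shrink-wise smallest values per ir type, so any region the
    # shrinker reintroduces stays cheap to shrink afterwards.
    LOW_ENTROPY_CHOICES = {
        "integer": (0, 1),
        "boolean": (False, True),
        "float": (0.0, 1.0),
        "string": ("", "0"),
        "bytes": (b"", b"\x00"),
    }

    def low_entropy_draw(ir_type, rng: random.Random):
        # Always pick one of the shrink-wise smallest values for the type.
        return rng.choice(LOW_ENTROPY_CHOICES[ir_type])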

tybug (Member Author) replied:

That intuition makes sense to me. I think performance may still be ok if we only introduce randomness as a last-resort phase in the shrinker when we are otherwise at a fixpoint. iirc the shrinker used to have some dynamic enabling logic similar to this. Of course we're now complicating the shrink pass logic and have to justify the tradeoff in complexity as worthwhile. Alternatively low-entropy generation also seems reasonable.

It just feels quite bad to not be able to shrink st.integers() | st.floats() because we happened to hit an error in the floats region first, and we are seeing a few regressions in the shrink quality test suite due to this (though how much that maps to reality is unclear to me).

Comment on lines 1345 to 1354

    # try shrinking from both sides towards shrink_towards
    shrink_towards = kwargs["shrink_towards"]
    Integer.shrink(
        abs(shrink_towards - value),
        lambda n: self.try_shrinking_nodes(nodes, shrink_towards + n),
    )
    Integer.shrink(
        abs(shrink_towards - value),
        lambda n: self.try_shrinking_nodes(nodes, shrink_towards - n),
    )
tybug (Member Author) commented May 1, 2024:

This is pretty brute force and does ~double the shrinks necessary in the worst case.

I think master behavior for integer shrinking is something like "shrink towards 0 on the current sign (only positive or only negative) and occasionally try and flip the sign bit". I think master gets a bit luckier/smarter than this in practice, because a straight-line translation of that here shrinks minimal(st.integers(), lambda n: not (-1 < n < 2)) to 2 instead of -1; to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1.

A complicating factor here is shrinking behavior is different for bounded and unbounded integers:

    assert minimal(st.integers(), lambda n: not (-1 < n < 2)) == -1
    assert minimal(st.integers(-5, 5), lambda n: not (-1 < n < 2)) == 2
    assert minimal(st.integers(max_value=5), lambda n: not (-1 < n < 2)) == -1
    assert minimal(st.integers(min_value=-5), lambda n: not (-1 < n < 2)) == -1

which we can't change until we move the ordering to the ir.
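
For contrast, a sketch of the master-like alternative described above: shrink the magnitude on the current side only, then make one attempt at crossing to the other side. Illustrative only; assume the consider callback returns True when the candidate value still reproduces the failure, as try_shrinking_nodes does above.

    from hypothesis.internal.conjecture.shrinking import Integer

    def shrink_integer_one_sided(value, shrink_towards, consider):
        sign = 1 if value >= shrink_towards else -1
        # Shrink the magnitude on the current side of shrink_towards only.
        Integer.shrink(
            abs(value - shrink_towards),
            lambda n: consider(shrink_towards + sign * n),
        )
        # A single cheap attempt at crossing shrink_towards, standing in
        # for master's occasional sign-bit flip.
        consider(2 * shrink_towards - value)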

A project member replied:

> to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1

I suspect it's getting lucky with some sort of punning here, as the integers strategy is a mixture under the hood. I don't think it can reliably do this strategy - there's no backtracking in the shrinker - but it might be doing something that causes previous bytes for a larger integer width to be reinterpreted as a negative integer.

Zac-HD (Member) commented May 3, 2024:

Can we get a secondary (log) y-axis on the plot showing proportional change, in addition to the absolute number? +800 calls is quite different on a base of 50 than on a base of 2,000!
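
One way to do that with matplotlib, for reference; the data arrays here are placeholders, not the actual benchmark numbers:

    import matplotlib.pyplot as plt
    import numpy as np

    baseline = np.array([50, 400, 2000])  # placeholder call counts (old)
    latest = np.array([850, 620, 2400])   # placeholder call counts (new)
    x = np.arange(len(baseline))

    fig, ax = plt.subplots()
    ax.bar(x, latest - baseline)
    ax.set_ylabel("absolute change in shrink calls")

    ax2 = ax.twinx()  # secondary y-axis sharing the same x-axis
    ax2.plot(x, latest / baseline, "ro")
    ax2.set_yscale("log")
    ax2.set_ylabel("proportional change (log scale)")
    plt.show()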

Comment on lines +79 to +80

    # floats close to math.inf can overflow in this intermediate step.
    # probably something we should fix?
A contributor commented:

This part (like the fraction shrink below) only matters when the fractional part is nonzero, i.e. for floats less than 2^53. So the test for inf is in practice fine, but it may be clearer/better to make the loop conditional, or move the integer delegation inside the loop so that it triggers before it has a chance to overflow.
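
A sketch of the suggested restructure; the shape of the pass is assumed here, not the actual Float shrinker:

    def try_fractional_shrinks(f, consider):
        # The fraction-based shrinks can only help when f has a nonzero
        # fractional part, which requires 0 < f < 2**53 -- so making the
        # whole block conditional also removes any chance of the
        # intermediate scaling below overflowing toward math.inf.
        if not (0 < f < 2**53) or f == int(f):
            return
        # Scale f up by powers of two until it is integral; every finite
        # float is a dyadic rational, so this terminates below 2**53.
        scale = 1
        while (f * scale) != int(f * scale):
            scale *= 2
        # Try a candidate with a smaller numerator over the same power of
        # two -- one plausible notion of a "simpler" fraction.
        consider(int(f * scale) // 2 / scale)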

tybug (Member Author) commented May 3, 2024:

I've updated the graph to include an n✕ change axis. I'm looking into the worst discrepancies there next.

Zac-HD (Member) commented May 3, 2024:

> I've updated the graph to include an n✕ change axis. I'm looking into the worst discrepancies there next.

Fwiw this looks fine to me - the large-negative multipliers are relative to a small baseline, and the tails with large absolute changes aren't that large in relative terms. Might as well look into it, but don't stress too much + consider doing so as a follow-up later.

tybug (Member Author) commented May 4, 2024:

agree it doesn't look hugely out of line, but I won't feel confident in this pull until I look into it, for my own sake more than anything 😄

I updated the benchmark for 50b2b13 + d81d50e. It's a bit misleading because we're basically getting a free constant shrink boost by disabling the dfa, so I would mentally downweight the improvements a bit. That said, it's looking quite a bit better. More investigation of where we do badly to come.

Zac-HD (Member) commented May 4, 2024:

> I updated the benchmark for 50b2b13 + d81d50e. It's a bit misleading because we're basically getting a free constant shrink boost by disabling the dfa, so I would mentally downweight the improvements a bit. That said, it's looking quite a bit better. More investigation of where we do badly to come.

Nice! You could get a buffer-without-dfas baseline if you wanted to, but I don't think this is really necessary - you're already measuring the change as we're going to ship it, and it's looking good. I guess in principle the downside might be some loss of shrink quality, but I don't expect anyone to notice that.

tybug (Member Author) commented May 22, 2024:

ok, I think that's all the performance we're going to get out of the shrinker for now. I'm really happy with what we were able to squeeze out (updated graphs in description). b69cf17 turned out to be critical for good performance.

A few coverage failures remain, but the implementation should be settled here. The majority of the delay was in fixing some subtle bugs (aeccbc5, 172dbd9).
