Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type annotations to hypothesis/internal/conjecture/engine.py #3961

Merged

Conversation

qthequartermasterman
Copy link
Contributor

@qthequartermasterman qthequartermasterman commented May 1, 2024

In an attempt to help with #3074, I have added type annotations to hypothesis/internal/conjecture/engine.py. I have checked these type hints with both mypy and pyright, and verified a few of the trickier-to-statically-determine types by checking their runtime types in the tests.

There are three things that I'm still confused about.

  1. Can Overrun be a valid value for self.interesting_examples?

In all places but this one sample (lines 389 in this PR), all of the keys for self.interesting_examples are ConjectureResult. However, this assignment may possibly be Overrun. Indeed, in one test, test_backend_can_shrink_bytes, this does occur. This seems (to me) to be the only place in the tests where this Overrun assignment occurs. If Overrun is a valid value, then this violates the typing in several other places where self.interesting_examples is accessed, but nothing seemed to ever trip up the asserts I had placed in those places (and since removed). If Overrun is indeed valid, then I will need to update the hints for self.interesting_examples, and I'd like to better understand the guarantees that ensure Overrun is never a value in other places (so the type narrowing asserts are legal). Many, but not all, of those references are in shrink_interesting_examples for example.

            if changed:
                self.save_buffer(data.buffer)
                self.interesting_examples[key] = data.as_result()  # type: ignore
                self.__data_cache.pin(data.buffer)
                self.shrunk_examples.discard(key)
  1. Potentially uninitialized variable trial_data because of a caught PreviouslyUnseenBehavior exception

In line 842, trial_data is defined inside of a try-except block. In the happy path, this variable is always defined. On the otherhand, unless I am misunderstanding something, if a PreviouslyUnseenBehavior is thrown, that variable will not be defined, meaning trial_data.observer.killed and a few other later instances will reference an uninitialized variable. This never happens in the tests, and I've certainly never experienced it while using hypothesis.

Am I misunderstanding the flow of the code here? Is this try-except superfluous? Or is there a genuine error in how the exception is handled?

                try:
                    trial_data = self.new_conjecture_data(
                        prefix=prefix, max_length=max_length
                    )
                    self.tree.simulate_test_function(trial_data)
                    continue
                except PreviouslyUnseenBehaviour:
                    pass

                # If the simulation entered part of the tree that has been killed,
                # we don't want to run this.
                assert isinstance(trial_data.observer, TreeRecordingObserver)
                if trial_data.observer.killed:
                    continue

                ...
  1. What is the proper type for stack in debug_data?

In the debug_data method there is a stack=[[]] on line 587. The expected type for stack is unclear to me. My initial impression based on stack[-1].append(int_from_bytes(data.buffer[ex.start : ex.end])) is that stack should be List[List[int]]. But these conditionals, where node:List[int] give conflicting types.

                if len(node) == 1:
                    stack[-1].extend(node)
                else:
                    stack[-1].append(node)

In the case of length-1 node, the final element of stack (which I thought is a List[int]) gets extended by one more integer. This makes sense to me. The else case, however, would mean that the last element of stack would have a list of ints appended as its final element. I.e. stack[-1] == [1,2,3,..., [1,2,3,4]] This means that stack would have to be List[List[int | List[int]]]. But this type then starts tripping on variance issues.

To further confuse matters, my debugger never seemed to stop on that strange else clause, so I never could actually find the proper runtime type.

What is the correct type here? I'm clearly not understanding what the code is doing to this data structure.

    def debug_data(self, data: ConjectureData | ConjectureResult) -> None:
        if not self.report_debug_info:
            return

        stack: list = [[]]

        def go(ex: Example) -> None:
            if ex.length == 0:
                return
            if len(ex.children) == 0:
                stack[-1].append(int_from_bytes(data.buffer[ex.start : ex.end]))
            else:
                node: list[int] = []
                stack.append(node)

                for v in ex.children:
                    go(v)
                stack.pop()
                if len(node) == 1:
                    stack[-1].extend(node)
                else:
                    stack[-1].append(node)

        go(data.examples[0])
        assert len(stack) == 1

On top of those three confusions I have, mypy also complains about some string literals as keys to a TypedDict (line 247), because it does not consider the concatenation of two literals to be a literal. See the code snippet below. Personally, I feel like this is an error in mypy--pyright does not have this complaint. If we would like mypy to be fully happy, I can either add a # type: ignore, or refactor the code to only use true literals. If we don't care that mypy complains, then no change is needed. (I see now that mypy is enforced in CI. Happy to do whatever the maintainers recommend to satisfy mypy.)

    def _log_phase_statistics(self, phase: Literal["reuse", "generate", "shrink"]) -> Generator[None, None, None]:
        ...
        finally:
            self.statistics[phase + "-phase"] = {  # The key here is the concatenation of two string literals.
                "duration-seconds": time.perf_counter() - start_time,
                "test-cases": list(self.stats_per_test_case),
                "distinct-failures": len(self.interesting_examples),
                "shrinks-successful": self.shrinks,
            }

Doubtless, I've missed or misunderstood some pieces. I am happy to revise and iterate!

@tybug
Copy link
Member

tybug commented May 1, 2024

d5b7136

ah, this one is my fault...I created the release file in the wrong location in #3957, and I think because we were in an interim period we had a correct file on master and CI didn't catch it. We should delete that file since that change has already been rolled into 6.100.2.

@Zac-HD
Copy link
Member

Zac-HD commented May 1, 2024

Wow! Thanks so much for contributing, this looks great 😀

  1. Can Overrun be a valid value for self.interesting_examples?

No - these are specifically ConjectureResult objects with .status=Status.INTERESTING, whereas Overrun is a special-case to represent =Status.OVERRUN with lower memory usage.

  1. Potentially uninitialized variable trial_data because of a caught PreviouslyUnseenBehavior exception

Ah, PreviouslyUnseenBehavior can only be raised by simulate_test_function(), so this is safe in practice. However it's also perfectly safe to move the trial_data = self.new_conjecture_data(...) assignment outside the try: block, so let's do that for clarity.

  1. What is the proper type for stack in debug_data?

Looks like Ls: TypeAlias = list["Ls | int"], in that it can be arbitrarily nested. I haven't dug into this but the test might be helpful. We should also have complete branch coverage from the tests/conjecture/ tests, so adding a raise Exception to that line and then running the tests should find some code that exercises it if that'd be helpful.

On top of those three confusions I have, mypy also complains about some string literals as keys to a TypedDict (line 247), because it does not consider the concatenation of two literals to be a literal. See the code snippet below. Personally, I feel like this is an error in mypy--pyright does not have this complaint. If we would like mypy to be fully happy, I can either add a # type: ignore, or refactor the code to only use true literals.

Either of these sound like a good solution to me.

@qthequartermasterman
Copy link
Contributor Author

qthequartermasterman commented May 3, 2024

Sorry for the delay in addressing comments. I've had a busy couple of days.


  1. Can Overrun be a valid value for self.interesting_examples?

No - these are specifically ConjectureResult objects with .status=Status.INTERESTING [...]

Thanks for this clarification. I've added a type: ignore on line 418, because putting a narrowing assert causes one of the unit tests to fail, for some reason. Evidently in test_backend_can_shrink_bytes, Overrun ends up being the result.

  1. Potentially uninitialized variable trial_data because of a caught PreviouslyUnseenBehavior exception

[...] it's also perfectly safe to move the trial_data = self.new_conjecture_data(...) assignment outside the try: block, so let's do that for clarity.

Thanks for this clarification. I've gone ahead and moved the trial_data out of the try block.

  1. What is the proper type for stack in debug_data?

Looks like Ls: TypeAlias = list["Ls | int"], in that it can be arbitrarily nested. [...]

Looks like stack: List[Ls] and node: Ls is the right combination to make the type checker happy, and it seems to match the runtime conditions. Great call out.

On top of those three confusions I have, mypy also complains about some string literals as keys to a TypedDict (line 247), because it does not consider the concatenation of two literals to be a literal. See the code snippet below. Personally, I feel like this is an error in mypy--pyright does not have this complaint. If we would like mypy to be fully happy, I can either add a # type: ignore, or refactor the code to only use true literals.

Either of these sound like a good solution to me.

I went with a type: ignore. Putting a bunch of if-else statements just to construct string literals felt a little too convoluted.


I believe I've addressed all comments at this point. Both mypy and pyright are happy. Happy to make any additional changes, if desired.

@qthequartermasterman
Copy link
Contributor Author

@Zac-HD I don't think I understand what is failing in CI at the moment. Do you have any hints on what needs to be fixed?

Comment on lines 217 to 219
self.interesting_examples: Dict[
Optional[InterestingOrigin], ConjectureResult
] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is Dict[InterestingOrigin, ConjectureResult]. data.interesting_origin is optional, but we only add to this dict when it's nonnull.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing this as Dict[InterestingOrigin, ConjectureResult], and then adding assert key is not None before writing to this dictionary causes these tests to fail at that assert:

FAILED hypothesis-python/tests/conjecture/test_alt_backend.py::test_backend_can_shrink_bytes - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_stores_the_tree_flat_until_needed - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_split_in_the_middle - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_stores_forced_nodes - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_correctly_relocates_forced_nodes - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_is_not_flaky_on_positive_zero_and_negative_zero - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_low_probabilities_are_still_explored - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_can_generate_hard_values - AssertionError
FAILED hypothesis-python/tests/conjecture/test_data_tree.py::test_can_generate_hard_floats - AssertionError
FAILED hypothesis-python/tests/conjecture/test_engine.py::test_can_index_results - AssertionError
FAILED hypothesis-python/tests/conjecture/test_engine.py::test_non_cloneable_intervals - AssertionError
FAILED hypothesis-python/tests/conjecture/test_engine.py::test_deletable_draws - AssertionError

That's why I had typed it as Optional[InterestingOrigin]. If we're certain the type should be Dict[InterestingOrigin, ConjectureResult], then a type-narrowing assert will cause the tests to fail. We could get around this with a type: ignore on the lines that write to the dictionary, but I'm curious why so many tests fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, we've been abusing mark_interesting in tests by not specifying an interesting origin. It's undeniably ergonomic but also means the type is more lenient than it needs to be in practice. I think we can leave a note here saying the type is InterestingOrigin everywhere but tests, where it is Optional[InterestingOrigin] for convenience, and leave cleaning this up for later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me! I left a comment # At runtime, the keys are only ever type InterestingOrigin, but can be None during tests. above both this hint and self.shrunk_examples that you mentioned below.

Should an issue be opened up to track making that change to the tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it's worth it - we can define a simple InterestingOrigin and paste it into all the tests, and then tightening up these annotations will help us work on the engine 🙂

@Zac-HD
Copy link
Member

Zac-HD commented May 3, 2024

@Zac-HD I don't think I understand what is failing in CI at the moment. Do you have any hints on what needs to be fixed?

I think some change in this PR probably caused this test to start failing, although I don't know why:

@pytest.mark.skipif(sys.version_info[:2] == (3, 9), reason="stack depth varies???")
def test_warns_on_strategy_annotation():
with pytest.warns(HypothesisWarning, match="Return-type annotation") as w:
@st.composite
def my_integers(draw: st.DrawFn) -> st.SearchStrategy[int]:
return draw(st.integers())
assert len(w.list) == 1
assert w.list[0].filename == __file__ # check stacklevel points to user code

#3968 is also failing in other PRs, so I'm confident that's not anything in this PR.

@Zac-HD
Copy link
Member

Zac-HD commented May 3, 2024

Sorry for the delay in addressing comments. I've had a busy couple of days.

No worries at all, you're not the only one! But more importantly, volunteer projects like Hypothesis are a labor of love and it's really important to me that they don't start to feel like an obligation. If you want to work on it, we're delighted to have you helping out! But if that changes, or if you'd just like a maintainer to finish polishing one PR so you can start on another, let us know and move on with our blessings 🙂

@Zac-HD Zac-HD added the internals Stuff that only Hypothesis devs should ever see label May 3, 2024
Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks great to me - thanks @qthequartermasterman!

The remaining CI failures are all from test_warns_on_strategy_annotation() on Python 3.10 or 3.11 - and we already have a skipif-py39 on that due to the best-guess stack depth here. Probably we just need to add an additional case for the affected Python versions... great if you want to look into that, or I can try fixing it over the weekend.

Comment on lines 217 to 219
self.interesting_examples: Dict[
Optional[InterestingOrigin], ConjectureResult
] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it's worth it - we can define a simple InterestingOrigin and paste it into all the tests, and then tightening up these annotations will help us work on the engine 🙂

@Zac-HD Zac-HD enabled auto-merge May 13, 2024 01:29
@Zac-HD Zac-HD merged commit affb5b8 into HypothesisWorks:master May 13, 2024
52 of 53 checks passed
@qthequartermasterman
Copy link
Contributor Author

Thanks for finishing this out @Zac-HD !

@Zac-HD
Copy link
Member

Zac-HD commented May 13, 2024

Happy to help! It was a remarkably annoying merge conflict 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internals Stuff that only Hypothesis devs should ever see
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants