Adding support for long lists #3967

Closed
TobiasEnergyMachines opened this issue May 3, 2024 · 1 comment
Labels
question not sure it's a bug? questions welcome

Comments

TobiasEnergyMachines commented May 3, 2024

I'm using hypothesis version 6.100.2 (with hypothesis-jsonschema) to test some simulation code that takes pydantic models as inputs. While doing so, I ran into a limit on the length of lists that hypothesis allows.

The use case is a timeseries of hourly data for one year: a list of floats with length 8760.

The error is raised by the ListStrategy class in "hypothesis\strategies\_internal\collections.py", on line 152:

if min_size > BUFFER_SIZE:
    raise InvalidArgument(
        f"{self!r} can never generate an example, because min_size is larger "
        "than Hypothesis supports.  Including it is at best slowing down your "
        "tests for no benefit; at worst making them fail (maybe flakily) with "
        "a HealthCheck error."
    )

where BUFFER_SIZE is hardcoded as BUFFER_SIZE = 8 * 1024, i.e. 8192, which is smaller than the 8760 elements I need.

Any ideas on how to work with long timeseries data in hypothesis?

Zac-HD (Member) commented May 4, 2024

If the smallest possible input to your test function is an 8,760-element list, failures are likely to be hard to debug! My top suggestions are to

  1. Accept much smaller inputs to your test - you typically get some performance improvements, and in some circumstances this also makes it easier to find bugs.
    (Researchers sometimes refer to the "small scope hypothesis", which says that most bugs can be reproduced by a very small input. As usual, the truth is a bit more complicated; small and large inputs both have situational benefits.)
  2. Use "sparse inputs": pick a default value or linear trend for the timeseries, and then replace that value at chosen indices to complicate the generated data. This is basically how our Numpy and Pandas support works, but it's easy to implement yourself using the st.dictionaries() strategy with valid indices as keys.
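
A minimal sketch of the "sparse inputs" idea from item 2, assuming a constant default of 0.0 and at most 50 overridden indices. The names `expand_sparse`, `hourly_series`, and `max_overrides` are hypothetical and not part of Hypothesis; only `st.dictionaries()`, `st.integers()`, `st.floats()`, and `st.builds()` are library calls.

```python
HOURS_PER_YEAR = 8760  # length of the hourly timeseries from the question


def expand_sparse(overrides, length=HOURS_PER_YEAR, default=0.0):
    """Build a dense list of `length` floats from a sparse {index: value} dict."""
    series = [default] * length
    for index, value in overrides.items():
        series[index] = value
    return series


def hourly_series(max_overrides=50):
    """Hypothetical strategy: a full-length series with a few generated values."""
    # Imported here so expand_sparse stays usable without Hypothesis installed.
    from hypothesis import strategies as st

    overrides = st.dictionaries(
        keys=st.integers(min_value=0, max_value=HOURS_PER_YEAR - 1),
        values=st.floats(allow_nan=False, allow_infinity=False),
        max_size=max_overrides,
    )
    # Each generated dict is passed to expand_sparse to produce the dense list.
    return st.builds(expand_sparse, overrides)
```

A test could then be decorated with `@given(hourly_series())`. Hypothesis only has to generate the bounded dictionary, so the amount of generated data stays small even though the resulting list has 8760 elements.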

We have that limit in the first place for performance reasons; without some cap it'd be easy to waste a lot of time and run out of memory in various parts of our internals.

Zac-HD added the question label on May 4, 2024