Adding support for long lists #3967

TobiasEnergyMachines · 2024-05-03T08:08:29Z

I'm using Hypothesis 6.100.2 (with hypothesis-jsonschema) to test some simulation code that takes pydantic models as inputs, and I have run into a limit on the length of list that Hypothesis allows.

The use case is a timeseries of hourly data for one year: a list of floats with length=8760.

The error is raised by the class ListStrategy in "hypothesis\strategies\_internal\collections.py" on line 152:

if min_size > BUFFER_SIZE:
    raise InvalidArgument(
        f"{self!r} can never generate an example, because min_size is larger "
        "than Hypothesis supports.  Including it is at best slowing down your "
        "tests for no benefit; at worst making them fail (maybe flakily) with "
        "a HealthCheck error."
    )

where BUFFER_SIZE is hardcoded as BUFFER_SIZE = 8 * 1024.
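As a quick check of the arithmetic behind the failure, here is a minimal sketch using only the BUFFER_SIZE value quoted above; HOURS_PER_YEAR is just an illustrative name for the list length in this use case:

```python
BUFFER_SIZE = 8 * 1024      # value quoted above from Hypothesis 6.100.2
HOURS_PER_YEAR = 24 * 365   # hourly samples in a non-leap year

print(BUFFER_SIZE)                    # 8192
print(HOURS_PER_YEAR)                 # 8760
print(HOURS_PER_YEAR > BUFFER_SIZE)   # True, so a min_size of 8760 trips the check
```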

Any ideas on how to work with long timeseries data in Hypothesis?


Zac-HD · 2024-05-04T07:14:52Z

If the smallest possible input to your test function is an 8,760-element list, it seems like this might make it hard to debug! My top suggestions are to:

Accept much smaller inputs to your test - you typically get some performance improvements, and in some circumstances this also makes it easier to find bugs.
(Researchers sometimes refer to the "small scope hypothesis", which says that most bugs can be reproduced by a very small input. As usual, the truth is a bit more complicated; small and large inputs both have situational benefits.)
Use "sparse inputs": pick a default value or linear trend for the timeseries, and then replace that value at chosen indices to complicate the generated data. This is basically how our Numpy and Pandas support works, but it's easy to implement yourself using the st.dictionaries() strategy with valid indices as keys.
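A minimal sketch of the sparse-inputs idea, not a definitive implementation: the names N_HOURS, DEFAULT_VALUE, expand_sparse and sparse_timeseries are made up for illustration, and the default of 0.0 and the cap of 50 overrides are arbitrary assumptions you would tune for your own model.

```python
from hypothesis import given, strategies as st

N_HOURS = 8760        # hourly samples in a non-leap year
DEFAULT_VALUE = 0.0   # assumed baseline value; choose one that suits your model


def expand_sparse(overrides, length=N_HOURS, default=DEFAULT_VALUE):
    """Build a dense list from a {index: value} mapping of overrides."""
    series = [default] * length
    for index, value in overrides.items():
        series[index] = value
    return series


# Hypothesis only draws a small dictionary of overrides; the full-length
# list is built afterwards in plain Python by expand_sparse.
sparse_timeseries = st.dictionaries(
    keys=st.integers(min_value=0, max_value=N_HOURS - 1),
    values=st.floats(allow_nan=False, allow_infinity=False),
    max_size=50,  # arbitrary cap on how many hours differ from the default
).map(expand_sparse)


@given(sparse_timeseries)
def test_timeseries_has_one_value_per_hour(series):
    assert len(series) == N_HOURS
```

Because the empty dictionary is the simplest input, failing examples should shrink towards a series that is all default values with only the few overrides needed to trigger the bug.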

We have that limit in the first place for performance reasons; without some cap it'd be easy to waste a lot of time and run out of memory in various parts of our internals.

Zac-HD added the "question" label (not sure it's a bug? questions welcome) on May 4, 2024

TobiasEnergyMachines closed this as completed May 16, 2024
