
oaieval hangs a lot #1292

Open
shamas- opened this issue Jul 5, 2023 · 1 comment
Labels
bug Something isn't working

Comments

shamas- commented Jul 5, 2023

Describe the bug

oaieval frequently hangs near the end of a run, just before reporting results.

To Reproduce

✗ EVALS_THREADS=12 EVALS_THREAD_TIMEOUT=10 oaieval gpt-3.5-turbo myeval
[2023-06-28 18:09:56,280] [registry.py:266] Loading registry from /Users/username/development/evals/evals/registry/evals
[2023-06-28 18:09:56,615] [registry.py:266] Loading registry from /Users/username/.evals/evals
[2023-06-28 18:09:56,617] [oaieval.py:138] Run started: runid
[2023-06-28 18:09:56,618] [data.py:83] Fetching myeval/samples.jsonl
[2023-06-28 18:09:56,619] [eval.py:33] Evaluating 69 samples
[2023-06-28 18:09:56,627] [eval.py:139] Running in threaded mode with 10 threads!
 99%|█████████████████████████████████████████████████████████████████████████  | 68/69 [00:20<00:00,  7.93it/s]

This style of call hangs at this point for many minutes, even though EVALS_THREAD_TIMEOUT is set to 10 seconds. This is devastating to eval turnaround time.
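
A minimal sketch of why a thread timeout alone may not unblock a run, assuming the timeout is enforced by waiting on a worker-thread future (names here are illustrative, not evals internals):

import concurrent.futures
import time

def slow_request():
    # Stand-in for an API call that stalls (e.g. a very long generation).
    time.sleep(60)
    return "response"

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_request)
    try:
        future.result(timeout=10)  # raises TimeoutError after 10 s ...
    except concurrent.futures.TimeoutError:
        pass
    # ... but Python cannot kill the worker thread, and leaving this
    # block waits for it to finish, so the program still blocks ~60 s.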

Code snippets

No response

OS

macOS Ventura (13.4)

Python version

3.11.3

Library version

1.0.3

shamas- added the bug (Something isn't working) label on Jul 5, 2023
yayachenyi (Contributor) commented:

Indeed, I have encountered a similar issue. The output can take a very long time to return; in my case, the model only completed its response after generating all 4097 tokens. Increasing EVALS_THREAD_TIMEOUT to 500 solved the problem for me, as my previous setting of 100 was insufficient.
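
For example, re-running the reproduction command above with the higher timeout (model and eval names are the reporter's placeholders):

EVALS_THREADS=12 EVALS_THREAD_TIMEOUT=500 oaieval gpt-3.5-turbo myeval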

etr2460 pushed a commit that referenced this issue Mar 25, 2024
As has been brought up before (#1384, #1292, #270), evals suffer from a hanging issue, where an evaluation run will hang for a very long time (if not indefinitely) at the end of a run (say, on the 99th sample out of 100).

This PR addresses this issue by replacing a seemingly redundant single-threaded thread creation that happened when making requests, nested inside the already multi-threaded eval loop. My impression is that this nested multithreading was causing overhead that resulted in the observed hanging.

I had also noticed this hanging issue in `EVALS_SEQUENTIAL=1` mode
(where it no longer occurs at the end, but instead randomly in the
middle of the run).

I was able to identify the source of this issue through debugging print statements that ultimately pointed to the `request_with_timeout` function as the culprit.

We have tested the new `request_with_timeout` code on a fork where we
have run multiple new and pre-existing evals, including with 3rd party
solvers, and found no change in behaviour or errors, and a clear
improvement on the hanging issue.
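
A rough sketch of the pattern described above, before and after the change (hypothetical names; this is an illustration under the commit's description, not the actual evals source):

import concurrent.futures

def make_api_request(timeout=None):
    # Stand-in for the actual API call; hypothetical placeholder.
    ...

# Before (as described above): each request spawned its own single-use
# thread, nested inside the already multi-threaded eval loop, purely to
# enforce the timeout.
def request_with_timeout_before(timeout):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as inner:
        return inner.submit(make_api_request).result(timeout=timeout)

# After: issue the request directly and let the client enforce the
# timeout itself, eliminating the nested thread.
def request_with_timeout_after(timeout):
    return make_api_request(timeout=timeout)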