
oaieval hangs a lot #1292

Open
shamas- opened this issue Jul 5, 2023 · 1 comment
Labels
bug Something isn't working

Comments

shamas- commented Jul 5, 2023

Describe the bug

oaieval frequently hangs near the end of a run, just before reporting results.

To Reproduce

✗ EVALS_THREADS=12 EVALS_THREAD_TIMEOUT=10 oaieval gpt-3.5-turbo myeval
[2023-06-28 18:09:56,280] [registry.py:266] Loading registry from /Users/username/development/evals/evals/registry/evals
[2023-06-28 18:09:56,615] [registry.py:266] Loading registry from /Users/username/.evals/evals
[2023-06-28 18:09:56,617] [oaieval.py:138] Run started: runid
[2023-06-28 18:09:56,618] [data.py:83] Fetching myeval/samples.jsonl
[2023-06-28 18:09:56,619] [eval.py:33] Evaluating 69 samples
[2023-06-28 18:09:56,627] [eval.py:139] Running in threaded mode with 10 threads!
 99%|█████████████████████████████████████████████████████████████████████████  | 68/69 [00:20<00:00,  7.93it/s]

This style of call hangs at this point for many minutes, even though EVALS_THREAD_TIMEOUT is set to 10 seconds. This is devastating to eval turnaround time.
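
A minimal sketch of why a thread timeout alone may not unblock a run, assuming the timeout is enforced by waiting on a worker-thread future (names here are illustrative, not evals internals):

import concurrent.futures
import time

def slow_request():
    # Stand-in for an API call that stalls (e.g. a very long generation).
    time.sleep(60)
    return "response"

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_request)
    try:
        future.result(timeout=10)  # raises TimeoutError after 10 s ...
    except concurrent.futures.TimeoutError:
        pass
    # ... but Python cannot kill the worker thread, and leaving this
    # block waits for it to finish, so the program still blocks ~60 s.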

Code snippets

No response

OS

macOS Ventura (13.4)

Python version

3.11.3

Library version

1.0.3

shamas- added the bug (Something isn't working) label on Jul 5, 2023
yayachenyi (Contributor) commented:

Indeed, I have encountered a similar issue. The output can take a very long time to return; in my case, the model only completed its response after generating all 4097 tokens. Increasing EVALS_THREAD_TIMEOUT to 500 solved the problem for me, as my previous setting of 100 was insufficient.
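
For example, re-running the reproduction command above with the higher timeout (model and eval names are the reporter's placeholders):

EVALS_THREADS=12 EVALS_THREAD_TIMEOUT=500 oaieval gpt-3.5-turbo myeval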

etr2460 pushed a commit that referenced this issue Mar 25, 2024
As has been brought up before (#1384, #1292, #270), evals suffer from a hanging issue, where an evaluation run will hang for a very long time (if not indefinitely) at the end of a run (say, on the 99th sample out of 100).

This PR addresses this issue by replacing a seemingly redundant single-threaded thread creation that happened when making requests, nested inside the already multi-threaded eval loop. My impression is that this nested multithreading was causing overhead that resulted in the observed hanging.

I had also noticed this hanging issue in `EVALS_SEQUENTIAL=1` mode
(where it no longer occurs at the end, but instead randomly in the
middle of the run).

I was able to identify the source of this issue through debugging print statements that ultimately pointed to the `request_with_timeout` function as the culprit.

We have tested the new `request_with_timeout` code on a fork where we
have run multiple new and pre-existing evals, including with 3rd party
solvers, and found no change in behaviour or errors, and a clear
improvement on the hanging issue.
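
A rough sketch of the pattern described above, before and after the change (hypothetical names; this is an illustration under the commit's description, not the actual evals source):

import concurrent.futures

def make_api_request(timeout=None):
    # Stand-in for the actual API call; hypothetical placeholder.
    ...

# Before (as described above): each request spawned its own single-use
# thread, nested inside the already multi-threaded eval loop, purely to
# enforce the timeout.
def request_with_timeout_before(timeout):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as inner:
        return inner.submit(make_api_request).result(timeout=timeout)

# After: issue the request directly and let the client enforce the
# timeout itself, eliminating the nested thread.
def request_with_timeout_after(timeout):
    return make_api_request(timeout=timeout)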