When batches expire, are we dealing with it correctly? #226

Open
RyanMarten opened this issue Dec 7, 2024 · 7 comments

RyanMarten commented Dec 7, 2024

At a minimum, we should use the responses that were completed before the batch expired.

Ideally, we would resubmit only the requests that weren't completed, instead of resubmitting the whole expired batch.

The catch is that there would no longer be a 1-to-1 mapping of requests --> batch --> responses. To keep it simple, we can submit a smaller batch containing just the remaining requests from the original batch.
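
A minimal sketch of the "smaller batch" idea, assuming each request is a dict carrying a `custom_id` and that we already know which ids completed. The function name `build_followup_batch` and the `parent_batch_id` field are hypothetical and not part of Curator; recording the parent id is one possible way to keep the lineage once the mapping is no longer 1-to-1.

```python
def build_followup_batch(original_requests, completed_ids, parent_batch_id):
    """Return a smaller batch holding only the requests that did not complete.

    parent_batch_id is stored (hypothetical field) so the follow-up batch
    can still be traced back to the original batch.
    """
    remaining = [
        request
        for request in original_requests
        if request["custom_id"] not in completed_ids
    ]
    return {"parent_batch_id": parent_batch_id, "requests": remaining}


original = [{"custom_id": f"request-{i}"} for i in range(1, 5)]
followup = build_followup_batch(original, {"request-1", "request-4"}, "batch_abc")
remaining_ids = [request["custom_id"] for request in followup["requests"]]
```

Here `remaining_ids` contains only `request-2` and `request-3`, in their original order.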

RyanMarten commented Dec 7, 2024

(1) First, we should just allow a brute-force retry: run the same program again, and Curator treats the expired batch as completely failed and resubmits it.

(2) Then we should add the more fine-grained solution.

RyanMarten commented Dec 7, 2024

On (1): add an if statement where we check which batches are already submitted, and don't mark a batch as submitted if it failed, expired, or was cancelled.

For the quickest fix, we can just add it to the _submitted.jsonl handling.

But maybe we should write to _failed.jsonl and _expired.jsonl, and rewrite _submitted.jsonl with only the batches that have not failed or expired?

We can write all of these files before each sleep.

https://github.com/bespokelabsai/curator/blob/dev/src/bespokelabs/curator/request_processor/openai_batch_request_processor.py#L721-L755
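
A rough sketch of the status check described above, not the actual code at the linked lines. It assumes each batch object is a plain dict with a `status` field using the OpenAI batch status strings (`failed`, `expired`, `cancelled`, and so on). The helper names and the `batch_objects` file prefix are hypothetical.

```python
import json
from pathlib import Path

# Statuses that should NOT count as "already submitted" on a rerun.
RESUBMIT_STATUSES = {"failed", "expired", "cancelled"}


def partition_batches(batches):
    """Group batch objects by whether they still count as submitted."""
    groups = {"submitted": [], "failed": [], "expired": [], "cancelled": []}
    for batch in batches:
        status = batch["status"]
        key = status if status in RESUBMIT_STATUSES else "submitted"
        groups[key].append(batch)
    return groups


def write_batch_files(groups, directory, prefix="batch_objects"):
    """Rewrite one JSONL file per group, e.g. <prefix>_expired.jsonl."""
    directory = Path(directory)
    for name, items in groups.items():
        path = directory / f"{prefix}_{name}.jsonl"
        with path.open("w") as f:
            for batch in items:
                f.write(json.dumps(batch) + "\n")


batches = [
    {"id": "batch_1", "status": "completed"},
    {"id": "batch_2", "status": "expired"},
    {"id": "batch_3", "status": "in_progress"},
    {"id": "batch_4", "status": "cancelled"},
]
groups = partition_batches(batches)
```

Calling `write_batch_files(groups, working_dir)` before each sleep would keep the files in sync with the latest statuses, so a rerun only skips the batches left in the submitted file.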

@RyanMarten

We also want to resubmit requests that don't have a valid response format (#86).

This is very expensive if we resubmit the whole batch multiple times, so we should submit only the failed requests.

@RyanMarten

Responses for expired requests in the output file will look like this:

https://platform.openai.com/docs/guides/batch/getting-started?lang=node#batch-expiration

{"id": "batch_req_123", "custom_id": "request-3", "response": null, "error": {"code": "batch_expired", "message": "This request could not be executed before the completion window expired."}}
{"id": "batch_req_123", "custom_id": "request-7", "response": null, "error": {"code": "batch_expired", "message": "This request could not be executed before the completion window expired."}}

@RyanMarten

When we get failed requests (or requests with a finish_reason of length or content_filter), resubmit them in a batch.
For the finish_reason we can use the litellm map. I started a PR to improve that function.
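
A sketch of that check, assuming chat-completion style output records where the completion sits under `response.body.choices`. The hard-coded `RETRY_FINISH_REASONS` set is a stand-in for the litellm map mentioned above, and `needs_resubmit` is a hypothetical helper name.

```python
# Stand-in for the litellm finish_reason map; only the two values from the comment.
RETRY_FINISH_REASONS = {"length", "content_filter"}


def needs_resubmit(record):
    """Return True if an output record should go into the follow-up batch."""
    if record.get("error") is not None or record.get("response") is None:
        return True  # failed or expired request
    body = record["response"].get("body") or {}
    choices = body.get("choices") or []
    if not choices:
        return True  # no completion to use
    return any(
        choice.get("finish_reason") in RETRY_FINISH_REASONS for choice in choices
    )


ok = {"custom_id": "a", "error": None,
      "response": {"body": {"choices": [{"finish_reason": "stop"}]}}}
truncated = {"custom_id": "b", "error": None,
             "response": {"body": {"choices": [{"finish_reason": "length"}]}}}
expired = {"custom_id": "c", "response": None,
           "error": {"code": "batch_expired"}}
to_retry = [r["custom_id"] for r in (ok, truncated, expired) if needs_resubmit(r)]
```

In this example `to_retry` holds `b` and `c`, so only those two requests would be resubmitted.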
