Add batch processing to MMLU and Humaneval evaluation scripts to prevent OOM errors #597
base: master
Just out of interest, have you tried just doing a |
Yes, unfortunately it didn't seem to affect anything. |
Basically my committed memory in Windows fills up to circa 180 GB (which is give or take 3x my system RAM) and then it OOMs. |
I tried this variant out. Limiting the batch to 64 allowed me to run the test with 16GB VRAM under Windows, while 128 resulted in OOM. I didn't try to optimize the batch size. Used torch 2.4.1 and python 3.11 along with cuda 12.4. |
@jim-plus I'm curious whether the OOM you're getting is due to VRAM or system RAM. The issue this PR means to address is a system memory leak of some kind in PyTorch or HF Tokenizers (maybe SentencePiece?), which is why I'm a little hesitant to merge it. While enqueued, each sequence should use a couple of kB of system RAM at most, and zero VRAM until it's moved to the active list. So it's bizarre that a few thousand jobs could overcommit 3x the available system RAM in this way, or that limiting the length of the queue would somehow affect VRAM allocation. It's definitely unintended behavior, and if it is a bug in ExLlama I'd rather fix it than work around it. Or if it's a memory leak in the tokenizer, perhaps tokenization could be batched in an isolated context. |
@turboderp Agreed, it's always better to fix the root cause than to work around it. Still, I just wanted to have it out in the open in case someone might find it useful. If there's anything else you'd like me to try to help debug this, I'm all ears :) |
Something curious is happening. Running the baseline script with default batch 128 will OOM when preparing questions, but that doesn't happen when I select batch size 128 for the updated script above. There seems to be a memory spike when initially preparing questions which levels off. |
That's correct. The changes I introduced allow the script to process the 164 questions in batches (with a default batch size of 50, unless overridden via a command-line argument) instead of all 164 at once, as the original version does. This helps avoid the OOM that happens when committed memory reaches the Windows limit of circa 3x the system RAM size. |
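For anyone skimming, the batching idea described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `iter_batches`, `run_in_batches`, and the `process_batch` callback are hypothetical names, and the real scripts enqueue generation jobs rather than calling a plain function.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(
    items: Sequence[T],
    process_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 50,  # default mentioned in the comment above
) -> List[R]:
    """Process one batch at a time so only `batch_size` items are prepared at once."""
    results: List[R] = []
    for batch in iter_batches(items, batch_size):
        # Hypothetical hook: in an evaluation script this is where prompts
        # would be tokenized, enqueued, and generated for this batch only.
        results.extend(process_batch(batch))
    return results


if __name__ == "__main__":
    problems = [f"problem_{i}" for i in range(164)]  # stand-in for the 164 questions
    print([len(b) for b in iter_batches(problems, 50)])  # [50, 50, 50, 14]
```

With 164 questions and a batch size of 50, this gives three full batches and a final batch of 14, so at most 50 prompts are prepared at any one time.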
This is a workaround for the OOM errors I get when running the MMLU and HumanEval tests on my Windows PC. It works! :)