Add batch processing to MMLU and Humaneval evaluation scripts to prevent OOM errors #597

Open
wants to merge 3 commits into
base: master

LlamaEnjoyer
Contributor

This is a workaround for the OOM errors I get when running the MMLU and HumanEval tests on my Windows PC. It works! :)

@turboderp
Member

Just out of interest, have you tried just doing a gc.collect() periodically while it's creating jobs?
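For anyone who wants to repeat that experiment, here is a minimal sketch of what a periodic `gc.collect()` during job creation could look like. The names `create_jobs`, `make_job` and `gc_every` are hypothetical and are not taken from the evaluation scripts. Note that `gc.collect()` only reclaims unreachable Python objects, so it would not be expected to help if the memory is held by a native allocator.

```python
import gc

def create_jobs(prompts, make_job, gc_every=256):
    """Build one job per prompt, forcing a collection every `gc_every` jobs.

    `make_job` is a hypothetical callable standing in for whatever the
    evaluation script does to turn a prompt into a queued job.
    """
    jobs = []
    for count, prompt in enumerate(prompts, start=1):
        jobs.append(make_job(prompt))
        if count % gc_every == 0:
            # Reclaims unreachable Python objects (including reference cycles).
            gc.collect()
    return jobs
```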

@LlamaEnjoyer
Contributor Author

Yes, unfortunately it didn't seem to affect anything.

@LlamaEnjoyer
Contributor Author

Basically, my committed memory in Windows fills up to circa 180 GB (which is give or take 3x my system RAM) and then it OOMs.

@jim-plus

I tried this variant out. Limiting the batch to 64 allowed me to run the test with 16 GB of VRAM under Windows, while 128 resulted in an OOM. I didn't try to optimize the batch size. I used torch 2.4.1 and Python 3.11 along with CUDA 12.4.

@turboderp
Member

@jim-plus I'm curious whether the OOM you're getting is due to VRAM or system RAM. The issue this PR means to address is a system memory leak of some kind in PyTorch or HF Tokenizers (maybe SentencePiece?), which is why I'm a little hesitant to merge it. While enqueued, each sequence should use a couple of kB of system RAM at most, and zero VRAM until it is moved to the active list. So it would be bizarre if a few thousand jobs could overcommit 3x the available system RAM in this way, or if limiting the length of the queue somehow affected VRAM allocation.

It's definitely unintended behavior, and if it is a bug in ExLlama I'd rather fix it than work around it. Or if it's a memory leak in the tokenizer, perhaps tokenization could be batched in an isolated context.
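One way to read "an isolated context" is a short-lived child process per batch, so that anything the tokenizer leaks is returned to the OS when that process exits. The sketch below only illustrates the idea under that assumption: the whitespace split in `_CHILD` is a placeholder for the real tokenizer, and `tokenize_isolated` is a hypothetical helper, not an ExLlama function.

```python
import json
import subprocess
import sys

# Script run in the child process. The whitespace split is a placeholder
# for the real tokenizer call (an assumption for illustration only).
_CHILD = """
import json, sys
texts = json.load(sys.stdin)
json.dump([t.split() for t in texts], sys.stdout)
"""

def tokenize_isolated(texts, batch_size=64):
    """Tokenize `texts` in batches, each batch in its own child process."""
    tokenized = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=json.dumps(batch),
            capture_output=True,
            text=True,
            check=True,  # raise if the child exits with an error
        )
        # The child has exited here, so its memory is back with the OS.
        tokenized.extend(json.loads(proc.stdout))
    return tokenized
```

Starting a process per batch has a cost, so this would only make sense if the leak is confirmed to be in tokenization.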

@LlamaEnjoyer
Contributor Author

@turboderp Agreed, it's always better to fix the root cause than to work around it. Still, I just wanted to have it out in the open, in case someone might find it useful.

If there's anything else you'd like for me to try to help debug this I'm all ears :)

@jim-plus

Something curious is happening. Running the baseline script with default batch 128 will OOM when preparing questions, but that doesn't happen when I select batch size 128 for the updated script above. There seems to be a memory spike when initially preparing questions which levels off.

@LlamaEnjoyer
Contributor Author

LlamaEnjoyer commented Sep 25, 2024

That's correct. The changes I introduced allow the script to process the 164 questions in batches (with a default batch size of 50, unless overridden via a command-line argument) instead of all 164 at once, as the original version does. This helps avoid the OOM that happens when the committed memory reaches the Windows limit of circa 3x the system RAM size.
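As a rough illustration of the batching described above (a sketch, not the code from the PR), the loop could look like this. The flag name `--batch_size` and the helpers `iter_batches`, `run_in_batches` and `process_batch` are assumptions made for the example.

```python
import argparse

def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def run_in_batches(questions, batch_size, process_batch):
    """Process one batch at a time so that only that batch's jobs exist at once.

    `process_batch` is a hypothetical callable that enqueues a batch,
    generates, and returns that batch's results.
    """
    results = []
    for batch in iter_batches(questions, batch_size):
        results.extend(process_batch(batch))
    return results

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the default of 50 matches the comment above.
    parser.add_argument("--batch_size", type=int, default=50)
    return parser.parse_args(argv)
```

With 164 questions and the default of 50, this gives three batches of 50 and a final batch of 14.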
