
OOM issue when finetuning unsloth/llama-3-8b-bnb-4bit on Colab with T4 with 18000 context length #465

Open
rycfung opened this issue May 14, 2024 · 1 comment

Comments

@rycfung

rycfung commented May 14, 2024

I'm using the Unsloth Colab notebook to finetune the unsloth/llama-3-8b-bnb-4bit model on a dataset with a max context length of 18000. Whenever I kick off training, it always runs out of memory. That doesn't seem to be the case with the yahma/alpaca example. Here's the error:

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 102 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
[<ipython-input-7-3d62c575fcfd>](https://localhost:8080/#) in <cell line: 1>()
----> 1 trainer_stats = trainer.train()

13 frames
[/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py](https://localhost:8080/#) in _convert_to_fp32(tensor)
    779 
    780     def _convert_to_fp32(tensor):
--> 781         return tensor.float()
    782 
    783     def _is_fp16_bf16_tensor(tensor):

OutOfMemoryError: CUDA out of memory. Tried to allocate 9.47 GiB. GPU 0 has a total capacity of 14.75 GiB of which 3.78 GiB is free. Process 2116 has 10.95 GiB memory in use. Of the allocated memory 10.79 GiB is allocated by PyTorch, and 23.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Is the longer context length the reason it runs out of memory? What's the recommendation in this case to make this fine-tuning job possible?
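
As a side note, the traceback itself suggests the expandable-segments allocator setting. A minimal sketch of applying it (it has to be set before torch first initializes CUDA; it only reduces fragmentation and cannot make room for an allocation that simply doesn't fit):

```python
import os

# Must be set before torch initializes CUDA. This only reduces allocator
# fragmentation; it cannot free up space for a 9.47 GiB allocation that
# exceeds the remaining memory on a 16 GB T4.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after the environment variable is set
```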

@danielhanchen
Contributor

Yes, contexts that long will cause OOMs.
According to our blog (https://unsloth.ai/blog/llama3), the max context length on Tesla T4s (16GB) is roughly 10K.
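
A minimal sketch of a more memory-conservative setup along these lines, assuming the standard Unsloth Colab notebook: cap max_seq_length at roughly 10K as suggested above, keep 4-bit loading, and enable Unsloth's gradient checkpointing. The exact numbers are illustrative, not a guaranteed fit:

```python
from unsloth import FastLanguageModel

max_seq_length = 10000  # ~10K, per the T4 limit above; 18000 will not fit on 16 GB

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detect (fp16 on a T4)
    load_in_4bit=True,   # keep the base weights in 4-bit
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # offloaded checkpointing to save VRAM
    random_state=3407,
)
```

In the trainer's TrainingArguments, dropping per_device_train_batch_size from 2 to 1 and raising gradient_accumulation_steps from 4 to 8 keeps the effective batch size at 8 while roughly halving activation memory.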
