
Llava-7b model Conversion to ONNX and Latency Optimization - OOM error (even after setting paging file size) #1144

Open
Harini-Vemula-2382 opened this issue May 9, 2024 · 2 comments


@Harini-Vemula-2382

Describe the bug
When attempting to run the optimization process with the llm.py script, I encounter a "not enough memory" error, even after increasing the paging file size to the maximum.

To Reproduce
Steps to reproduce the behavior.

Expected behavior
Please advise how to resolve this issue.

Olive config
Add Olive configurations here.

Olive logs
Add logs here.

Other information

  • OS: [e.g. Windows, Linux]
  • Olive version: [e.g. 0.4.0 or main]
  • ONNXRuntime package and version: [e.g. onnxruntime-gpu: 1.16.1]

Additional context
Please help convert and run LLaVA on DirectML.
(Screenshots attached: llava_1, llava_2, Memory_Error)

@jambayk (Contributor) commented May 9, 2024

@PatriceVignola could you look at this? Thanks!

@PatriceVignola (Contributor) commented

Hi @Harini-Vemula-2382,

This error comes from DirectML itself and indicates that the GPU doesn't have enough VRAM to load the model.
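Since the failure is a memory limit rather than an Olive bug, it can help to verify the environment before re-running the export. Below is a minimal, hypothetical pre-flight check (not part of Olive or ONNX Runtime) that reports whether onnxruntime sees the DirectML execution provider and how much physical RAM is free; `onnxruntime` (the `onnxruntime-directml` package on Windows) and `psutil` are treated as optional dependencies, so the script runs even if they are missing.

```python
# Hypothetical pre-flight check before running Olive's llm.py.
# ONNX export of a 7B model can need tens of GB of host memory
# before the GPU is even involved, so check both sides.

def directml_available():
    """Return True if onnxruntime reports the DirectML execution provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        return False  # onnxruntime / onnxruntime-directml not installed
    return "DmlExecutionProvider" in ort.get_available_providers()

def free_ram_gib():
    """Approximate free physical RAM in GiB, or None if psutil is missing."""
    try:
        import psutil
    except ImportError:
        return None
    return psutil.virtual_memory().available / 2**30

if __name__ == "__main__":
    print("DirectML provider available:", directml_available())
    print("Free RAM (GiB):", free_ram_gib())
```

If the provider is missing, the session silently falls back to CPU; if free RAM is low, the paging-file workaround alone will not prevent the DirectML VRAM error described above.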
