
[REQUEST] graceful fallback when gpu_split is misspecified #711

Open

jwlockhart opened this issue Jan 4, 2025 · 2 comments

@jwlockhart

Problem

When len(gpu_split) > torch.cuda.device_count(), exllama crashes with a cryptic stack trace.

Solution

It would be nice for exllama to fall back to gpu_split = None with a warning to the user. I thought about just writing up a PR, but I wasn't sure what refactoring choices the project would prefer. The fix probably belongs in model_init.py, but that would require adding torch imports there. Alternatively, it could go in model.py, but then the check would be repeated across functions, and model.py doesn't know whether it should run quietly, etc. A rough sketch of the idea is below.
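A minimal sketch of what such a check could look like, assuming a hypothetical validate_gpu_split helper in model_init.py (the real function, arguments, and warning text would likely differ):

```python
# Hypothetical sketch only -- not exllama's actual model_init.py code.
import torch


def validate_gpu_split(gpu_split, quiet=False):
    """Return a usable gpu_split, or None to signal a fallback to the default."""
    if gpu_split is None:
        return None
    available = torch.cuda.device_count()
    if len(gpu_split) > available:
        if not quiet:
            print(
                f" !! gpu_split lists {len(gpu_split)} devices but only "
                f"{available} are visible; ignoring gpu_split"
            )
        return None
    return gpu_split
```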

Alternatives

One could instead just complain to the user with a clear error message and exit.

Explanation

This is a user error and mostly affects people who work across multiple systems with different GPU setups, so it may be low priority. Still, handling the error more gracefully seems like a small change.

Examples

No response

Additional context

No response

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@turboderp
Member

turboderp commented Jan 9, 2025

Importing torch in model_init.py isn't an issue. I added the check in the dev branch. Not sure what the rationale would be for falling back to no split (i.e. single GPU) when the requested split is too long?

@jwlockhart
Author

Ah, in my specific case I specified a split across 4 GPUs on a machine with only 2. A fallback to autosplit seems like the most graceful option. I'm not sure what I was thinking when I said None.
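For what it's worth, a variant of the earlier sketch that signals an autosplit fallback rather than a single-GPU one might look like this (again hypothetical names; the check added in the dev branch may behave differently):

```python
# Hypothetical variant: fall back to automatic splitting rather than a single GPU.
import torch


def resolve_gpu_split(gpu_split, quiet=False):
    """Return (gpu_split, use_autosplit)."""
    available = torch.cuda.device_count()
    if gpu_split is not None and len(gpu_split) > available:
        if not quiet:
            print(
                f" !! gpu_split has {len(gpu_split)} entries but only "
                f"{available} GPUs are visible; falling back to autosplit"
            )
        gpu_split = None
    return gpu_split, gpu_split is None
```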
