When `len(gpu_split) > torch.cuda.device_count()`, exllama crashes with a cryptic stack trace.
Solution
It would be nice for exllama to fall back to `gpu_split = None` with a warning to the user. I thought about just writing up a PR, but I wasn't sure what refactoring choices the project would prefer. The fix probably belongs in `model_init.py`, but that would require adding torch imports. Alternatively, it could go in `model.py`, but then it would be repeated across functions, and `model.py` has no way to know whether its output should be suppressed, etc.
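As a rough sketch of the idea, the check could live in a small helper; `validate_gpu_split` is a hypothetical name, and in practice `device_count` would come from `torch.cuda.device_count()` (passed in here so the example stays torch-free):

```python
import warnings

def validate_gpu_split(gpu_split, device_count, quiet=False):
    # Hypothetical helper: gpu_split is a list of per-GPU memory allocations,
    # device_count would normally be torch.cuda.device_count().
    if gpu_split is not None and len(gpu_split) > device_count:
        if not quiet:
            warnings.warn(
                f"gpu_split lists {len(gpu_split)} devices but only "
                f"{device_count} are visible; falling back to auto split"
            )
        # None signals "let the loader decide the split itself"
        return None
    return gpu_split
```

This keeps the decision in one place, and the `quiet` flag sidesteps the concern about `model.py` not knowing whether to print.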
Alternatives
Alternatively, exllama could just report a clear error message to the user and exit.
Explanation
This is user error and mostly affects users who work on multiple systems with different GPU setups, so maybe low priority. Still, it seems like a small change to handle the error more gracefully.
Examples
No response
Additional context
No response
Acknowledgements
I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.
Importing torch in model_init.py isn't an issue. I added the check in the dev branch. Not sure what the rationale would be for falling back to no split (i.e. single GPU) when the requested split is too long?
Ah, in my specific case I specified a list of 4 GPUs on a machine with only 2. A fallback to autosplit seems most graceful. Not sure what I was thinking when I said `None`.