vLLM can do this. #114
Closed
entrepeneur4lyf started this conversation in Results
Replies: 1 comment
-
Thanks for sharing this!
-
You may want to look at vLLM for inspiration since it supports distributed GPU inference.
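For context, a minimal sketch of what the comment is pointing at: vLLM can shard a single model across multiple GPUs via tensor parallelism when launching its OpenAI-compatible server. The model name here is illustrative, and this assumes the stated number of GPUs is visible on the host.

```shell
# Launch vLLM's OpenAI-compatible server, sharding the model
# across 4 GPUs with tensor parallelism (illustrative model name;
# assumes 4 CUDA devices are available on this machine).
vllm serve meta-llama/Llama-2-13b-hf --tensor-parallel-size 4
```

vLLM also supports pipeline parallelism across nodes, but tensor parallelism within a node is the common starting point for distributed inference.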