Multi-machine support #21
Comments
@fredmontet Thank you very much for pointing it out! After getting the same request from another institute, we recently added Let me know if it doesn't work for you or you need something more specific - always happy to help :)
Hi @fredmontet! We are improving them over time, and getting feedback would be super helpful in pointing out the things that we should focus on. Let me know if you have some availability for it.
Hi, great work. Is there a way to handle servers with different kinds of GPUs? Can we specify what kind of GPU to use? Also, how do I handle files (e.g. data) on different servers?
Hi @yix081, right now you can't specify a GPU type as part of your environment configuration. What do you mean when you say "handling data on different servers"?
We have an NFS server storing data and experiment logs, and each server also stores some files locally. If it is managed by genv, how does it specify the storage resource?
Genv doesn't manage storage resources. It focuses on GPU management.
Does Run:AI have such a solution?
Hi @yix081! Meaning, in Genv you would need your system administrator to mount shared NFS storage on all the remote hosts. Run.ai also does not manage NFS, which should be done by the system administrator, but lets you mount it into Pods that are running in your cluster. Does that answer your question? Or maybe I'm missing something here, in which case I would be very happy to understand your use case better. By the way, NFS is only one option; you can also use other storage solutions like S3, etc.
That makes sense. Thanks.
As part of an AI-focused institute, we find genv a great tool for handling single machines. However, we have a fleet composed of many machines. Are there any plans on your side to support or develop features for handling GPU availability in a cluster, or simply across multiple machines?