Multi-machine support #21

fredmontet · 2023-01-12T14:26:01Z

We are an AI-focused institute, and genv is a great tool for handling single machines. However, we have a fleet of many machines. Are there any plans on your side to support or develop features for handling GPU availability in a cluster, or simply across multiple machines?


EkinKarabulut · 2023-01-12T15:28:35Z

@fredmontet Thank you very much for pointing it out!

After getting the same request from another institute, we recently added genv remote, a feature for handling multiple machines with GPUs. You can read about it in the documentation.

Let me know if it doesn't work for you or you need something more specific - always happy to help :)

razrotenberg · 2023-02-12T06:56:55Z

Hi @fredmontet!
Have you had the chance to check out remote features in Genv?

We are improving them over time, and your feedback would be super helpful in pointing out the things we should focus on.

Let me know if you have some time for it.
Thanks in advance!

yix081 · 2023-03-27T05:43:20Z

Hi, great work.

Is there a way to handle servers with different kinds of GPUs? For example, can we specify which kind of GPU to use?

Also, how do I handle files (e.g. data) in different servers?

davidLif · 2023-03-28T06:25:55Z

Hi @yix081,

Right now, you can't specify a GPU type as part of your environment configuration.
However, you can specify memory requirements for your environment. GPUs without enough memory won't be reserved for the environment when you activate it.
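To illustrate the memory-based filtering described above, here is a minimal Python sketch. It is not genv's actual implementation: the `Gpu` class, the `eligible_gpus` function, and the MiB figures are hypothetical, and it assumes a GPU qualifies when its memory not yet reserved by other environments is at least the environment's requirement.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical view of one GPU on a host (not a genv type)."""
    index: int
    total_mib: int
    used_mib: int  # memory already reserved by other environments

    @property
    def free_mib(self) -> int:
        return self.total_mib - self.used_mib


def eligible_gpus(gpus: list[Gpu], required_mib: int) -> list[int]:
    """Return the indices of GPUs with at least `required_mib` MiB unreserved."""
    return [g.index for g in gpus if g.free_mib >= required_mib]


# Example host: GPU 0 is mostly reserved, GPU 1 is mostly free, GPU 2 is smaller but idle.
gpus = [Gpu(0, 16384, 12000), Gpu(1, 16384, 2000), Gpu(2, 8192, 0)]
print(eligible_gpus(gpus, 6000))  # [1, 2]
```

With a 6000 MiB requirement, GPU 0 (4384 MiB free) is skipped, while GPUs 1 and 2 remain candidates regardless of their model, which is why a memory requirement can partly stand in for a GPU-type requirement.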

What do you mean by "handling data on different servers"?

yix081 · 2023-03-28T17:06:17Z

We have an NFS server storing data and experiment logs, and each server also stores some files locally. If these servers are managed by genv, how does it specify the storage resources?

davidLif · 2023-03-28T22:21:21Z

Genv doesn't manage storage resources. It focuses on GPU management.

yix081 · 2023-03-28T22:54:05Z

Does Run:AI have such a solution?

razrotenberg · 2023-03-29T06:31:32Z

Hi @yix081!
So neither Genv nor Run.ai manages NFS; however, both let you use it.

That means that with Genv, your system administrator would need to mount a shared NFS storage on all the remote hosts.
After that, Genv's remote features would be an option for you, since you will have access to this NFS storage after activating an environment on a remote host (i.e. `genv remote activate`).

Run.ai also does not manage NFS, which should be set up by the system administrator, but it lets you mount it into Pods that are running in your cluster.

Does that answer your question?
Can you make the NFS accessible to all remote hosts? (It is better to have it mounted at the same path on all of them as well.)
If so, does that make Genv or Run.ai useful for you?

Or maybe I'm missing something here; I would be very happy to understand your use case better.

By the way, NFS is only one option; you can also use other storage solutions such as S3.

yix081 · 2023-03-29T14:28:47Z

That makes sense. Thanks.
