This repository has been archived by the owner on Feb 14, 2024. It is now read-only.
Commit 7e9e353 (parent c38a1bd): Installation, Running, Troubleshooting (1 changed file, 44 additions, 0 deletions).

# rntop
A top-like tool for monitoring GPUs across a cluster.

## Installation
`rntop` can be run with [Docker](https://docs.docker.com/get-docker/).

Native installations for Linux distributions are planned for the future.

## Running
#### Setup
`rntop` connects to the remote GPU machines over SSH to monitor them, so you need SSH access to every machine you want to monitor.

Password authentication is not supported at the moment, so [set up](https://superuser.com/a/8110) key-based SSH authentication if you have not done so already.
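
A minimal sketch of key-based setup with the standard OpenSSH tools, assuming an Ed25519 key at the default path `~/.ssh/id_ed25519`. The `setup_ssh_key` helper is hypothetical and not part of `rntop`:

```shell
#!/bin/sh
# Hypothetical helper (not part of rntop): create an Ed25519 key pair if none
# exists yet, then install its public key on a remote machine.
# Usage: setup_ssh_key user@server
setup_ssh_key() {
  key="$HOME/.ssh/id_ed25519"
  # Make sure ~/.ssh exists with owner-only permissions.
  mkdir -p -m 700 "$HOME/.ssh"
  if [ ! -f "$key" ]; then
    # Generates the key pair; prompts for an optional passphrase.
    ssh-keygen -t ed25519 -f "$key"
  fi
  # Appends the public key to ~/.ssh/authorized_keys on the remote machine;
  # this step asks for the remote password one last time.
  ssh-copy-id -i "$key.pub" "$1"
}
```

Run it once per machine, e.g. `setup_ssh_key john@server`.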
> You can verify the SSH connection to a GPU machine by running `ssh user@server nvidia-smi`; it should print the GPU status without asking for a password.
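
To run that check on several machines at once, a small loop can help. This is a sketch with a hypothetical `check_hosts` helper; `-o BatchMode=yes` makes `ssh` fail instead of prompting for a password, which mirrors `rntop`'s lack of password support:

```shell
#!/bin/sh
# Hypothetical helper (not part of rntop): check that each host accepts
# key-based SSH and can run nvidia-smi. Prints one status line per host
# and returns non-zero if any host fails.
check_hosts() {
  failed=0
  for host in "$@"; do
    # BatchMode=yes: fail instead of asking for a password.
    # ConnectTimeout=5: do not hang on unreachable machines.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" nvidia-smi >/dev/null 2>&1; then
      echo "ok:   $host"
    else
      echo "FAIL: $host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: `check_hosts john@server 192.168.1.60 192.168.1.61`.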

#### Execution
Run `rntop` with the following command, replacing the `...` placeholder with the hostnames or IP addresses of the machines:
```
docker run -it --rm -v "$HOME/.ssh:/root/.ssh" runai/rntop ...
```
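
To avoid retyping the Docker options, you can wrap them in a shell function, for example in `~/.bashrc`. A sketch; the `rntop` function name is our own choice, and every argument after it is passed through to the container unchanged:

```shell
# Hypothetical shell wrapper: forwards all arguments (hosts, --username,
# --help, ...) to the rntop container, mounting ~/.ssh as in the command above.
rntop() {
  docker run -it --rm -v "$HOME/.ssh:/root/.ssh" runai/rntop "$@"
}
```

With the function defined, `rntop --username john 192.168.1.60 192.168.1.61` expands to the full `docker run` command.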

#### Passing a username
You can specify the username used for the SSH connections.

If you use the same username on all machines, pass it with the `--username` argument.
If the usernames differ between machines, pass each one as part of its hostname (e.g. `john@server`).

Note that the command mounts the host's SSH directory (`$HOME/.ssh`) into the container at `/root/.ssh`, so that `rntop` can use your SSH configuration file and keys to establish the connections.

#### Examples
Here are some example commands (here `...` abbreviates the `docker run` options from the command above):
1. `docker run ... runai/rntop [email protected]`
2. `docker run ... runai/rntop --username john 192.168.1.60 192.168.1.61 [email protected]`

> Pass `--help` to see all the available arguments.
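
For larger clusters it may be easier to keep the machine list in a file. A sketch, assuming a hypothetical `hosts.txt` with one hostname, IP, or `user@host` entry per line; `hosts_from_file` is our own helper, not an `rntop` feature:

```shell
# Hypothetical helper (not part of rntop): print the hosts listed in a file,
# skipping blank lines and lines starting with '#'.
hosts_from_file() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}
```

Its output can then be passed to the container: `docker run -it --rm -v "$HOME/.ssh:/root/.ssh" runai/rntop $(hosts_from_file hosts.txt)`. The command substitution is left unquoted on purpose, so that each line becomes a separate argument.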

## Troubleshooting
### SSH
By default, `rntop` uses [libssh](https://www.libssh.org/) for the SSH connections.
If you encounter SSH connection problems, try the native `ssh` client instead by passing the `--ssh` argument to `rntop`.

### Bugs
If you encounter a bug, please open a [GitHub issue](https://github.com/run-ai/rntop/issues).
To help us fix it, describe the scenario in detail and include any relevant information, such as the command you ran and the output you got.

## Development
### Setup
#### Build a Development Docker Image