This repository has been archived by the owner on Feb 14, 2024. It is now read-only.

Improved README
* Installation
* Running
* Troubleshooting
razrotenberg committed Feb 23, 2022
1 parent c38a1bd commit 7e9e353
Showing 1 changed file with 44 additions and 0 deletions.
# rntop
A top-like tool for monitoring GPUs across a cluster.

## Installation
`rntop` can be run with [Docker](https://docs.docker.com/get-docker/).

In the future, `rntop` will also be available as a native installation for Linux distributions.

## Running
#### Setup
`rntop` monitors the remote GPU machines over SSH.
Therefore, you must have SSH access to every machine you want to monitor.

Password authentication is not currently supported, so if needed, [set up](https://superuser.com/a/8110) your SSH configuration to use SSH keys.
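
If you do not have key-based access yet, the sketch below shows one common way to set it up with the standard OpenSSH tools `ssh-keygen` and `ssh-copy-id`. It assumes an ed25519 key at `~/.ssh/id_ed25519` with an empty passphrase, and the function name `setup_ssh_keys` is hypothetical; adapt both to your environment and security policy.

```shell
# Hypothetical helper: create an SSH key (if missing) and install it on each machine.
# Assumes an ed25519 key at ~/.ssh/id_ed25519 with an empty passphrase.
setup_ssh_keys() {
  local key host
  key="$HOME/.ssh/id_ed25519"
  # Generate the key pair only if it does not exist yet.
  if [ ! -f "$key" ]; then
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    ssh-keygen -t ed25519 -f "$key" -N ""
  fi
  # Copy the public key to every machine; ssh-copy-id asks for the password once per machine.
  for host in "$@"; do
    ssh-copy-id -i "$key.pub" "$host"
  done
}
```

For example, `setup_ssh_keys john@192.168.1.60 john@192.168.1.61` installs the same key on both machines.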

> You can verify the SSH connection to a GPU machine by running `ssh user@server nvidia-smi`.
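
To check several machines at once, a small loop like the following can help. It is a sketch assuming a POSIX-like shell and the OpenSSH client; the function name `check_hosts` is hypothetical. `-o BatchMode=yes` makes `ssh` fail instead of prompting for a password, which mirrors the requirement for key-based authentication.

```shell
# Hypothetical helper: verify key-based SSH access and nvidia-smi on every machine.
check_hosts() {
  local host failed=0
  for host in "$@"; do
    # -n: do not read stdin; BatchMode=yes: fail instead of prompting for a password.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" nvidia-smi > /dev/null 2>&1; then
      echo "OK: $host"
    else
      echo "FAIL: $host"
      failed=1
    fi
  done
  # Non-zero exit status if at least one machine failed.
  return "$failed"
}
```

For example, `check_hosts john@192.168.1.60 john@192.168.1.61` prints one line per machine and returns a non-zero status if any of them fails.
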
#### Execution
Run `rntop` with the following command, replacing the `...` placeholder with the hostnames or IP addresses of the machines you want to monitor:
```shell
docker run -it --rm -v "$HOME/.ssh:/root/.ssh" runai/rntop ...
```
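
If you run `rntop` often, you may want to wrap the Docker command in a shell function. This is a convenience sketch, not part of `rntop` itself; the function name is our own choice, and it simply forwards all of its arguments to the container.

```shell
# Convenience wrapper (not shipped with rntop): run the rntop container like a native command.
# -it is kept because rntop is an interactive, top-like display.
rntop() {
  docker run -it --rm -v "$HOME/.ssh:/root/.ssh" runai/rntop "$@"
}
```

After adding it to your shell startup file (e.g. `~/.bashrc`), `rntop --username john 192.168.1.60 192.168.1.61` runs the same container as the full command.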

#### Passing a username
You can specify the username to use for the SSH connections.

If you use the same username on all machines, pass it with the `--username` argument.
If the username differs between machines, prepend it to each hostname instead (e.g. `john@server`).

Note that the command mounts the host's SSH directory (`$HOME/.ssh`) into the container (`/root/.ssh`) so that `rntop` can use your SSH configuration file and keys to establish the SSH connections.

#### Examples
Here are some example commands (`...` abbreviates the Docker options shown above):
1. `docker run ... runai/rntop [email protected]`
2. `docker run ... runai/rntop --username john 192.168.1.60 192.168.1.61 [email protected]`

> Pass `--help` to see all the available arguments.
## Troubleshooting
### SSH
By default, `rntop` uses [libssh](https://www.libssh.org/) for the SSH connections.
If you encounter SSH connection problems, try using the native `ssh` client instead by passing the `--ssh` argument to `rntop`.
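
Before switching, it can be useful to confirm that the native client itself can reach the machine. The following sketch (the function name `debug_host` is hypothetical) first runs `ssh -v` from the host, which prints each connection and authentication step, and only then starts `rntop` with `--ssh`. Note that this checks the connection from the host, while `rntop` connects from inside the container using the mounted `~/.ssh` directory.

```shell
# Hypothetical helper: check a machine with the native OpenSSH client, then run rntop with --ssh.
debug_host() {
  local host="$1"
  # -v prints each connection and authentication step to stderr.
  if ! ssh -v -n -o BatchMode=yes -o ConnectTimeout=5 "$host" nvidia-smi; then
    echo "native ssh cannot reach $host; fix this before running rntop" >&2
    return 1
  fi
  # --ssh makes rntop use the native ssh client instead of libssh.
  docker run -it --rm -v "$HOME/.ssh:/root/.ssh" runai/rntop --ssh "$host"
}
```

For example: `debug_host john@192.168.1.60`.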

### Bugs
If you encounter a bug, please open a [GitHub issue](https://github.com/run-ai/rntop/issues).
To help us fix it, describe the scenario in detail and include any relevant information.

## Development
### Setup
#### Build a Development Docker Image