sr-pytorch-lightning

Introduction

Super-resolution algorithms implemented with PyTorch Lightning. Based on code by So Uchida.

Currently supports the following models:

Requirements

  • docker
  • make
    • on Ubuntu-based distros, install Makefile support with sudo apt install build-essential

Usage

I decided to split the logic for dealing with Docker (contained in the Makefile) from running the Python code itself (contained in start_here.sh). Since I run my code on a remote machine, I use GNU Screen to keep it running even if my connection drops.

The Makefile has an environment variables section where a few variables can be set. In particular, DATASETS_PATH must point to the root folder of your super-resolution datasets.

In start_here.sh, a few variables can be set in the variables region. The default values are chosen to allow easy experimentation.

Creating docker image

make

If you want the specific package versions I used during my last experiments, check the pytorch_1.11 branch. To build the Docker image with those pinned versions, run:

make DOCKERFILE=Dockerfile_fixed_versions

Testing docker image

make test

This should print information about all available GPUs, like this:

Found 2 devices:
        _CudaDeviceProperties(name='NVIDIA Quadro RTX 8000', major=7, minor=5, total_memory=48601MB, multi_processor_count=72)
        _CudaDeviceProperties(name='NVIDIA Quadro RTX 8000', major=7, minor=5, total_memory=48601MB, multi_processor_count=72)
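The code behind make test is not reproduced in this README. As a rough sketch, output of this shape can be produced with PyTorch's torch.cuda API. The function name describe_devices is hypothetical, and the import is guarded so the snippet also runs on a machine without PyTorch:

```python
def describe_devices():
    """Return one description per visible CUDA device (empty if none, or no PyTorch)."""
    try:
        import torch  # guarded: PyTorch may be absent outside the container
    except ImportError:
        return []
    return [
        str(torch.cuda.get_device_properties(index))
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    devices = describe_devices()
    print(f"Found {len(devices)} devices:")
    for line in devices:
        print(f"\t{line}")
```

An empty list here usually means the container was started without GPU access or without a CUDA-enabled PyTorch build.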

Training model

If you haven't configured the Telegram bot that notifies you when a run is over, or don't want to use it, simply remove the line

$(TELEGRAM_BOT_MOUNT_STRING) \

from the make run command in the Makefile, and also comment out the line

send_telegram_msg=1

in start_here.sh.

Then, to train the models, simply call

make run

By default, it will run the file start_here.sh.

If you want to run another command inside the Docker container, override the default value of the RUN_STRING variable, for example:

make RUN_STRING="ipython3" run

Creating your own model

To create your own model, add a new file inside models/ and define a class that inherits from SRModel. Your class should implement the forward method. Then, add your model to __init__.py. The model will automatically be available as a model parameter option in train.py and test.py.

Some good starting points to create your own model are the SRCNN and EDSR models.
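How train.py discovers the models is not shown in this README. One common pattern, sketched below, is to collect the classes exported from a package's __init__.py into a dictionary and offer its keys as command-line choices. Every name here is an assumption and not the repository's code: the stand-in SRModel base class, MyUpscaler, MODELS, and the --model flag. The stand-in also avoids PyTorch so the sketch stays self-contained; a real model would operate on tensors.

```python
import argparse


class SRModel:
    """Stand-in for the repository's base class (hypothetical, no PyTorch)."""

    def forward(self, x):
        raise NotImplementedError


class MyUpscaler(SRModel):
    """Hypothetical model: repeats each value `scale` times (1D nearest neighbour)."""

    def __init__(self, scale=2):
        self.scale = scale

    def forward(self, x):
        return [value for value in x for _ in range(self.scale)]


# What a models/__init__.py might export: a name -> class mapping (assumption).
MODELS = {cls.__name__.lower(): cls for cls in (MyUpscaler,)}


def build_parser():
    parser = argparse.ArgumentParser()
    # Every registered model automatically becomes a valid --model choice.
    parser.add_argument("--model", choices=sorted(MODELS), required=True)
    return parser


args = build_parser().parse_args(["--model", "myupscaler"])
model = MODELS[args.model](scale=2)
print(model.forward([1, 2, 3]))  # [1, 1, 2, 2, 3, 3]
```

With a registry like this, adding a class to the mapping is the only step needed to expose it on the command line.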

Using Comet

If you want to use Comet to log your experiment data, create a file named .comet.config in the root folder of this repository and add the following lines:

[comet]
api_key=YOUR_API_KEY

More configuration variables are described in the Comet documentation.
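Since .comet.config uses INI syntax, it can be generated and checked with Python's standard configparser module. The sketch below writes the two lines shown above into a temporary folder and reads the key back. The helper names write_comet_config and read_api_key are hypothetical, and YOUR_API_KEY remains a placeholder:

```python
import configparser
import tempfile
from pathlib import Path


def write_comet_config(folder, api_key):
    """Write a minimal .comet.config into `folder` and return its path."""
    config = configparser.ConfigParser()
    config["comet"] = {"api_key": api_key}
    path = Path(folder) / ".comet.config"
    with path.open("w") as handle:
        # No spaces around "=", matching the format shown in this README.
        config.write(handle, space_around_delimiters=False)
    return path


def read_api_key(path):
    """Read api_key back, confirming the file is well-formed INI."""
    config = configparser.ConfigParser()
    config.read(path)
    return config["comet"]["api_key"]


# Demonstration in a temporary folder, so no real config file is overwritten.
with tempfile.TemporaryDirectory() as folder:
    config_path = write_comet_config(folder, "YOUR_API_KEY")
    content = config_path.read_text()
    key = read_api_key(config_path)
    print(content)
```

For real use, pass the repository root as the folder instead of a temporary directory.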

Most of the things that I found useful to log (metrics, code, logs, image results) are already being logged. Check train.py and srmodel.py for more details. All of this logging is done by the Comet logger that ships with PyTorch Lightning. An example of these experiments logged in Comet can be found here.

Finished experiment Telegram notification

Since the experiments can run for a while, I decided to use a Telegram bot to notify me when experiments are done (or when there is an error). For this, I use the telegram-send Python package. I recommend installing it on your machine and configuring it properly.

To do this, simply use:

pip3 install telegram-send
telegram-send --configure

Then, simply copy the configuration file created at ~/.config/telegram-send.conf to another directory to make it easier to mount into the Docker container. That directory is the source part of the TELEGRAM_BOT_MOUNT_STRING variable in the Makefile (by default, it is set to $(HOME)/Docker/telegram_bot_config).
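The copy step can be scripted as in the minimal sketch below. The helper name copy_telegram_config is hypothetical, and the demonstration runs on temporary paths with placeholder file contents so no real configuration is touched:

```python
import shutil
import tempfile
from pathlib import Path


def copy_telegram_config(source, target_dir):
    """Copy the telegram-send config file into `target_dir`, creating it if needed."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(source, target_dir))


# Demonstration on temporary paths with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "telegram-send.conf"
    source.write_text("placeholder\n")
    copied = copy_telegram_config(source, Path(tmp) / "Docker" / "telegram_bot_config")
    copied_name = copied.name
    copied_text = copied.read_text()
    print(copied_name)
```

For real use, the source would be Path.home() / ".config" / "telegram-send.conf" and the target Path.home() / "Docker" / "telegram_bot_config", matching the default mentioned above.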