This code is based on the thesis I did for the completion of my studies in the Big Data Management and Analytics (BDMA) Erasmus Mundus MSc.
It was presented at LWDA 2021 and published in the Datenbank Spektrum journal.
In this project, we experiment with continuous training and deployment of Deep Learning models.
It builds on prior work on continuous training of ML models, which proposes proactive stochastic training: mini-batch SGD on a combination of historical data and newly arrived data.
It is a middle-ground solution that lies between two extremes:
- Online learning updates the model by training only on the new samples that arrive in the system.
- Full retraining trains a new model from scratch on all available samples once enough new data has accumulated.
Proactive training reuses the already trained model and the historical data, while incorporating new data as soon as it arrives.
In the code, this is simulated by the BatchRandomDynamicSampler in data_loader/data_loaders.py
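As a rough illustration of the idea (not the actual BatchRandomDynamicSampler), the sketch below builds mini-batches that mix newly arrived samples with randomly drawn historical ones, so the existing model keeps being updated with mini-batch SGD instead of being retrained from scratch. The function and parameter names (proactive_batches, new_fraction) are hypothetical.

```python
import random


def proactive_batches(historical, new_samples, batch_size=32, new_fraction=0.5):
    """Yield mini-batches that mix newly arrived samples with historical ones."""
    n_new = max(1, int(batch_size * new_fraction))   # share of the batch reserved for new data
    n_hist = batch_size - n_new
    for start in range(0, len(new_samples), n_new):
        fresh = new_samples[start:start + n_new]
        old = random.sample(historical, min(n_hist, len(historical)))
        yield fresh + old                            # the trainer runs one SGD step on this batch
```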
Training as soon as data arrives requires a way to quickly deploy the model. For this, we propose sparse continuous deployment.
Drawing from work on distributed training, we use gradient sparsification: at each iteration only a small percentage of the model parameters is updated, while the residuals of the updates that were not applied are kept in a gradient memory.
The sparse training and deployment logic is in trainer/sparse_trainer.py
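As a minimal sketch of the technique (not the repository's exact implementation, which adapts ChocoSGD's get_top_k), the following applies top-k sparsification with a residual memory to a single gradient tensor; sparsify_top_k and its parameters are hypothetical names.

```python
import torch


def sparsify_top_k(grad: torch.Tensor, memory: torch.Tensor, k_ratio: float = 0.01):
    """Keep only the largest-magnitude k% of (gradient + residual) entries."""
    accumulated = grad + memory                  # fold in residuals from earlier iterations
    k = max(1, int(accumulated.numel() * k_ratio))
    flat = accumulated.flatten()
    _, idx = torch.topk(flat.abs(), k)           # indices of the largest entries
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]                      # only these values get applied and deployed
    residual = (flat - sparse).view_as(grad)     # the rest stays in gradient memory
    return sparse.view_as(grad), residual
```

In a training loop one would call this per parameter, overwrite p.grad with the sparse tensor before optimizer.step(), and carry the residual into the next iteration, so only the small set of changed parameters needs to be pushed to the deployed model.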
- Python >= 3.5 (3.6 recommended)
- PyTorch >= 0.4 (1.2 recommended)
- tqdm (optional for test.py)
- tensorboard >= 1.14
This project is licensed under the MIT License. See LICENSE for more details.
- This project was created using the Pytorch-Project-Template
- The get_top_k and get_random_k gradient sparsification functions are adapted from ChocoSGD