Replication package for the paper:
S. del Rey, S. Martínez-Fernández, L. Cruz and X. Franch, "Do DL models and training environments have an impact on energy consumption?," 2023 49th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Durres, Albania, 2023, pp. 150-158, doi: 10.1109/SEAA60479.2023.00031.
Before executing the code, you must first install the required dependencies. We use Poetry to manage them.
If you prefer a different dependency manager, the required dependencies are listed in the pyproject.toml file.
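If Poetry is available, a minimal setup could look like the sketch below (the --help call simply prints the command-line usage and assumes the standard argparse help flag):

$ poetry install
$ poetry run python -m src.profiling.profile_model --help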
We use MLflow to keep track of the different experiments. By default, its usage is disabled. If you want to use MLflow, you need to:
- Configure your own tracking server.
- Activate MLflow logging in the experiment.yaml configuration file.
- Create a secrets.yaml file in the project root with the following structure:

MLFLOW:
  URL: https://url/to/your/tracking/server
  USERNAME: tracking_server_username
  PASSWORD: tracking_server_password
If you do not need user credentials for MLflow, leave the USERNAME and PASSWORD fields empty.
This package uses a resources configuration file to manage the allowed GPU memory limit and the use of caching.
We do not share this file since each machine has its own hardware specifications.
You will need to create a resources.yaml file inside the config folder with the following structure:

GPU:
  MEM_LIMIT: 2048
  USE_CACHE: true
The GPU memory limit must be specified in Megabytes. If you do not want to set a GPU memory limit, leave the field empty.
WARNING! The memory limit value is just an example. Do not take it as a reference.
Once the environment is set up, you can run the experiment by executing the following command:
$ python -m src.profiling.profile_model [--experiment_name EXPERIMENT_NAME] [-d DATA] {local,cloud}

positional arguments:
  {local,cloud}         The type of training environment.

options:
  --experiment_name EXPERIMENT_NAME
                        The name of the MLflow experiment.
  -d DATA, --data DATA  Path to the dataset folder. The default is the data/dataset folder.
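For example, a local profiling run with the default dataset folder could be launched as follows (the experiment name is just an illustrative placeholder):

$ python -m src.profiling.profile_model --experiment_name my-experiment local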
The raw measurements for each architecture will be saved in the data/metrics/raw/{local,cloud}/architecture_name folder.
If MLflow is enabled, the measurements will also be saved in the MLflow tracking server, together with the trained models.
If not, the trained models will be saved in the models folder and the training history will be saved alongside the raw measurements as performance-%Y%m%dT%H%M%S.csv.
You can also train a single model, without profiling the energy metrics, by executing the following command:
$ python -m src.models.run_training [--experiment_name EXPERIMENT_NAME] [-d DATA] {local,cloud} {vgg16,resnet50,xception,mobilenet_v2,nasnet_mobile}
positional arguments:
  {local,cloud}         Whether training is to be executed locally or in the cloud.
  {vgg16,resnet50,xception,mobilenet_v2,nasnet_mobile}
                        Architecture of the DNN.

options:
  --experiment_name EXPERIMENT_NAME
                        The name of the MLflow experiment.
  -d DATA, --data DATA  Path to the dataset folder. The default is the data/dataset folder.
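As an illustration, a single local training run of ResNet50 with the default dataset folder could be launched as follows (no energy profiling is performed in this mode):

$ python -m src.models.run_training local resnet50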
The training history and the model will be saved following the same rules as the profiling script.
We do not share the training data used in this experiment. However, you can use any dataset you want, as long as it is intended for binary image classification, and obtain the energy measurements for it.
All the data collected during the experiment can be found in the data/metrics folder. The data is organized in the following structure:
.
├── raw
├── interim
└── processed
The raw folder contains the raw measurements collected during the experiment.
The interim folder contains the processed data that is used to generate the final dataset.
The processed folder contains the final data used to perform the analysis.
The data analysis is done using Jupyter Notebooks. You can find the analysis in the data-analysis.ipynb file. All the plots generated are saved in the out/figures folder.
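To rerun the analysis, you can open the notebook with Jupyter, for example (assuming Jupyter is installed in your environment; it may not be declared in pyproject.toml):

$ poetry run jupyter notebook data-analysis.ipynb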
The software under this project is licensed under the terms of the Apache 2.0 license. See the LICENSE file for more information.
The data used in this project is licensed under the terms of the CC BY 4.0 license. See the LICENSE file for more information.