Merge pull request #61 from KatherLab/exp_odelia_multisite_revision
Exp odelia multisite revision
Ultimate-Storm authored Feb 28, 2024
2 parents 856bb9e + e2a93e3 commit 3a3a131
Showing 54 changed files with 769 additions and 3,867 deletions.
37 changes: 37 additions & 0 deletions DUKE_dataset_preparation.md
@@ -0,0 +1,37 @@
## Data Preparation
### Notes
This will take a long time. Run either command: `get_dataset_gdown.sh` is recommended if you have not yet completed step 2, while `get_dataset_scp.sh` is recommended after step 2 is done.
`get_dataset_gdown.sh` will download the dataset from Google Drive.
```sh
$ sh workspace/automate_scripts/sl_env_setup/get_dataset_gdown.sh
```
The `[-s sentinel_ip]` flag is only necessary for `get_dataset_scp.sh`, which will download the dataset from the sentinel node.
```sh
$ sh workspace/automate_scripts/sl_env_setup/get_dataset_scp.sh -s <sentinel_ip>
```

### Instructions

1. Make sure you have downloaded the Duke dataset.

2. Create the folder `WP1` and, inside it, the folders `test` and `train_val`:
```bash
mkdir workspace/<workspace-name>/user/data-and-scratch/data/WP1
mkdir workspace/<workspace-name>/user/data-and-scratch/data/WP1/{test,train_val}
```
3. Search for your institution in the [Node list](#nodelist) and note the data series in the "Data" column.

4. Prepare the clinical tables
```sh
cp workspace/<workspace-name>/user/data-and-scratch/data/*.xlsx workspace/<workspace-name>/user/data-and-scratch/data/WP1
```

5. Copy the NIfTI folders for cases 801 to 922 from the feature folder into `WP1/test`
```sh
cp -r workspace/<workspace-name>/user/data-and-scratch/data/odelia_dataset_only_sub/{801..922}_{right,left} workspace/<workspace-name>/user/data-and-scratch/data/WP1/test
```

6. Copy the NIfTI folders for the data series you noted (from `<first_number>` to `<second_number>`) from the feature folder into `WP1/train_val`; an optional sanity check is sketched after this step
```sh
cp -r workspace/<workspace-name>/user/data-and-scratch/data/odelia_dataset_only_sub/{<first_number>..<second_number>} workspace/<workspace-name>/user/data-and-scratch/data/WP1/train_val
```
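
Optionally, the sketch below (using the same `<workspace-name>` placeholder as above) counts what landed in each folder so you can confirm the copies succeeded; the expected test count follows from step 5, and this is an illustrative check rather than part of the official setup.
```sh
# Optional sanity check (a sketch; assumes the WP1 layout created above).
DATA=workspace/<workspace-name>/user/data-and-scratch/data/WP1
ls -d "$DATA"/test/*_left "$DATA"/test/*_right 2>/dev/null | wc -l   # up to 244 folders (cases 801-922, left + right)
ls -d "$DATA"/train_val/*/ 2>/dev/null | wc -l                       # should match the data series you noted
ls "$DATA"/*.xlsx                                                    # clinical tables copied in step 4
```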
176 changes: 112 additions & 64 deletions README.md
@@ -2,7 +2,7 @@

[![standard-readme compliant](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/RichardLitt/standard-readme)

Swarm learning based on HPE platform, experiments performed based on HPE Swarm Learning version number 2.1.0
Swarm learning based on HPE platform, experiments performed based on HPE Swarm Learning version number 2.2.0

This repository contains:

@@ -65,47 +65,56 @@ This is the Swarm Learning framework:
* Any experimental Ubuntu release newer than LTS 20.04 MAY prevent the SWOP node from running successfully.
* It also works on WSL2 (Ubuntu 20.04.2 LTS) on Windows systems. WSL1 may have issues with the Docker service.

### Upgrade the Swarm Learning Environment
### Upgrade the Swarm Learning Environment from Older Version
1. Run the following command to upgrade the Swarm Learning Environment from 1.x.x to 2.x.x
```sh
$ sh workspace/automate_scripts/server_setup/cleanup_old_sl.sh
sh workspace/automate_scripts/server_setup/cleanup_old_sl.sh
```
Then proceed to step 1, `Prerequisite`, in [Setting up the Swarm Learning Environment](#setting-up-the-swarm-learning-environment)

### Setting up the user and repository
1. Create a user named "swarm" and add it to the sudoers group.
Log in with user "swarm".
```sh
$ sudo adduser swarm
$ sudo usermod -aG sudo swarm
$ sudo su - swarm
sudo adduser swarm
sudo usermod -aG sudo swarm
sudo su - swarm
```
2. Add the "swarm" user to the docker group
```sh
sudo usermod -aG docker swarm
```
After running this command, you will need to log out and log back in for the changes to take effect, or you can use the newgrp command like so:
```sh
newgrp docker
```
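To confirm the group change took effect (assuming Docker itself is already installed and running), a quick check is:
```sh
groups        # "docker" should appear in the output
docker ps     # should list containers without needing sudo
```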
2. Run the following commands to set up the repository:

3. Run the following commands to set up the repository:

```sh
$ cd / && sudo mkdir opt/hpe && cd opt/hpe && sudo chmod 777 -R /opt/hpe
$ git clone https://github.com/KatherLab/swarm-learning-hpe.git && cd swarm-learning-hpe
cd / && sudo mkdir opt/hpe && cd opt/hpe && sudo chmod 777 -R /opt/hpe
git clone https://github.com/KatherLab/swarm-learning-hpe.git && cd swarm-learning-hpe
```

3. Install the CUDA environment and NVIDIA drivers. As soon as you see correct output from the following command, you may proceed.
4. Install the CUDA environment and NVIDIA drivers. As soon as you see correct output from the following command, you may proceed.
```sh
$ nvidia-smi
nvidia-smi
```
Please disable Secure Boot. On some systems, Secure Boot might prevent unsigned kernel modules (like NVIDIA's) from loading.
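One way to check the current Secure Boot state (assuming the `mokutil` utility is installed) is:
```sh
mokutil --sb-state   # prints "SecureBoot enabled" or "SecureBoot disabled"
```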
Check Loaded Kernel Modules:
- To see if the NVIDIA kernel module is loaded:
```sh
$ lsmod | grep nvidia
lsmod | grep nvidia
```
Review System Logs:
- Sometimes, system logs can provide insights into any issues with the GPU or driver:
```sh
$ dmesg | grep -i nvidia
dmesg | grep -i nvidia
```
Manually Load the NVIDIA Module:
- You can try manually loading the NVIDIA kernel module using the modprobe command:
```sh
$ sudo modprobe nvidia
sudo modprobe nvidia
```
Requirements and dependencies will be automatically installed by the script mentioned in the following section.

@@ -120,30 +129,27 @@ Requirements and dependencies will be automatically installed by the script mentioned in the following section.

**Please only proceed to the next step after observing "... is done successfully" in the log**

0. Optional: download preprocessed datasets. This will take a long time. Just run with either command, `get_dataset_gdown.sh` is recommended to run before you have done step 2, `get_dataset_scp.sh` is recommended to run after you have done step 2.
`get_dataset_gdown.sh` will download the dataset from Google Drive.
```sh
$ sh workspace/automate_scripts/sl_env_setup/get_dataset_gdown.sh
```
The [-s sentinel_ip] flag is only necessary for `get_dataset_scp.sh` The script will download the dataset from the sentinel node.
```sh
$ sh workspace/automate_scripts/sl_env_setup/get_dataset_scp.sh -s <sentinel_ip>
```
0. Optional: download preprocessed datasets. Please refer to the [Data Preparation](DUKE_dataset_preparation.md) section for more details.

1. `Prerequisite`: Runs scripts that check for required software and open/exposed ports.
```sh
$ sh workspace/automate_scripts/automate.sh -a
sh workspace/automate_scripts/automate.sh -a
```
2. `Server setup`: Runs scripts that set up the swarm learning environment on a server.
```sh
$ sh workspace/automate_scripts/automate.sh -b -s <sentinel_ip> -d <host_index>
sh workspace/automate_scripts/automate.sh -b -s <sentinel_ip> -d <host_index>
```
3. `Final setup`: Runs scripts that finalize the setup of the swarm learning environment. Only the arguments in angle brackets (`<...>`) are required; the `[-n num_peers]` and `[-e num_epochs]` flags are optional (a filled-in example is sketched after the command below).
```sh
$ sh workspace/automate_scripts/automate.sh -c -w <workspace_name> -s <sentinel_ip> -d <host_index> [-n num_peers] [-e num_epochs]
sh workspace/automate_scripts/automate.sh -c -w <workspace_name> -s <sentinel_ip> -d <host_index> [-n num_peers] [-e num_epochs]
```
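For illustration only, a hypothetical filled-in invocation might look like the following; the workspace name is taken from this repository, while the IP address, host index, and optional values are placeholders, not recommended settings.
```sh
sh workspace/automate_scripts/automate.sh -c -w odelia-breast-mri -s 192.0.2.10 -d 2 -n 3 -e 100
```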

Optional 5. Reconnect to VPN
In case your machine was restarted or lost the VPN connection, here is the guide to reconnect: [VPN connect guide](https://support.goodaccess.com/configuration-guides/linux/linux-terminal)
```sh
sh /workspace/automate_scripts/server_setup/setup_vpntunnel.sh
```

In case your machine was restarted or lost the VPN connection, here is the guide to reconnect: [VPN connect guide](https://support.goodaccess.com/configuration-guides/linux)
`file.ovpn` is the configuration file that TUD assigned to you.
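If you need to bring the tunnel up manually instead of using the helper script, a typical OpenVPN invocation (assuming the OpenVPN client is installed; the filename stands in for the config TUD assigned to you) is:
```sh
sudo openvpn --config file.ovpn
```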

If a problem is encountered, please refer to this [README.md](workspace%2Fautomate_scripts%2FREADME.md) file for step-by-step setup; specific instructions are given on how to run each command.
@@ -152,56 +158,93 @@ All the processes are automated, so you can just run the above command and wait
If any problem occurs, please first try to figure out which step is going wrong, search for solutions online, and check [Troubleshooting.md](Troubleshooting.md). If that does not help, contact the maintainer of the Swarm Learning Environment and document the error in the Troubleshooting.md file.

## Usage
### Data Preparation
1. Make sure you have downloaded Duke data.

2. Create the folder `WP1` and in it `test` and `train_val`
```bash
mkdir workspace/<workspace-name>/user/data-and-scratch/data/WP1
mkdir workspace/<workspace-name>/user/data-and-scratch/data/WP1/{test,train_val}
```
3. Search for your institution in the [Node list](#nodelist) and note the data series in the column "Data"
### Ensuring Dataset Structure

To ensure proper organization of your dataset, please follow the steps outlined below:

1. **Directory Location**

Place your dataset under the specified path:

/workspace/odelia-breast-mri/user/data-and-scratch/data


Within this path, create a folder named `multi_ext`. Your directory structure should then resemble:
/opt/hpe/swarm-learning-hpe/workspace/odelia-breast-mri/user/data-and-scratch/data
└── multi_ext
├── datasheet.csv # Your clinical tabular data
├── test # External validation dataset
├── train_val # Your own site training data
└── segmentation_metadata_unilateral.csv # External validation table

2. **Data Organization**

Inside the `train_val` or `test` directories, place folders that directly contain NIfTI files. The folders should be named according to the following convention:

<patientID>_right
<patientID>_left

Here, `<patientID>` should correspond to the patient ID in your tables (`datasheet.csv` and `segmentation_metadata_unilateral.csv`). This convention links the imaging data with the respective clinical information.

#### Summary

- **Step 1:** Ensure your dataset is placed within `/workspace/odelia-breast-mri/user/data-and-scratch/data/multi_ext`.
- **Step 2:** Organize your clinical tabular data, external validation dataset, your own site training data, and external validation table as described.
- **Step 3:** Name folders within `train_val` and `test` as `<patientID>_right` or `<patientID>_left`, matching the patient IDs in your datasheets.

Following these steps keeps the dataset well organized and simplifies data management and processing in your projects.
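
As an optional aid, here is a minimal sanity-check sketch for this layout. It assumes the patient ID sits in the first column of `datasheet.csv` and that the `multi_ext` path above is in place; adjust `DATA_DIR` and the `cut` field if your files differ.
```sh
#!/bin/bash
# Minimal sanity check for the multi_ext layout (a sketch; assumes patient IDs
# are in the first column of datasheet.csv -- adjust DATA_DIR and the cut field if not).
DATA_DIR=/opt/hpe/swarm-learning-hpe/workspace/odelia-breast-mri/user/data-and-scratch/data/multi_ext

for split in train_val test; do
  for folder in "$DATA_DIR/$split"/*/; do
    name=$(basename "$folder")
    case "$name" in
      *_left|*_right) ;;                                  # expected naming convention
      *) echo "Unexpected folder name: $split/$name"; continue ;;
    esac
    pid=${name%_*}                                        # strip the _left / _right suffix
    if ! cut -d, -f1 "$DATA_DIR/datasheet.csv" | grep -qx "$pid"; then
      echo "Patient ID $pid ($split/$name) not found in datasheet.csv"
    fi
  done
done
```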

4. Prepare the clinical tables
```sh
cp workspace/<workspace-name>/user/data-and-scratch/data/*.xlsx workspace/<workspace-name>/user/data-and-scratch/data/WP1
```

5. Copy the nifty files from feature folder into `WP1/test` from 801 to 922
```sh
cp -r workspace/<workspace-name>/user/data-and-scratch/data/odelia_dataset_only_sub/{801..922}_{right,left} workspace/<workspace-name>/user/data-and-scratch/data/WP1/test
```

6. Copy the nifty files from feature folder with the order you noted into `WP1/train_val` from xxx to yyy
```sh
cp -r workspace/<workspace-name>/user/data-and-scratch/data/odelia_dataset_only_sub/{<first_number>..<second_number>} workspace/<workspace-name>/user/data-and-scratch/data/WP1/train_val
```

### Running Swarm Learning Nodes
Run the nodes in the order Swarm Network node -> Swarm SWOP node -> Swarm SWCI node. Please open a separate terminal for each node and use the following commands:
#### SN
- To run a Swarm Network (or sentinel) node:
```sh
$ ./workspace/automate_scripts/launch_sl/run_sn.sh -s <sentinel_ip_address> -d <host_index>
./workspace/automate_scripts/launch_sl/run_sn.sh -s <sentinel_ip_address> -d <host_index>
```

or
```sh
runsn
```
#### SWOP
- To run a Swarm SWOP node:
```sh
$ ./workspace/automate_scripts/launch_sl/run_swop.sh -w <workspace_name> -s <sentinel_ip_address> -d <host_index>
./workspace/automate_scripts/launch_sl/run_swop.sh -w <workspace_name> -s <sentinel_ip_address> -d <host_index>
```
or
```sh
runswop
```
#### SWCI

- To run a Swarm SWCI node (the SWCI node is used to generate training task runners; it can be initiated by any host, but we currently suggest that only the sentinel host initiates it):
```sh
$ ./workspace/automate_scripts/launch_sl/run_swci.sh -w <workspace_name> -s <sentinel_ip_address> -d <host_index>
./workspace/automate_scripts/launch_sl/run_swci.sh -w <workspace_name> -s <sentinel_ip_address> -d <host_index>
```
or
```sh
runswci
```


- To check the logs from training:
```sh
$ ./workspace/automate_scripts/launch_sl/check_latest_log.sh
./workspace/automate_scripts/launch_sl/check_latest_log.sh
```
or
```sh
cklog [--ml] [--swci] [--swop] [--sn]
```


- To stop the Swarm Learning nodes (`--[node_type]` is optional; if not specified, all nodes will be stopped, otherwise you can specify e.g. `--sn` or `--swop`):
```sh
$ ./workspace/swarm_learning_scripts/stop-swarm --[node_type]
./workspace/swarm_learning_scripts/stop-swarm --[node_type]
```
or
```sh
stopswarm [--node_type]
```

- To view results, see logs under `workspace/<workspace_name>/user/data-and-scratch/scratch`
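  For example, to list the most recently written files there (same placeholder path as above):
```sh
ls -lt workspace/<workspace_name>/user/data-and-scratch/scratch | head
```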
@@ -220,15 +263,20 @@ Please observe [Troubleshooting.md](Troubleshooting.md) section 10 for successful
Nodes will be added to the VPN and will be able to communicate with each other after setting up the Swarm Learning Environment as described in [Install](#install)
| Project | Node Name | Location | Hostname | Data | Maintainer |
| ------- | --------- | ------------------| ---------| --------- | ------------------------------------------|
| Sentinel node | TUD | Dresden, Germany | swarm | 1-100 | [@Jeff](https://github.com/Ultimate-Storm) |
| ODELIA | VHIO | Madrid, Spain | radiomics | 401-500 | [@Adrià]([email protected]) |
| | UKA | Aachen, Germany | swarm | 101-200 | [@Gustav]([email protected]) |
| | RADBOUD | Nijmegen, Netherlands | swarm | 501-600 | [@Tianyu]([email protected]) |
| | MITERA | | | 201-300 | |
| | RIBERA | | | 301-400 | |
| | UTRECHT | | | 601-700 | |
| | CAMBRIDGE | | | 701-800 | |
| | ZURICH | | | | |
| Sentinel node | TUD | Dresden, Germany | swarm | | [@Jeff](https://github.com/Ultimate-Storm) |
| ODELIA | VHIO | Madrid, Spain | radiomics | | [@Adrià]([email protected]) |
| | UKA | Aachen, Germany | swarm | | [@Gustav]([email protected]) |
| | RADBOUD | Nijmegen, Netherlands | swarm | | [@Tianyu]([email protected]) |
| | MITERA | Paul, Greece | | | |
| | RIBERA | Lopez, Spain | | | |
| | UTRECHT | | | | |
| | CAMBRIDGE | Nick, Britain | | | |
| | ZURICH | Sreenath, Switzerland | | | |
| SWAG | | | swarm | | |
| DECADE | | | swarm | | |
| Other nodes | UCHICAGO | Chicago, USA | swarm | | [@Sid]([email protected]) |

## Models implemented
52 changes: 0 additions & 52 deletions sllib/src/README.md

This file was deleted.

19 changes: 0 additions & 19 deletions sllib/src/python-client/pyproject.toml

This file was deleted.

