Implementation for the paper - Prime editing functionally corrects Cystic Fibrosis-causing CFTR mutations in human organoids and airway epithelial cells (Cell Reports Medicine)
This code serves two purposes. First, it counts the total number of organoids in each image using Bayesian Crowd Counting, which resolves segmentation inaccuracy in dense organoid cultures. Second, it detects swelling organoids resulting from gene editing using YOLOv7.
If you use this code for your research, please cite our paper:
@article{bulcaenPrimeEditingFunctionally2024,
title = {Prime Editing Functionally Corrects Cystic Fibrosis-Causing {{CFTR}} Mutations in Human Organoids and Airway Epithelial Cells},
author = {Bulcaen, Mattijs and Kortleven, Phéline and Liu, Ronald B. and Maule, Giulia and Dreano, Elise and Kelly, Mairead and Ensinck, Marjolein M. and Thierie, Sam and Smits, Maxime and Ciciani, Matteo and Hatton, Aurelie and Chevalier, Benoit and Ramalho, Anabela S. and Casadevall i Solvas, Xavier and Debyser, Zeger and Vermeulen, François and Gijsbers, Rik and Sermet-Gaudelus, Isabelle and Cereseto, Anna and Carlon, Marianne S.},
date = {2024-05-01},
journaltitle = {Cell Reports Medicine},
issn = {2666-3791},
doi = {10.1016/j.xcrm.2024.101544},
url = {https://www.cell.com/cell-reports-medicine/abstract/S2666-3791(24)00234-9}
}
Figure: a) organoid swelling upon CFTR gene editing; b) Bayesian Crowd Counting for dense organoids and swelling detection; c) swelling detection results.
- We solved the issue of segmentation inaccuracy in dense organoids by using Bayesian Crowd Counting.
- Swelling organoids are detected using YOLOv7, a real-time object detection algorithm that achieves state-of-the-art performance on dense organoid detection.
- This code is optimized for fast inference on CPU, and runs on Windows, macOS and Linux.
Images for detection: examples of the images used for testing can be found in the data/Input folder.
Examples of the results can be found in the data/Output folder.
The tool requires trained models as input!
If the user prefers to use our pre-trained models, those can be downloaded from Google Drive or Harvard Dataverse. The organization of the folder is analogous to the data/trained_models folder.
Alternatively, more experienced users can train their own models. Note that this requires a large amount of training data and a high-end GPU. The datasets that we used for training the models are available in Datasets for training.
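For reference, the layout of the models folder might look as follows; model_detect.pt is the detector weight filename that appears in the example log further below, while the counting-model filename is a placeholder:

trained_models
|
|__model_detect.pt   (YOLOv7 swelling detector)
|
|__model_count.pth   (Bayesian Crowd Counting model; placeholder name)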
The scripts can be downloaded via this link. After manual retrieval, the folder needs to be extracted before it can be used.
If Git is installed on the operating system (Git can be installed via this link), the repository can be cloned:
git clone https://github.com/RL-arch/detector.git
The code is written in Python and depends on a conda environment, which can easily be run in Anaconda. There are several options to set it up:
On Windows, if Anaconda has never been used before, the simplest option is to download Anaconda Navigator.
After opening Anaconda Navigator, you can simply import the environment file environment.yaml, which is located in the detector-main folder downloaded in step 1. See: Importing an environment.
For the interested reader, the documentation on using conda environments within Anaconda can be found here.
In a terminal, the following command can be used to create a new conda environment:
conda env create -f environment.yaml
Alternatively, create a conda environment with Python 3.10:
conda create --name <your_env_name> python=3.10
Activate the environment:
conda activate <your_env_name>
Then install the required packages:
pip install -r requirements.txt
If an error occurs, any missing packages can be installed through pip install <package>, according to the error message.
Our input images were exported from the ZEISS ZEN software as 512×512 .tif files.
However, the program can process other image formats (jpg, png, etc.) and will resize input images to (512, 512) if they are not already at that size (see resize_images()).
This can be adjusted in run.py, line 16.
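For illustration, a resize step of this kind could look like the sketch below; the function name resize_images() comes from the text above, while the implementation details are our assumption, not the exact code in run.py:

```python
import os
from PIL import Image

def resize_images(folder, target=(512, 512)):
    # Resize every supported image in `folder` to `target`, in place,
    # unless it already matches the target size.
    for name in os.listdir(folder):
        if not name.lower().endswith((".tif", ".tiff", ".jpg", ".png")):
            continue
        path = os.path.join(folder, name)
        img = Image.open(path)
        if img.size == target:
            print(f"Skipping {name} - already at target size {target}")
            continue
        img.resize(target, Image.LANCZOS).save(path)
    print(f"Checked and resized images to {target} where necessary.")
```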
This program is designed to process time-lapse sequences; the image filenames should follow the format prefix_s{}t{}, where s is followed by the sequence number and t by the time-point number. For example:
exp1 242en435-CF N1303K 20220921timeseries-01_s01t01.tif
exp1 242en435-CF N1303K 20220921timeseries-01_s01t02.tif
exp1 242en435-CF N1303K 20220921timeseries-01_s01t03.tif
exp1 242en435-CF N1303K 20220921timeseries-01_s01t04.tif
....
exp1 242en435-CF N1303K 20220921timeseries-01_s02t01.tif
exp1 242en435-CF N1303K 20220921timeseries-01_s02t02.tif
....
exp1 242en435-CF N1303K 20220921timeseries-01_s96t13.tif
....
For this experiment, "exp1 242en435-CF N1303K 20220921timeseries-01" is the prefix.
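To make the naming convention concrete, the sequence and time indices can be parsed from a filename as sketched below; the regular expression is our illustration, not necessarily the program's exact logic:

```python
import re

# prefix_s{sequence}t{time}.extension, e.g. "..._s01t02.tif"
PATTERN = re.compile(r"^(?P<prefix>.+)_s(?P<s>\d+)t(?P<t>\d+)\.\w+$")

m = PATTERN.match("exp1 242en435-CF N1303K 20220921timeseries-01_s01t02.tif")
prefix, s, t = m.group("prefix"), int(m.group("s")), int(m.group("t"))
print(prefix)  # exp1 242en435-CF N1303K 20220921timeseries-01
print(s, t)    # 1 2
```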
The test image names follow the naming format of the ZEISS software, for an experiment duration of 2 h; the names therefore carry the suffixes s...t01 to s...t13.
If your experiment duration is 1 h, please use /data/rename.py to rename the files.
!! Important: swelling detection is based on the dynamic morphological changes from t02 to t12.
If your experiment is 2h: the image sequences are t01 t02 t03 t04 t05 t06 t07 t08 t09 t10 t11 t12 (t13)
If your experiment is 1h: the image sequences are t00 t02 t04 t06 t08 t10 t12
We suggest organizing the data from each experiment into its own folder and collecting all of these into one input folder. The program will automatically rename the subfolders as "Experiment_1, Experiment_2, ..." (see the folder trees and the renaming sketch below).
For example, the input folder "/Input" contains three subfolders: exp 1 One, exp 2 Deux, exp 3 Drie
Input
|
|__exp 1 One
|
|__exp 2 Deux
|
|__exp 3 Drie
The program will process the subfolders as:
Input
|
|__Experiment_1
|
|__Experiment_2
|
|__Experiment_3
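A minimal sketch of this renaming pass, assuming the folder layout above (the implementation details are our illustration, not the program's exact code):

```python
import os

input_dir = "data/Input"  # example path; point this at your own input folder

# Collect experiment subfolders and rename them to Experiment_1, Experiment_2, ...
subfolders = sorted(
    d for d in os.listdir(input_dir)
    if os.path.isdir(os.path.join(input_dir, d))
)
for i, name in enumerate(subfolders, start=1):
    os.rename(os.path.join(input_dir, name),
              os.path.join(input_dir, f"Experiment_{i}"))
```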
4. Run run.py
The script can be run from an IDE; we recommend VSCode. To execute the script:
- When using VSCode, install the Python extension
- Open Anaconda Navigator after downloading VSCode
- Navigate to the "Home" tab
- Check whether your environment name is shown at the top of the screen after "all applications on"
- Look for VSCode within this tab and launch the IDE.
- Open the unzipped detector-main folder within VSCode (see here for how to do this)
- Upon opening the folder, select the run.py file from the dropdown menu
- Run the code via the ▶️ button. To learn more about using IDEs within Anaconda Navigator, click here
Alternatively, the conda environment can be activated in a terminal via:
conda activate <your_env_name>
Then open the terminal in the root folder of the package (where run.py is located) and run the code:
python run.py
Upon running, a GUI will ask for the paths to the input folder, the output folder and the pre-trained models. Example input and output folders can be found in the detector-main folder under "data". Do not select an individual experiment subfolder (e.g., "Experiment_1") as input; select the input folder itself.
For the pre-trained models, as explained, two options exist: (1) using our pre-trained models, which can be found on Google Drive or Harvard Dataverse; (2) using your own pre-trained models.
After saving, close the window and the program will continue running. Be patient, as the processing may take a while.
Check the output in your defined Output folder. An example of an output image:
The statistics are saved in .xlsx files in the /excel folder.
Within the terminal, the progress of the code can be monitored as well as the total runtime. Example output:
Skipping exp151 426-CF L227R 20220902 timeseries-01_s10t07.tif - already at target size (512, 512)
...
Checked and resized images to (512, 512) where necessary.
Your folder Experiment_1 is renamed as Experiment_1.
Images are from 1 experiment(s) in total
Preprocessing data structure...
Done with preprocessing.
Calculating frame differences...
Frame difference images are saved at /data/Output/diff_images.
Counting total amounts of organoids in the images...
Processing file exp151 426-CF L227R 20220902 timeseries-01_s09t02.tif
Processing file xxx
Image saved.
Result of total amount saved in /data/Output/excel.
Namespace(weights='data/trained_models/model_detect.pt', source='/data/Output/diff_images', img_size=512, conf_thres=0.3, iou_thres=0.45, device='cpu', view_img=False, save_txt=False, save_conf=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project='/data/Output', name='img3', exist_ok=False, no_trace=False)
YOLOR 2024-10-30 torch 2.1.0+cpu CPU
Fusing layers...
Model Summary: 819 layers, 164816216 parameters, 0 gradients, 225.6 GFLOPS
Convert model to Traced-model...
traced_script_module saved!
model is traced!
512x512 102 Growing-cellss, Done. (0.743s)
The image with the result is saved in: detector_v2.0\data\Output\img3\exp151 426-CF L227R 20220902 timeseries-01_s09t02_diff.png
512x512 96 Growing-cellss, Done. (0.597s)
The image with the result is saved in: detector_v2.0\data\Output\img3\exp151 426-CF L227R 20220902 timeseries-01_s10t02_diff.png
512x512 120 Growing-cellss, Done. (0.559s)
The image with the result is saved in: detector_v2.0\data\Output\img3\exp151 426-CF L227R 20220902 timeseries-01_s11t02_diff.png
Done. (1.976s)
Excel saved in /excel.
[{'Name': 'exp151 426-CF L227R 20220902 timeseries-01_s09t02_diff.png', 'Swelling': 102}, {'Name': 'exp151 426-CF L227R 20220902 timeseries-01_s10t02_diff.png', 'Swelling': 96}, {'Name': 'exp151 426-CF L227R 20220902 timeseries-01_s11t02_diff.png', 'Swelling': 120}]
Save at /data/Output/final_image/exp151 426-CF L227R 20220902 timeseries-01_s09t02.tif
...
Final images saved!
...
Final images saved!
cache in /data/Output released.
Final Excel saved as 'final results.xlsx'.
Total time taken: 0.219 minutes
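To post-process the exported statistics, the .xlsx files can be read back with pandas; the column names 'Name' and 'Swelling' are taken from the example output above, while the exact file path is an assumption:

```python
import pandas as pd

# Path assumed from the log above; adjust to your own output folder.
df = pd.read_excel("data/Output/excel/final results.xlsx")
print(df[["Name", "Swelling"]].head())
```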
- We tested the code on Linux, macOS and Windows. For any issues, please open an issue under the repository.
- (2024) We use writer.close() to replace writer.save() with newer versions of pandas (>= 1.2.0). If you encounter any problems, please reinstall pandas following the pandas documentation.
- (2024) Prefix inputs are no longer required in the new version.
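For context, the relevant pandas pattern is shown below (standard pandas API; the data frame content is an arbitrary example):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["s09t02_diff.png"], "Swelling": [102]})

# writer.close() replaces the deprecated and since-removed writer.save()
writer = pd.ExcelWriter("results.xlsx")
df.to_excel(writer, index=False)
writer.close()

# Equivalent: the context manager closes the writer automatically
with pd.ExcelWriter("results.xlsx") as writer:
    df.to_excel(writer, index=False)
```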
The brightness of the microscopic image influences the total-count estimation: for dark images with densely crowded organoids, the total count can be underestimated.
Positional shift between frames affects swelling detection and leads to fewer organoids being detected.
The machine needs network access to Google in order to download initial files such as model weights (see google_utils.py).
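If the machine must run offline, one workaround is to download the weights manually once and place them in the models folder; a minimal sketch follows (the URL is a placeholder; substitute the Google Drive or Harvard Dataverse link from the trained-models section above):

```python
import urllib.request

# Placeholder URL; replace with the real model link before use.
url = "https://example.com/model_detect.pt"
urllib.request.urlretrieve(url, "data/trained_models/model_detect.pt")
```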