Skip to content

chenyhvvvv/submission_version

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cell Segmentation Competition

Background

Dataset

The Stereo-seq dataset [1] captures a whole adult mouse brain slice. The barcoded spots are arranged in a grid with a distance of 0.5 μm between spots. In total, this dataset profiled 26,177 genes in more than 42,000,000 spots with an average of 3.3 unique molecular identifier (UMI) counts per spot. The brain slice was imaged with nucleic acid staining, allowing for segmentation of the nucleus using image-based methods.

To get the raw complete data for this competition, you can download the transcriptomics data Mouse_brain_Adult_GEM_bin1.tsv.gz and the stain image data Mouse_brain_Adult.tif from MOSTA.

In this competition, we select only ten 1200*1200 patches from the original dataset for training and testing. These ten patches have the highest gene counts. The selected patches are as follows: '6000/9600/1200/1200', '7200/8400/1200/1200', '8400/3600/1200/1200', '4800/1200/1200/1200', '3600/1200/1200/1200', '8400/6000/1200/1200', '8400/4800/1200/1200', '4800/10800/1200/1200', '7200/1200/1200/1200', '6000/1200/1200/1200' ('start row index/start column index/patchsize/patchsize').

The dataset of this competition can be download from here.

The stain images are stored in tiff folder. They are 8-bit DNA Fluorescent stains. And the corresponding gene expressions are in gene folder. They follow the following format in a tab-delimited file:

geneID          row     column      counts
0610009B22Rik   426     1021        3

means a geneID 0610009B22Rik lies in (426,1021) with gene counts 3.

The segmentation[2] results by SCS are stored in seg folder. For each patch, there are several files:

  • mask_[patch_id].png: the segmentation mask of the patch
  • spot2cell_SCS_[patch_id].txt: the mapping from spot coordinates to cell indexes of the segmentation by SCS. Each line has the following format: row:column cell_id. Important: this is also the submission format of your result!
  • (For evaluation)spot2cell_cellpose_[patch_id].txt: the mapping from spot coordinates to cell indexes of the segmentation by Cellpose[3] method.
  • (For evaluation)spot2nucl_[patch_id].txt: the mapping from spot coordinates to cell indexes by nucleus segmentation[4].

By simply modify the path, you can reproduce the segmentation by running the source code.

You can also refer to the source code of the paper to know how to preprocess the data in details. A jupyter version of part of the preprocess code with additional note is available (Not exactly the same since SCS does downscale, and some code of multiprocessing is added).

File structure

.
├── dataset
│   ├── gene                                           # Gene expression data
│       ├── patch_tsv_6000/1200/1200/1200.tsv          # Gene expression data of patch 6000/1200/1200/1200
│       ├── ... (Other 9 patches)
│   ├── seg                                            # Segmentation results
│       ├── mask_6000_1200_1200_1200.png               # Segmentation mask fig by SCS of patch 6000/1200/1200/1200
│       ├── spot2cell_SCS_6000_1200_1200_1200.txt      # Spot to cell mapping by SCS of patch 6000/1200/1200/1200
│       ├── spot2cell_cellpose_6000_1200_1200_1200.txt # Spot to cell mapping by Cellpose of patch 6000/1200/1200/1200
│       ├── spot2nucl_6000_1200_1200_1200.txt          # Spot to cell mapping by nucleus segmentation of patch 6000/1200/1200/1200
│       ├── ... (Other 9 patches)
│   ├── tiff                                           # Stain images
│       ├── raw_stain_6000/1200/1200/1200.tif          # 8-bit Stain image of patch 6000/1200/1200/1200
│       ├── ... (Other 9 patches)
├── document
│   ├── README.md                                      # Competition description
│   ├── evaluation.ipynb                               # Evaluation demo
│   ├── preprocess_demo.ipynb                          # Preprocess demo
│   ├── evaluation.py                                  # Evaluation code from SCS
└── ...

Preprocess demo

See the notebook preprocess_demo.ipynb for details. Since the patch data provided in this competition is already preprocessed, this notebook is only for demonstration purpose and show you how to deal with the other patches in the large scale mouse brain dataset. You can use it to get familiar with the data. The ten pairs of stain image and gene expression data can be used directly for your model.

Evaluation

See the notebook evaluation.ipynb for details.

References

[1] Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).

[2] Chen, H., Li, D. & Bar-Joseph, Z. SCS: cell segmentation for high-resolution spatial transcriptomics. Nat Methods (2023).

[3] Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).

[4] Beucher, S. Use of watersheds in contour detection. In Proc. International Workshop on Image Processing 17–21 (CCETT, 1979).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published