Identifying Cell Nuclei in Divergent Images

Class project for ChBE 8803, based on Kaggle data science bowl

Download training data here

Milestones:

  • 02/02/18 - Project proposal draft
  • 02/09/18 - Data retrieval and storage
  • 02/16/18 - Proposal revision and literature review
  • 03/01/18 - Initial workflow and peer review
  • 03/30/18 - Final report draft
  • 04/12/18 - Presentation
  • 04/24/18 - Final report

1. Background

Pathologists use immunohistochemistry (IHC) to detect tumours by identifying and quantifying important biomarkers expressed in cell nuclei. However, manual identification is time-consuming, so an automated, high-accuracy method for isolating and analyzing nuclei across different kinds of IHC images is highly desirable.

2. Data Description & Challenge

The dataset is challenging because of its high volume and dimensionality. Our data is divided into a training set (665 images, each containing between 4 and 384 masks for distinct nuclei) and a test set (65 images). The images vary in size (total pixels) and were collected from many different cell types under a variety of imaging conditions (magnification, modality, etc.). To achieve success, we will have to work with all the given data to develop a robust method for cell nucleus identification.
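For reference, a minimal loading sketch is shown below. It assumes the standard Kaggle Data Science Bowl directory layout, in which each training sample lives in its own folder with images/ and masks/ subdirectories; the function name and paths are illustrative only.

```python
from pathlib import Path
from skimage.io import imread

def load_sample(sample_dir):
    """Return one raw image and a list of its binary nucleus masks.

    Assumes the layout train/<image_id>/images/<image_id>.png and
    train/<image_id>/masks/*.png (one mask file per nucleus).
    """
    sample_dir = Path(sample_dir)
    image = imread(next((sample_dir / "images").glob("*.png")))
    masks = [imread(p) > 0 for p in (sample_dir / "masks").glob("*.png")]
    return image, masks
```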

3. Hypotheses & Goals

Goal 1. Normalize across set of images.

The variety of cell types, stains, and imaging conditions all complicate cell identification. Pre-processing the data will normalize the images so that they can be compared on a uniform basis.
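One possible pre-processing step (a sketch assuming scikit-image; the function below is illustrative, not the final pipeline) is to collapse each image to a single grayscale channel, rescale intensities to [0, 1], and invert bright-field images so that nuclei are always the bright phase:

```python
from skimage.color import rgb2gray
from skimage.exposure import rescale_intensity

def normalize(image):
    """Map an input image to a grayscale array in [0, 1] with bright nuclei."""
    if image.ndim == 3:
        image = rgb2gray(image[..., :3])   # drop alpha channel, collapse color
    image = rescale_intensity(image.astype(float), out_range=(0.0, 1.0))
    if image.mean() > 0.5:                 # bright background: invert
        image = 1.0 - image
    return image
```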

Goal 2. Detect all objects in each image.

We will separate objects from the background, categorizing each pixel as ground (background) or non-ground (object).
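A simple baseline for this step, assuming the normalized grayscale image from Goal 1, is a global Otsu threshold (sketch below; a more sophisticated segmentation method may replace it):

```python
from skimage.filters import threshold_otsu

def segment_foreground(norm_image):
    """Return a boolean mask: True for object pixels, False for ground pixels."""
    return norm_image > threshold_otsu(norm_image)
```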

Goal 3. Separate individual cell nuclei.

Once we have distinguished all objects as distinct from ground, the next step is to determine individual cells. For each image, we will return a set of masks, each mask covering only one nucleus with no overlap between any of the masks.
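As a first approximation, connected-component labeling of the foreground mask yields one non-overlapping mask per component (sketch below, assuming scikit-image; touching nuclei would need an additional splitting step such as a watershed, which is not shown):

```python
from skimage.measure import label

def split_nuclei(foreground):
    """Return a list of boolean masks, one per connected foreground component."""
    labeled = label(foreground)               # 0 = ground, 1..N = candidate nuclei
    return [labeled == i for i in range(1, labeled.max() + 1)]
```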

Goal 4. Maximize performance.

We will consider statistical metrics important in binary classification (accuracy, precision, and the F1 score), as well as average precision in image classification as measured by the Jaccard index (also called the intersection over union, IoU) for a set of predicted pixels A and a set of true object pixels B.
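For boolean masks, the Jaccard index is |A ∩ B| / |A ∪ B|. The helpers below are a sketch of how these metrics could be computed per image (array names are illustrative):

```python
import numpy as np

def iou(pred, true):
    """Jaccard index |A ∩ B| / |A ∪ B| for two boolean masks of equal shape."""
    union = np.logical_or(pred, true).sum()
    return np.logical_and(pred, true).sum() / union if union else 1.0

def binary_scores(pred, true):
    """Pixel-wise accuracy, precision, and F1 score for boolean masks."""
    tp = np.logical_and(pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    tn = np.logical_and(~pred, ~true).sum()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, f1
```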

4. Definition of Success

Success is defined as a workflow that consists of pre-processing to normalize variation in imaging conditions, separating objects from the background in each image, and distinguishing individual cell nuclei. The differences between low, expected, and high success are based on model performance as follows.

Low: Accomplish Goals 1-2. For Goal 3, predict nuclei regardless of accuracy.

Expected: Accomplish Goals 1-3. For Goal 4, achieve

  • at least 60% accuracy, precision, F1 score
  • at least 50% IoU

High: Accomplish Goals 1-3. For Goal 4, achieve

  • at least 80% accuracy, precision, F1 score
  • at least 65% IoU

5. Deliverables

The key deliverable will be a Jupyter notebook containing:

  • Code to normalize the data from different imaging conditions
  • Code to detect objects in the image
  • Code to identify individual nuclei
  • Documentation of the inputs and outputs of all functions
  • Quantitative assessment of model accuracy
  • Written critical analysis of successes/failures of the model

6. Schematic

Fig 0. Schematic of project workflow