Infinite max range on normed FCS files #4

Open
emrizzi opened this issue Jan 17, 2020 · 5 comments

Comments


emrizzi commented Jan 17, 2020

Hello - Firstly thank you so much for this code, it will be a game changer for analyzing CyTOF data between batches.

It has worked perfectly for me to normalize the FCS files and analyze the resulting files using other R packages, however some of my collaborators do not have coding experience and prefer the user-friendly versions of viSNE and FlowSOM through Cytobank. I've had trouble getting the normalized FCS files to be compatible with Cytobank. Originally I thought it may have been an issue with my FCS files, so I then normalized the flow repository files provided and I think it is an issue with the normed output files.

The algorithm changes the max range for the expression of each channel in a way that causes infinite outputs for some channels, as shown below:
[Two screenshots were attached here.]

Unfortunately the code in Cytobank requires the max range to be a finite value in order to do any higher order analyses (viSNE, FlowSOM, CITRUS, etc.). I've tried to play around with the code a bit to manually set the max range but haven't been successful. Do you have a suggestion as to how to address this issue?

Thanks!
Elise

@emmanuelaaaaa

Hello,

I have been coming across similar issues, so I was wondering if you managed to resolve this, @emrizzi. For me, it's not just the range that has infinite values but the "expression" values as well (assay(sce, "exprs") from the CATALYST object). I guess that was similar for you? In which case, how did you use them in the downstream analysis? FlowSOM clustering doesn't accept Inf values, and I'm pretty sure TSNE/UMAP doesn't either (if you are unlucky and the subsampling done for the TSNE/UMAP includes the cells which have Inf in some markers).
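To see how widespread the Inf values are before clustering, a check along these lines might help. This is a minimal base-R sketch on a made-up matrix with cells in rows; the marker names and values are purely illustrative. For the matrix returned by assay(sce, "exprs"), where cells are stored in columns, rowSums and colSums would swap roles.

```r
# Toy expression matrix: rows are cells, columns are markers (hypothetical names).
expr <- matrix(
  c( 1.2, 0.5, Inf,
     0.3, 2.1, 0.7,
    -Inf, 1.0, 0.9,
     0.8, 0.4, 1.1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("CD3", "CD4", "CD8"))
)

# Number of non-finite values (Inf, -Inf, NA, NaN) per marker.
n_bad_per_marker <- colSums(!is.finite(expr))

# TRUE for cells in which every marker is finite.
finite_cells <- rowSums(!is.finite(expr)) == 0

# Keep only the fully finite cells before passing data to clustering or UMAP.
expr_clean <- expr[finite_cells, , drop = FALSE]
```

Dropping cells is only one option; if many cells are affected, it is probably worth finding out why before discarding them.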

Many thanks and best wishes,
Emma

@SofieVG
Member

SofieVG commented Jul 16, 2020 via email

@emmanuelaaaaa

Hi Sofie,
That is very helpful! Thanks!
As a side question on the range of the expression values, I also get some negative values after normalisation, which could also cause problems in the downstream analysis. Do you think I can replace those with 0? Is there any reason you can think of why I shouldn't?
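For reference, the mechanical part of replacing negatives with 0 could look like the sketch below. It uses a made-up matrix and does not answer whether flooring is appropriate for the analysis; counting the affected values first at least shows how much of the data would change.

```r
# Toy normalised expression matrix (values and marker names are illustrative).
expr <- cbind(CD3 = c(1.2, -0.3, 0.8),
              CD4 = c(-0.05, 2.1, 0.4))

# How many values would be altered by flooring at zero?
n_negative <- sum(expr < 0)

# Replace negative values with 0, leaving everything else untouched.
floored <- expr
floored[floored < 0] <- 0
```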
Many thanks and best wishes,
Emma

@tomashhurst

@emmanuelaaaaa @emrizzi @SofieVG this is interesting. @ghar1821 and I have occasionally found a handful of cells that were given extremely high values (or extremely large negative values) after alignment. So perhaps the max range recorded there is because you have a couple of cells with extreme values. We had a quick look into it, but aren't sure why they change, partly because there are only very few cells where this happens. @SofieVG did you figure out any possible causes? @ghar1821 and I just filtered them out of our dataset before we proceeded with the analysis.
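Filtering out such cells could be sketched roughly as follows. The cut-off of 100 is an arbitrary placeholder, not a recommended value, and the matrix is made up; a sensible limit depends on the data and on any transformation applied.

```r
# Toy matrix with two cells carrying extreme values (illustrative numbers).
expr <- cbind(CD3 = c(1.2, 0.8, 250, 1.5),
              CD4 = c(0.4, -300, 0.9, 1.1))

# Hypothetical cut-off on the absolute value; choose one suited to your data.
limit <- 100

# Flag cells where any marker exceeds the cut-off in either direction.
extreme <- rowSums(abs(expr) > limit) > 0

# Keep the remaining cells.
kept <- expr[!extreme, , drop = FALSE]
```

Because Inf and -Inf also exceed any finite cut-off, the same filter would remove cells with infinite values.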

In terms of a workaround solution for the ranges, you could pull the files into R, modify the max values directly (as Sofie suggested, using 99.9th percentile or something similar), and then re-export as an FCS file. You could run it in a loop over all the samples to save you having to sit there and modify each sample. Here is a quick bit of code that could probably do it (I've just pulled some bits out of https://github.com/sydneycytometry/CSV-to-FCS, but I haven't tested this in R, so good chance it won't work perfectly as is):

Read the FCS file into R

library('flowCore')

# 'file' here is the name of an FCS file in your working directory

dat <- exprs(read.FCS(file, transformation = FALSE))
dat <- dat[1:nrow(dat),1:ncol(dat)]

# dat is now a matrix of parameters (cols) vs cells (rows)

Normally you could calculate and save the max and min of each column like this:

      metadata <- data.frame(name=dimnames(dat)[[2]],desc=paste('column',dimnames(dat)[[2]],'from dataset')) # or copy the column metadata from when the FCS file gets read in
  
      #metadata$range <- apply(apply(dat,2,range),2,diff)
      metadata$maxRange <- apply(dat,2,max) # uses 'apply' to calculate the max of each column of the table
      metadata$minRange <- apply(dat,2,min) # uses 'apply' to calculate the min of each column of the table

But in this case you could replace 'min' and 'max' with something that finds the 99.9th percentile (instead of max) and, let's say, the 0.1th percentile (instead of min). The quantile function should do this:

# pass quantile itself to apply; extra arguments such as probs are forwarded to it
metadata$maxRange <- apply(dat, 2, quantile, probs = 0.999)
metadata$minRange <- apply(dat, 2, quantile, probs = 0.001)
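One caveat with the percentile approach: with its default settings, quantile interpolates between neighbouring values, so in a small sample where the percentile falls next to an Inf the result can still be Inf. A minimal sketch that sidesteps this by dropping non-finite values first is shown below; the helper name finite_quantile and the channel names are made up for illustration.

```r
# Quantile computed on the finite values only (hypothetical helper).
finite_quantile <- function(x, prob) {
  quantile(x[is.finite(x)], probs = prob, names = FALSE)
}

# Toy data: 1000 cells, two channels, one infinite value in each.
dat <- cbind(Ce140Di = c(1:999, Inf),
             Ir191Di = c(-Inf, 2:1000))

# The plain maximum is infinite, but the percentile-based bounds are finite.
plain_max <- apply(dat, 2, max)
max_range <- apply(dat, 2, finite_quantile, prob = 0.999)
min_range <- apply(dat, 2, finite_quantile, prob = 0.001)
```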

I don't often calculate metadata$range but you might need it for Cytobank. It could be calculated as the 99.9th percentile minus the 0.1th percentile, using apply as above.
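The range column described here could be derived from the other two columns, as in this small sketch. The channel names and bounds are invented purely to show the arithmetic.

```r
# Hypothetical channels with already-computed finite bounds (made-up numbers).
channels  <- c("Ce140Di", "Ir191Di")
min_range <- c(-0.5, 0)
max_range <- c(812.4, 15230.7)

metadata <- data.frame(
  name     = channels,
  desc     = paste("column", channels, "from dataset"),
  minRange = min_range,
  maxRange = max_range,
  stringsAsFactors = FALSE
)

# Range as the upper bound minus the lower bound, one value per channel.
metadata$range <- metadata$maxRange - metadata$minRange
```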

You could also just use an expected max/min, i.e. 262000 for flow data (or roughly 2x10^4 for CyTOF data), and whatever a typical minimum after compensation is (-1000?).
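If fixed bounds are used, it may be worth checking how many values fall outside them and, optionally, clipping those values so the data agree with the declared range. The sketch below uses the flow-style figures mentioned above on a made-up matrix; whether clipping is acceptable is a judgment call for the analysis.

```r
# Expected bounds taken from the rough figures above; adjust for your instrument.
upper <- 262000
lower <- -1000

# Toy data containing out-of-range and infinite values (illustrative only).
dat <- cbind(FSC = c(100, 5000, Inf, 250000),
             PE  = c(-1500, 20, 300, -Inf))

# Count values outside the expected bounds before changing anything.
n_above <- sum(dat > upper)
n_below <- sum(dat < lower)

# Clip values into [lower, upper].
clipped <- dat
clipped[clipped > upper] <- upper
clipped[clipped < lower] <- lower
```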

Then you can construct a flowFrame and save the FCS file

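      # flowCore parameters are expected to carry name, desc, range, minRange and maxRange
      metadata$range <- metadata$maxRange - metadata$minRange
      dat.ff <- new("flowFrame", exprs=as.matrix(dat), parameters=Biobase::AnnotatedDataFrame(metadata))
      write.FCS(dat.ff, "Sample.fcs")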

It's a bit more fiddling with the files, but shouldn't be too difficult to set up in a reproducible script. If it helps, tomorrow I can test the above code and re-post a working version here. Important to mention: I've never taken FCS files from R into Cytobank, so I'm not sure if other issues might come up.
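The overall shape of such a script, looping over samples and computing finite bounds for each, might look like the sketch below. It is not a tested FCS workflow: reading and writing are passed in as functions so that the loop can be tried on in-memory toy data, and the names finite_bounds, process_samples, read_fun and write_fun are all hypothetical. With flowCore, read_fun would presumably wrap read.FCS and exprs, and write_fun would build the flowFrame and call write.FCS.

```r
# Finite per-channel bounds for one matrix (cells in rows, channels in columns).
finite_bounds <- function(mat, probs = c(0.001, 0.999)) {
  b <- t(apply(mat, 2, function(x) {
    quantile(x[is.finite(x)], probs = probs, names = FALSE)
  }))
  colnames(b) <- c("minRange", "maxRange")
  b
}

# Loop over samples; I/O is injected so the loop itself stays generic.
process_samples <- function(sample_ids, read_fun, write_fun) {
  for (id in sample_ids) {
    mat <- read_fun(id)
    write_fun(id, mat, finite_bounds(mat))
  }
  invisible(sample_ids)
}

# Toy stand-ins for files on disk (made-up data).
toy_data <- list(
  sampleA = cbind(Ch1 = c(1:99, Inf),       Ch2 = as.numeric(1:100)),
  sampleB = cbind(Ch1 = as.numeric(1:100),  Ch2 = c(-Inf, 2:100))
)
results <- new.env()

process_samples(
  names(toy_data),
  read_fun  = function(id) toy_data[[id]],
  write_fun = function(id, mat, bounds) assign(id, bounds, envir = results)
)
```

Keeping the bounds calculation separate from the file handling makes it easier to check the numbers on a small example before touching real FCS files.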

@thinkCara

I have just run into this very problem when using my CytoNormed files on Cytobank. Does anyone have a tested R script that fixes the infinite maxRange problem?
