Infinite max range on normed FCS files #4
Comments
Hello, I have been coming across similar issues, so I was wondering if you managed to resolve this, @emrizzi. For me, it's not just the range that has infinite values but the "expression" values as well (assay(sce, "exprs") from the CATALYST object). I guess that was similar for you? In which case, how did you use them in the downstream analysis? FlowSOM clustering doesn't accept Inf values and I'm pretty sure TSNE/UMAP doesn't either (if you are unlucky and the subsampling done for the TSNE/UMAP includes cells that have Inf in some markers). Many thanks and best wishes, Emma
Dear Emma and Elise,
This is not yet a solution, and I should certainly investigate this issue
further, but one temporary option could be to use the theoretical maximum
that you would expect based on the range from the original file, or e.g. to
compute the 99.9% quantile for each of your markers and replace all higher
values (including the infinite values) by this value, as a kind of
truncation step.
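
A minimal base R sketch of this truncation step, assuming the expression values sit in a plain numeric matrix with one column per marker (how you get there from your own object is up to you). The function name `clip_to_quantile` is made up for illustration. The quantile is computed on the finite values only, so the `Inf` entries cannot pull the cutoff itself up to `Inf`.

```r
# Cap each column of a numeric matrix at a chosen upper quantile.
# The cutoff is computed from the finite values of that column only;
# anything above it (including +Inf) is set to the cutoff.
clip_to_quantile <- function(mat, prob = 0.999) {
  for (j in seq_len(ncol(mat))) {
    x <- mat[, j]
    cutoff <- quantile(x[is.finite(x)], probs = prob, names = FALSE)
    x[!is.na(x) & x > cutoff] <- cutoff
    mat[, j] <- x
  }
  mat
}

# Toy data: 1000 cells, 2 markers, one infinite value in the first marker.
set.seed(1)
toy <- cbind(CD3 = rexp(1000), CD19 = rexp(1000))
toy[5, "CD3"] <- Inf

fixed <- clip_to_quantile(toy)
all(is.finite(fixed))
```

Note that this only handles the upper tail; extreme negative values would need the same treatment with a lower quantile.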
Hi Sofie,
@emmanuelaaaaa @emrizzi @SofieVG this is interesting. @ghar1821 and I have occasionally found a couple of cells that were given extremely high values (or extremely large negative values) after alignment. So perhaps the max range recorded there is because you have a couple of cells with extreme values. We had a quick look into it, but aren't sure why they change, partly because there are only very few cells where this happens. @SofieVG did you figure out any possible causes? @ghar1821 and I just filtered them out of our dataset before we proceeded with the analysis.

In terms of a workaround for the ranges, you could pull the files into R, modify the max values directly (as Sofie suggested, using the 99.9th percentile or something similar), and then re-export as an FCS file. You could run it in a loop over all the samples to save you having to sit there and modify each sample. Here is a quick bit of code that could probably do it (I've just pulled some bits out of https://github.com/sydneycytometry/CSV-to-FCS, but I haven't tested this in R, so there's a good chance it won't work perfectly as is):

Read the FCS file into R
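
One way the reading step might look, as a sketch and not the snippet from the comment: it assumes the Bioconductor package flowCore is installed, reads the file without transformation or range truncation, and counts how many non-finite values each channel holds. The helper names `count_nonfinite` and `check_fcs`, and the file name in the usage comment, are hypothetical.

```r
# Count non-finite entries (Inf, -Inf, NA, NaN) in each column of a matrix.
count_nonfinite <- function(mat) {
  colSums(!is.finite(mat))
}

# Read one FCS file as-is (no transformation, no truncation to the
# recorded range) and report non-finite values per channel.
# Needs flowCore; it is only loaded when the function is called.
check_fcs <- function(path) {
  ff <- flowCore::read.FCS(path, transformation = FALSE,
                           truncate_max_range = FALSE)
  count_nonfinite(flowCore::exprs(ff))
}

# Hypothetical usage:
# check_fcs("Norm_sample1.fcs")
```

Channels with a non-zero count are the ones whose maximum (or minimum) will come out as infinite.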
Normally you could calculate and save the max and min of each column like this:
But in this case you could replace 'min' and 'max' with something that finds the 99.9th percentile (instead of max) and, let's say, the 0.1th percentile (instead of min). The quantile function should do this:
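
A small base R sketch of that substitution, on made-up data: `range()` reports `Inf` for a channel with one infinite cell, while percentiles taken over the finite values stay finite. The helper name `percentile_bounds` and the toy channel names are assumptions for illustration.

```r
# Lower/upper bounds per channel from the 0.1th and 99.9th percentiles,
# computed on finite values only. Returns a 2-row matrix with one
# column per channel.
percentile_bounds <- function(mat, probs = c(0.001, 0.999)) {
  apply(mat, 2, function(x) {
    quantile(x[is.finite(x)], probs = probs, names = FALSE)
  })
}

# Toy data with a single infinite value in one channel.
set.seed(42)
dat <- cbind(FSC = rnorm(5000, 100, 10), CD4 = rnorm(5000, 50, 5))
dat[1, "CD4"] <- Inf

apply(dat, 2, range)            # plain min/max: the CD4 maximum is Inf

bounds <- percentile_bounds(dat)
rownames(bounds) <- c("min", "max")
bounds                          # percentile-based bounds are finite
```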
You could also just use an expected max/min, i.e. 262000 for flow data (or ~2x10^4 ish for CyTOF data) and whatever a typical minimum after compensation is (-1000?). Then you can construct a flowFrame and save the FCS file
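
A hedged sketch of how the whole workaround could be put together with flowCore. This is not the code from the comment and has not been checked against Cytobank. It assumes flowCore and Biobase are installed, clips each channel to its 0.1th/99.9th percentiles, and writes the bounds into the `minRange`/`maxRange` columns of the parameter table. Setting `range` equal to `maxRange` is a simplification, and the function name `fix_fcs_ranges` and the folder names in the usage comment are hypothetical.

```r
# Clip every channel of one FCS file to percentile bounds, update the
# recorded ranges, rebuild a flowFrame and write it to a new file.
fix_fcs_ranges <- function(in_file, out_file, probs = c(0.001, 0.999)) {
  ff <- flowCore::read.FCS(in_file, transformation = FALSE,
                           truncate_max_range = FALSE)
  m <- flowCore::exprs(ff)

  # Row 1 = lower bound, row 2 = upper bound, one column per channel.
  bounds <- apply(m, 2, function(x) {
    quantile(x[is.finite(x)], probs = probs, names = FALSE)
  })

  # Pull values outside the bounds (including Inf/-Inf) back to the bounds.
  for (j in seq_len(ncol(m))) {
    m[, j] <- pmin(pmax(m[, j], bounds[1, j]), bounds[2, j])
  }

  # Update the parameter table so the stored ranges are finite.
  par <- flowCore::parameters(ff)
  pd <- Biobase::pData(par)
  pd$minRange <- unname(bounds[1, ])
  pd$maxRange <- unname(bounds[2, ])
  pd$range <- pd$maxRange  # simplification, see the note above
  Biobase::pData(par) <- pd

  ff_fixed <- flowCore::flowFrame(exprs = m, parameters = par,
                                  description = flowCore::keyword(ff))
  flowCore::write.FCS(ff_fixed, out_file)
}

# Hypothetical usage over a folder of normalised files:
# files <- list.files("Normalized", pattern = "\\.fcs$", full.names = TRUE)
# dir.create("Normalized_fixed", showWarnings = FALSE)
# for (f in files) fix_fcs_ranges(f, file.path("Normalized_fixed", basename(f)))
```

Whether the `$PnR` keywords in the written file reflect the new ranges may depend on the flowCore version, so it is worth reading one output file back in and inspecting its keywords and parameter table before uploading.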
It's a bit more fiddling with the files, but shouldn't be too difficult to set up in a reproducible script. If it helps, tomorrow I can test the above code and post a working version here. Important to mention: I've never taken FCS files from R into Cytobank, so I'm not sure if other issues might come up.
I have just run into this very problem when using my CytoNormed files on Cytobank. Does anyone have a tested R script that fixes the infinite maxRange problem?
Hello - Firstly thank you so much for this code, it will be a game changer for analyzing CyTOF data between batches.
It has worked perfectly for me to normalize the FCS files and analyze the resulting files using other R packages, however some of my collaborators do not have coding experience and prefer the user-friendly versions of viSNE and FlowSOM through Cytobank. I've had trouble getting the normalized FCS files to be compatible with Cytobank. Originally I thought it may have been an issue with my FCS files, so I then normalized the flow repository files provided and I think it is an issue with the normed output files.
The algorithm changes the max range for the expression of each channel in a way that causes infinite outputs for some channels, as shown below:
Unfortunately the code in Cytobank requires the max range to be a finite value in order to do any higher order analyses (viSNE, FlowSOM, CITRUS, etc.). I've tried to play around with the code a bit to manually set the max range but haven't been successful. Do you have a suggestion as to how to address this issue?
Thanks!
Elise