Difference between openCyto.count and xml.count #356
Comments
Can you paste the results from |
Sure,
|
It is likely that the gate is extended to the far left because the parser sees that the gate's left bound is negative (i.e. -1194). |
See RGLab/CytoML#106 for more on the reason for gate extension. |
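To illustrate why an extended gate can raise the count, here is a minimal Python sketch. It is not the CytoML implementation: the function names, the `extend_val` threshold, the `extend_to` target and the event values are all assumptions chosen for illustration. Only the -1194 bound comes from the comment above.

```python
def extend_bound(lower, extend_val=0.0, extend_to=-4000.0):
    """Return the lower bound after a hypothetical extension rule:
    any bound below `extend_val` is pushed out to `extend_to`."""
    return extend_to if lower < extend_val else lower


def count_in_range(values, lower, upper):
    """Count events that fall inside the closed interval [lower, upper]."""
    return sum(lower <= v <= upper for v in values)


# Made-up one-dimensional events; two of them lie left of the drawn bound.
events = [-2500.0, -1500.0, -1000.0, 200.0, 900.0]
drawn_lower, upper = -1194.0, 1000.0

as_drawn = count_in_range(events, drawn_lower, upper)
extended = count_in_range(events, extend_bound(drawn_lower), upper)
print(as_drawn, extended)
```

With these made-up events, the extended gate also captures the two events to the left of -1194, so a parsed count can exceed the count for the gate as drawn.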
Hey @mikejiang I am getting the opposite problem. That is, I am getting far fewer cells in openCyto.count than in xml.count (second row,
The gate coordinates match well though:
When I plot the "Time Gate" with ggcyto vs FlowJo, the values in ggcyto look much more spread out than in FlowJo: Interestingly, when I look at the range of values for Time, they look similar to FlowJo though:
Do you know what could possibly be wrong? Any help would be much appreciated. |
Session info for comment above.
|
Can you provide an example workspace and FCS files for troubleshooting? |
Thank you so much for the quick reply, @mikejiang! I appreciate your help! The files are here: https://drive.google.com/drive/folders/1GR5QtFTmjRS3X4VL1LK5bXMhtMi1t8Q6 |
@PedroMilanezAlmeida, sorry for the delayed response. It was a time channel scaling problem. The original FCS data stores the time channel in raw units, i.e. range(fr, "data")[["Time"]] returns
[1] 0 4171631
whereas FlowJo defines the time gate in real-time seconds.
So the data needs to be scaled properly, which is usually done by multiplying by the scale factor defined in the FCS keyword
Apparently this value is not accurate in this case. There is another source from which we can compute the timestep:
These happen to be correct in this case, but have not always been reliable in other use cases in the past.
which isn't always present, but whenever it is available in the xml, we will use that instead. Now, with the latest patches in cytolib and CytoML, the gate should be fixed. |
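The effect of the time scaling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the cytolib or CytoML code: the raw values, both time steps, the 400-second duration and the gate bounds are hypothetical, and the function names are made up for the example.

```python
def scale_time(raw_values, timestep):
    """Convert raw time-channel values to seconds by multiplying by the time step."""
    return [v * timestep for v in raw_values]


def timestep_from_duration(raw_max, duration_seconds):
    """Estimate the time step from a known acquisition duration (hypothetical helper)."""
    return duration_seconds / raw_max


def count_in_gate(values, lower, upper):
    """Count values inside the closed interval [lower, upper]."""
    return sum(lower <= v <= upper for v in values)


# Hypothetical raw time values and a time gate drawn in seconds.
raw_time = [0, 1_000_000, 2_000_000, 3_000_000, 4_000_000]
time_gate = (50.0, 350.0)

stated_timestep = 0.01  # assumed inaccurate value stored with the file
derived_timestep = timestep_from_duration(4_000_000, 400.0)  # assumed 400 s run

n_stated = count_in_gate(scale_time(raw_time, stated_timestep), *time_gate)
n_derived = count_in_gate(scale_time(raw_time, derived_timestep), *time_gate)
print(n_stated, n_derived)
```

In this sketch the same gate captures no events under the inaccurate time step and three under the derived one, which is the kind of count gap a wrong scale factor can produce.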
Thank you very much, @mikejiang! I will try the latest patches and get back to you if I still have any issues. |
Hello @mikejiang, as you have mentioned, it is possible that the counts calculated in FlowJo will differ from those calculated using your package because of numerical errors. Could you clarify whether these numerical errors come from your package, from FlowJo, or from both, and which counts are more accurate? Is there any known bias, such as the counts generally being under- or over-estimated? I have tried to find information online but have been unsuccessful so far.
Some context: I am a master's student in statistics working on a research paper with an immunological institute. To speed up the gating and data wrangling process, they have asked me to try to generate count data in R using gh_pop_get_indices() from your package to count the number of cells expressing a certain combination of cytokines, for every possible combination of cytokines.
My supervisors are trying to assess, from an immunological perspective, whether the differences between the counts I produced in R and the FlowJo counts are significant, and whether they can use the R counts instead. To guide our decision, it would be very helpful to know whether one is more accurate than the other, and why. |
Both should be reliable and fit for purpose. When there are differences, the question is whether they lead to significant differences in downstream inference.
OpenCyto has been used for years to support vaccine clinical trials at Fred Hutch and has been validated there. The best thing you can do is test it out and compare, then track down the source of any discrepancies to understand them and assess whether they lead to differences in downstream inference. |
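One way to organize such a comparison is to compute the per-population difference and flag anything beyond a chosen tolerance. The Python sketch below assumes the two sets of counts have already been exported as plain dictionaries. The function name, the 5% tolerance and the population names are hypothetical; the last pair of counts echoes the 1116 vs 2253 figures reported in the original post of this thread.

```python
def compare_counts(reference, candidate, rel_tol=0.05):
    """Return populations whose counts differ by more than `rel_tol`
    relative to the reference, sorted by largest relative difference."""
    flagged = []
    for population, ref_n in reference.items():
        cand_n = candidate.get(population, 0)
        diff = cand_n - ref_n
        if ref_n:
            rel = abs(diff) / ref_n
        else:
            # Avoid division by zero when the reference count is 0.
            rel = 0.0 if cand_n == 0 else float("inf")
        if rel > rel_tol:
            flagged.append((population, ref_n, cand_n, diff, rel))
    return sorted(flagged, key=lambda row: row[4], reverse=True)


# Hypothetical population names; counts are illustrative.
xml_counts = {"root": 10000, "Time Gate": 9500, "Subset A": 1116}
r_counts = {"root": 10000, "Time Gate": 9480, "Subset A": 2253}

flagged = compare_counts(xml_counts, r_counts)
for population, ref_n, cand_n, diff, rel in flagged:
    print(f"{population}: xml={ref_n} R={cand_n} diff={diff} rel={rel:.2f}")
```

Small differences such as the 20-event gap in the hypothetical "Time Gate" row pass the tolerance, so attention goes to the populations that are most likely to affect downstream inference.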
yes, it was resolved by RGLab/flowCore#86 |
Thank you so much @gfinak for the detailed response and @mikejiang for the confirmation! The information was very helpful, and I suspect that our discrepancies are due to your second point. |
Is there a common explanation for the situation where the openCyto.counts are all 0 (except for root) while xml.count has values? Thanks in advance! |
It is likely that the gates are not parsed/transformed properly, which can have different causes. This has to be diagnosed case by case, so please provide reproducible wsp and fcs files. |
Hello,
I encountered an issue: in some cases, the count of events gated in FlowJo and the count of events calculated by your packages (checked with gh_pop_compare_stats()) differ significantly. I am aware that you warned that there can be slight differences due to numeric flaws, and I also read the closed issue #256, but I came across a case where the calculated count was more than twice as large.
Using the following code (following the Bioconductor manual):
the openCyto.count is 2253, while xml.count is 1116.
Unfortunately, I did not get permission to give you the data files, but I will try my best with the chunks, and hopefully I will soon get some other data that results in the same behavior and that I can share.
I tried to reproduce part of the process: I parsed the .wsp file to get the transformations and gating parameters and tried to apply them to compensated data (for compensation I used flowCore compensate()).
I guess there is probably no documentation about the meaning of the XML attributes in the .wsp, but with the help of the Gating-ML documentation, at least the rectangular and polygon gates seem easy to manage.
The confusing part is the transformations. I tried to work with this population, described in the .wsp like this (in simplified form):
The transforms nodes containing those fcs-dimensions look like this:
There are only linear transforms. The strange part is that when I applied the gating without transforming the data at all, I got the same number of events inside the gate as in the count attribute of the population tag. Yet with flowWorkspace the result was twice as large.
I got similarly good results with other dimensions that are only linearly transformed.
I tried to follow your reasoning in flowWorkspace, cytolib, etc., but was not really successful. I understand that the whole concept is much more complex; I only want to ask whether you have any idea what could cause such a difference between the counts, and whether my strange results could help in any way.
I checked that the versions of your packages are up to date according to Bioconductor:
I understand that my issue may be closed if I did not follow the instructions for creating a new issue.
Thank you very much