Bug in bulk_expression_QC.ipynb RLE candidate outlier output #533

Open
grennfp opened this issue Feb 17, 2023 · 1 comment

Comments

@grennfp
Contributor

grennfp commented Feb 17, 2023

I've been testing the bulk_expression_QC.ipynb notebook to run QC on a 559-sample RNA-seq dataset. All three parts (hierarchical clustering, D-statistic correlations, RLE) produce candidate outliers, but none of the listed outliers overlap, leading to a final outlier count of zero.

I noticed the samples on the right of the RLE plot (with high IQRs) are not the same as the samples printed out to the log file. The samples printed to the log file for the RLE step are the last 5% of the samples in the input TPM matrix, which aren't the actual RLE outlier samples.

I believe the issue lies in this line of code:

RLEFilterList <- unique(bymedian[((length(bymedian)-ExpPerSample*RLEFilterLength)+1):length(bymedian)]) #filtered

Replacing bymedian with levels(bymedian) seemed to fix the issue. Using this code gave me the correct RLE outlier samples:

RLEFilterList <- unique(levels(bymedian)[((length(levels(bymedian))-(RLEFilterLength))+1):(length(levels(bymedian))+1)])

The correct RLE outliers produced from this change overlapped with candidate outliers from the hierarchical clustering and D-statistic steps, unlike before the change when there were no overlaps.
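To illustrate why the two indexing styles can disagree, here is a minimal sketch on toy data. It assumes bymedian is a factor built with reorder() using the per-sample IQR, which I have not confirmed against the notebook; the data frame rle_df and the names n_outliers and per_sample are hypothetical stand-ins for the notebook's variables (RLEFilterLength and ExpPerSample).

```r
# Hypothetical toy data in long format: 4 samples, 3 RLE values each.
rle_df <- data.frame(
  sample = rep(c("S1", "S2", "S3", "S4"), each = 3),
  rle    = c(-5,   0, 5,     # S1: widest spread, FIRST in the input
             -1,   0, 1,     # S2
             -2,   0, 2,     # S3
             -0.5, 0, 0.5)   # S4: narrowest spread, LAST in the input
)

# Assumption: bymedian comes from reorder(). reorder() sorts the factor
# LEVELS by the summary statistic (here IQR) but keeps the ELEMENTS in
# their original input order.
bymedian <- with(rle_df, reorder(factor(sample), rle, FUN = IQR))

n_outliers <- 1  # hypothetical stand-in for RLEFilterLength
per_sample <- 3  # hypothetical stand-in for ExpPerSample

# Indexing the factor itself picks the last rows of the input,
# regardless of their IQR.
n <- length(bymedian)
by_position <- unique(as.character(bymedian[(n - per_sample * n_outliers + 1):n]))

# Indexing the levels picks the samples with the highest IQR.
by_level <- tail(levels(bymedian), n_outliers)

print(by_position)  # "S4" (last sample in the input, lowest IQR)
print(by_level)     # "S1" (highest IQR)
```

In this sketch tail() is used instead of an explicit index range. Indexing one position past the end of a character vector in R returns NA, so a range ending at length(levels(bymedian))+1 would append an NA entry to the list.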

@gaow
Contributor

gaow commented Feb 17, 2023

Hmm @grennfp, I think this is worth a Zoom discussion ... maybe between you and @hsun3163 is good enough for starters, then Hao can fill me in. Could you guys arrange something offline for next week? You can also show this to us during the Monday WG meeting. Thanks for looking carefully at the diagnostic plot and catching the possible bug!
