Bug in bulk_expression_QC.ipynb RLE candidate outlier output #533
Comments
hmm @grennfp I think it is worth a Zoom discussion ... maybe between you and @hsun3163 is good enough for starters, then Hao can fill me in. Could you guys arrange something offline for next week? You can also show this to us during the Monday WG meeting. Thanks for looking carefully at the diagnostic plot and catching the possible bug!
I've been testing the bulk_expression_QC.ipynb notebook to run QC on a 559-sample RNA-seq dataset. All three steps (hierarchical clustering, D-statistic correlations, RLE) produce candidate outliers, but none of the listed outliers overlap across steps, so the final outlier count is zero.
I noticed the samples on the right of the RLE plot (with high IQRs) are not the same as the samples printed out to the log file. The samples printed to the log file for the RLE step are the last 5% of the samples in the input TPM matrix, which aren't the actual RLE outlier samples.
I believe the issue lies in this line of code:
Replacing bymedian with levels(bymedian) seemed to fix the issue. Using this code gave me the correct RLE outlier samples:
The correct RLE outliers produced from this change overlapped with candidate outliers from the hierarchical clustering and D-statistic steps, unlike before the change when there were no overlaps.
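For anyone trying to reproduce the behaviour, here is a minimal sketch of why indexing bymedian and indexing levels(bymedian) can give different samples. It is not the notebook's code: the toy data frame, the n_outliers variable, and the assumption that bymedian is built with reorder() on per-sample IQR are all hypothetical and used only for illustration.

```r
# Toy long-format data: 4 samples, 3 values each (hypothetical numbers).
rle <- data.frame(
  Sample = rep(c("S1", "S2", "S3", "S4"), each = 3),
  TPM    = c(-5, 0, 5,       # S1: widest spread, largest IQR
             -0.1, 0, 0.1,   # S2: smallest IQR
             -1, 0, 1,       # S3
             -0.5, 0, 0.5)   # S4
)

# ASSUMPTION: bymedian is a factor created with reorder().
# reorder() sorts the factor *levels* by per-sample IQR (ascending),
# but the elements themselves stay in the original row order.
bymedian <- with(rle, reorder(Sample, TPM, IQR))

n_outliers <- 1  # hypothetical: number of top-IQR samples to flag

# Taking the tail of the factor's elements follows input order,
# so this picks the last sample in the input, not the highest-IQR one.
wrong <- as.character(tail(unique(bymedian), n_outliers))

# Taking the tail of the levels follows the IQR ordering.
right <- tail(levels(bymedian), n_outliers)

print(wrong)
print(right)
```

In this toy case the element-based selection returns the last sample in the input (S4), while the level-based selection returns the sample with the largest IQR (S1), which matches the pattern described above where the log listed the last 5% of samples in the TPM matrix.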