Bug in bulk_expression_QC.ipynb RLE candidate outlier output #533

grennfp · 2023-02-17T17:08:24Z

I've been testing the bulk_expression_QC.ipynb notebook to run QC on a 559-sample RNA-seq dataset. All three steps (hierarchical clustering, D-statistic correlations, RLE) produce candidate outliers, but none of the listed outliers overlap, so the final outlier count is zero.

I noticed the samples on the right of the RLE plot (with high IQRs) are not the same as the samples printed out to the log file. The samples printed to the log file for the RLE step are the last 5% of the samples in the input TPM matrix, which aren't the actual RLE outlier samples.

I believe the issue lies in this line of code:

RLEFilterList <- unique(bymedian[((length(bymedian)-ExpPerSample*RLEFilterLength)+1):length(bymedian)]) #filtered

Replacing bymedian with levels(bymedian) seemed to fix the issue. This code gave me the correct RLE outlier samples:

RLEFilterList <- unique(levels(bymedian)[((length(levels(bymedian))-(RLEFilterLength))+1):(length(levels(bymedian))+1)])

The correct RLE outliers produced from this change overlapped with candidate outliers from the hierarchical clustering and D-statistic steps, unlike before the change when there were no overlaps.
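
The difference between the two lines can be reproduced on toy data. The sketch below is a minimal illustration, not the notebook's code: it assumes bymedian is built with reorder(Sample, value, IQR) on a long-format table, and the data, n_genes, spread, and n_filter (standing in for RLEFilterLength) are all hypothetical. It shows that reorder() sorts only the factor levels, so slicing the factor itself returns whichever samples come last in the input, while slicing levels() returns the samples with the largest IQR. It also shows that an upper bound of length(levels(bymedian))+1 indexes one past the end, which in R yields an NA entry.

```r
# Toy data (hypothetical): 6 samples x 50 genes of RLE-like values.
n_genes <- 50
samples <- paste0("S", 1:6)
spread  <- c(1, 5, 1.2, 0.8, 1.1, 0.9)  # S2 and S3 have the widest spread

long <- data.frame(
  Sample = factor(rep(samples, each = n_genes), levels = samples),
  # Column j of the outer product holds the values for sample j.
  value  = as.vector(outer(seq(-1, 1, length.out = n_genes), spread))
)

# Assumption: the notebook builds bymedian roughly like this.
# reorder() sorts the factor LEVELS by per-sample IQR;
# the elements themselves stay in input order.
bymedian <- with(long, reorder(Sample, value, IQR))

n_filter <- 2  # number of samples to flag (stands in for RLEFilterLength)

# Slicing the factor: returns the last samples in the input, not the outliers.
from_elements <- unique(as.character(tail(bymedian, n_genes * n_filter)))

# Slicing the levels: returns the samples with the largest IQR.
from_levels <- tail(levels(bymedian), n_filter)

# Upper bound of nlevels + 1 runs one past the end and appends NA.
n_lev <- nlevels(bymedian)
past_end <- levels(bymedian)[(n_lev - n_filter + 1):(n_lev + 1)]

print(from_elements)  # "S5" "S6"
print(from_levels)    # "S3" "S2"
print(past_end)       # "S3" "S2" NA
```

Using tail(levels(bymedian), n) avoids hand-written index arithmetic and so sidesteps the off-by-one at the upper bound.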

gaow · 2023-02-17T20:41:25Z

Hmm @grennfp, I think this is worth a Zoom discussion ... maybe between you and @hsun3163 is good enough for starters, then Hao can fill me in. Could you guys arrange something offline for next week? You can also show this to us during the Monday WG meeting. Thanks for looking carefully at the diagnostic plot and catching the possible bug!
