fix plot_cnv for contigs factors with NA labels #248

Open, wants to merge 1 commit into master

jan-glx commented Jul 20, 2020

If plot_cnv is called with an infercnv object whose gene_order slot has a C_CHR column that is a factor with NA as one of its levels, it fails like this:

STEP 15: Clustering samples (not defining tumor subclusters)

INFO [2020-07-17 23:02:32] define_signif_tumor_subclusters(p_val=0.1
INFO [2020-07-17 23:02:32] define_signif_tumor_subclusters(), tumor: 3
INFO [2020-07-17 23:02:36] cut tree into: 1 groups
INFO [2020-07-17 23:02:36] -processing 3,3_s1
INFO [2020-07-17 23:02:36] define_signif_tumor_subclusters(), tumor: 1
INFO [2020-07-17 23:04:08] cut tree into: 1 groups
INFO [2020-07-17 23:04:08] -processing 1,1_s1
INFO [2020-07-17 23:04:08] define_signif_tumor_subclusters(), tumor: 2
INFO [2020-07-17 23:04:33] cut tree into: 1 groups
INFO [2020-07-17 23:04:33] -processing 2,2_s1
INFO [2020-07-17 23:04:33] define_signif_tumor_subclusters(), tumor: 0
INFO [2020-07-18 04:04:39] cut tree into: 1 groups
INFO [2020-07-18 04:04:39] -processing 0,0_s1
INFO [2020-07-18 04:04:39] -mirroring for hspike
INFO [2020-07-18 04:04:39] define_signif_tumor_subclusters(p_val=0.1
INFO [2020-07-18 04:04:39] define_signif_tumor_subclusters(), tumor: spike_tumor_cell_0
INFO [2020-07-18 04:04:43] cut tree into: 1 groups
INFO [2020-07-18 04:04:43] -processing spike_tumor_cell_0,spike_tumor_cell_0_s1
INFO [2020-07-18 04:04:43] define_signif_tumor_subclusters(), tumor: simnorm_cell_0
INFO [2020-07-18 04:04:44] cut tree into: 1 groups
INFO [2020-07-18 04:04:44] -processing simnorm_cell_0,simnorm_cell_0_s1
INFO [2020-07-18 04:08:36] ::plot_cnv:Start
INFO [2020-07-18 04:08:36] ::plot_cnv:Current data dimensions (r,c)=10121,10773 Total=109784435.602225 Min=0.108562602149896 Max=9.87176157420995.
INFO [2020-07-18 04:08:38] ::plot_cnv:Depending on the size of the matrix this may take a moment.
INFO [2020-07-18 04:11:54] plot_cnv(): auto thresholding at: (0.506649 , 1.507125)

Error in rep(contig_name, contig_tbl[contig_name]) : invalid 'times' argument
5. FUN(X[[i]], ...)
4. lapply(contig_labels, function(contig_name) { rep(contig_name, contig_tbl[contig_name]) })
3. unlist(lapply(contig_labels, function(contig_name) { rep(contig_name, contig_tbl[contig_name]) }))
2. plot_cnv(infercnv_obj, k_obs_groups = k_obs_groups, cluster_by_groups = cluster_by_groups, cluster_references = cluster_references, out_dir = out_dir, title = "Preliminary infercnv (pre-noise filtering)", output_filename = "infercnv.preliminary", output_format = output_format, write_expr_matrix = TRUE, ...
1. infercnv::run(infercnv_obj, cutoff = 0.1, out_dir = tempfile(), cluster_by_groups = TRUE, denoise = TRUE, HMM = TRUE)
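A quick way to tell in advance whether an object will hit this error is to check whether the contig factor has NA among its levels. This is a minimal sketch: has_na_level is a hypothetical helper, not an infercnv function, and it assumes you pass it the chromosome factor you extracted from the gene_order slot yourself.

```r
# Hypothetical helper (not part of infercnv): TRUE if a contig factor
# carries NA as one of its levels, the situation that breaks plot_cnv.
has_na_level <- function(contigs) {
  is.factor(contigs) && anyNA(levels(contigs))
}

x <- c("chr1", "chr2", NA)
has_na_level(factor(x))                  # FALSE: NA values are dropped from the levels by default
has_na_level(factor(x, exclude = NULL))  # TRUE: NA is kept as a level
has_na_level(addNA(factor(x)))           # TRUE: addNA() appends an NA level
```

Plain character vectors return FALSE, since only factors have levels.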

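While this is likely a rare edge case, it can probably be fixed easily (if this is the only issue) by simplifying the code: the lapply/unlist loop that looks up each contig's count by name is replaced with a single rep(names(contig_tbl), contig_tbl) call.
To see how this works, look at the following example code: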

Normally, both versions give the same result (by default, factor() drops NA values from the levels):

contig_tbl <- table(factor(c("1", "2", "3", "2", "3", "3", NA, NA) ))
contig_labels <- names(contig_tbl)
rep(names(contig_tbl), contig_tbl) # new
# [1] "1" "2" "2" "3" "3" "3"
unlist(lapply(contig_labels, function(contig_name) {rep(contig_name, contig_tbl[contig_name])})) # old
# [1] "1" "2" "2" "3" "3" "3"
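The equivalence of the two versions on inputs without an NA level can also be checked on more than one input. This is a minimal sketch: expand_old and expand_new are hypothetical wrappers around the old and new expressions above, compared on a few made-up contig vectors.

```r
# Old approach: look up each label's count by name, then concatenate.
expand_old <- function(contig_tbl) {
  unlist(lapply(names(contig_tbl), function(contig_name) {
    rep(contig_name, contig_tbl[contig_name])
  }))
}

# New approach: pair labels and counts by position in one vectorized call.
expand_new <- function(contig_tbl) {
  rep(names(contig_tbl), contig_tbl)
}

# Made-up inputs; NA values are excluded from the factor levels by default.
inputs <- list(
  c("1", "2", "3", "2", "3", "3", NA, NA),
  c("chrX", "chr1", "chr1"),
  c("a")
)
for (x in inputs) {
  tbl <- table(factor(x))
  stopifnot(identical(expand_old(tbl), expand_new(tbl)))
}
```

Both functions emit labels in level order, so the comparison also confirms that the new call preserves the ordering.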

But if a user treats NA as a valid contig (chromosome) name, i.e. NA is kept as a factor level, the old code breaks:

contig_tbl <- table(factor(c("1", "2", "3", "2", "3", "3", NA, NA), exclude="" )) # allow NA labels
contig_labels <- names(contig_tbl)
rep(names(contig_tbl), contig_tbl) # new
# [1] "1" "2" "2" "3" "3" "3" NA  NA 
unlist(lapply(contig_labels, function(contig_name) {rep(contig_name, contig_tbl[contig_name])})) # old
#Error in rep(contig_name, contig_tbl[contig_name]) : 
#  invalid 'times' argument
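The failure comes from the name-based lookup. R never matches an NA character subscript against names, not even against an NA name, so contig_tbl[contig_name] returns NA for the NA level and rep() then rejects it as a count. The sketch below isolates that step on a small made-up table. The new code pairs labels and counts by position and never performs the lookup.

```r
# Made-up contigs; exclude = "" keeps NA as a factor level.
contig_tbl <- table(factor(c("1", "2", NA, NA), exclude = ""))
contig_labels <- names(contig_tbl)  # "1" "2" NA

# Looking up the NA label by name does not find the NA entry; it yields NA ...
looked_up <- contig_tbl[contig_labels[is.na(contig_labels)]]
# ... and rep() refuses an NA count, reproducing the error from the traceback.
msg <- tryCatch(rep(NA_character_, looked_up), error = conditionMessage)

# Positional pairing keeps the NA-labelled contig with its count of 2.
expanded <- rep(contig_labels, contig_tbl)
```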

I did not test this patch locally and hope there is CI running that will check it.

kind regards,
Jan
