Parallelize get_predicted_CNV_regions #350

Open: wants to merge 2 commits into base: master


blakecaldwell

When running with HMM_report_by='cell', the time spent in get_predicted_CNV_regions dominates total runtime because each cell is processed sequentially. This pull request parallelizes the bulk of that loop. When the number of cells is over 10k, this can reduce the runtime from days to hours.

The parallel package was used because it is already imported in inferCNV_BayesNet.R. mclapply distributes the loop iterations across the number of cores specified by num_threads when infercnv::run() is called.
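As a rough illustration of the pattern described above (not the code in this pull request), a per-cell loop can be handed to mclapply as sketched below. The worker `process_cell`, the `cell_ids` vector, and the local `num_threads` value are hypothetical stand-ins; only `parallel::mclapply` and its `mc.cores` argument are real.

```r
library(parallel)

# Hypothetical stand-in for the per-cell work; not the actual inferCNV code.
process_cell <- function(cell_id) {
  data.frame(cell = cell_id, n_chars = nchar(cell_id), stringsAsFactors = FALSE)
}

cell_ids <- c("cell_A", "cell_B", "cell_C", "cell_D")  # assumed example input
num_threads <- 2L                                      # assumed thread count

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else num_threads

# Each cell is processed independently; results come back in input order.
per_cell <- mclapply(cell_ids, process_cell, mc.cores = cores)
names(per_cell) <- cell_ids
```

Because mclapply returns its results in the same order as its input, downstream code can treat `per_cell` like the output of an ordinary lapply.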

Note that the refactoring was complicated by the counter variable that ensures unique names for CNV regions, since a shared counter cannot be incremented safely from parallel workers. The workaround is to assign the region names in a loop at the end, after parallel execution. This has the same effect as incrementing a counter, but means that the call to .get_cnv_gene_region_bounds must be placed outside the parallel loop. The function is simple enough that running it on a single core won't significantly delay overall runtime.
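The naming workaround can be sketched roughly as follows, under the assumption that each worker returns its regions unnamed and a sequential pass numbers them afterwards. The worker `find_regions`, the `region_%d` name format, and the example data are all hypothetical and only illustrate the idea; they are not taken from inferCNV.

```r
library(parallel)

# Hypothetical per-cell worker: returns regions WITHOUT names,
# so it does not need access to a shared counter.
find_regions <- function(cell_id) {
  n <- if (cell_id == "cell_B") 1L else 2L
  data.frame(cell = rep(cell_id, n),
             start = seq_len(n) * 100L,
             stringsAsFactors = FALSE)
}

cell_ids <- c("cell_A", "cell_B", "cell_C")  # assumed example input
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Parallel phase: no naming happens here.
region_list <- mclapply(cell_ids, find_regions, mc.cores = cores)

# Sequential phase: a single running counter hands out unique names.
counter <- 0L
for (i in seq_along(region_list)) {
  n <- nrow(region_list[[i]])
  region_list[[i]]$region_name <- sprintf("region_%d", counter + seq_len(n))
  counter <- counter + n
}

regions <- do.call(rbind, region_list)
```

Since the parallel results are returned in input order, the names assigned in the sequential pass match what a counter incremented inside a serial loop would have produced.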
