Feature: Parallel merge #19

Open
giang-nghg opened this issue Apr 13, 2020 · 3 comments

Comments

@giang-nghg
Contributor

Batch merging currently processes batches sequentially in order to maintain the ordering of samples in the original input. In cases where this is not necessary, batches can be processed in parallel to speed up the entire process.

@Zhicheng-Liu
Collaborator

This is low priority, imho. There are workarounds for users who have access to compute clusters. It would be more efficient in terms of overall time to break a large number of files into multiple chunks and merge them in separate jobs over multiple steps. If designed carefully, the ordering of samples in the original input can still be easily maintained.
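
A rough sketch of the chunked, multi-step idea, assuming each input can be reduced to an ordered list of sample names. `merge_chunk` and `merge_in_rounds` are hypothetical names for illustration, not functions from this project. Each `merge_chunk` call within a round is independent, so it could be submitted as its own cluster job.

```python
from typing import Callable, List, Sequence


def chunked(items: Sequence, size: int) -> List[list]:
    """Split items into consecutive chunks of at most `size`, keeping order."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def merge_chunk(chunk: Sequence[List[str]]) -> List[str]:
    """Hypothetical stand-in for one merge job: concatenate sample lists in order."""
    return [sample for part in chunk for sample in part]


def merge_in_rounds(
    inputs: Sequence[List[str]],
    chunk_size: int,
    merge: Callable[[Sequence[List[str]]], List[str]] = merge_chunk,
) -> List[str]:
    """Merge inputs in several rounds of chunk-sized merges.

    Chunks are always built from neighbouring inputs, so the sample
    order of the original input list is preserved in the final result.
    """
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    if not inputs:
        raise ValueError("nothing to merge")
    current = list(inputs)
    while len(current) > 1:
        # Every merge call in this round is independent of the others.
        current = [merge(chunk) for chunk in chunked(current, chunk_size)]
    return current[0]


inputs = [["s1"], ["s2", "s3"], ["s4"], ["s5"], ["s6"]]
merged = merge_in_rounds(inputs, chunk_size=2)
print(merged)
```

Because chunks only ever combine adjacent inputs, no extra bookkeeping is needed to restore the original ordering.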

@Zhicheng-Liu
Collaborator

Even with batch processing in parallel, the ordering of samples can still be maintained. That's the contract we promised and we should not break it. Most users would expect that imho.
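
To illustrate the point, a minimal sketch, assuming batches are independent of each other. `merge_batch` and `parallel_merge` are hypothetical stand-ins, not this project's API. `Executor.map` from the standard library returns results in the order of its inputs regardless of which worker finishes first, so the partial results can be combined in the original order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def merge_batch(batch: Sequence[List[str]]) -> List[str]:
    """Hypothetical stand-in for merging one batch of files.

    Each file is represented here only by its ordered list of sample names.
    """
    return [sample for file_samples in batch for sample in file_samples]


def parallel_merge(batches: Sequence[Sequence[List[str]]], max_workers: int = 4) -> List[str]:
    """Merge batches concurrently while keeping the input sample order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, not completion order.
        partials = list(pool.map(merge_batch, batches))
    # Combine the per-batch results in their original order.
    return [sample for partial in partials for sample in partial]


batches = [
    [["a", "b"], ["c"]],
    [["d"]],
    [["e", "f"]],
]
result = parallel_merge(batches)
print(result)
```

Threads are used here only to keep the sketch self-contained. For CPU-bound merging, `ProcessPoolExecutor` offers the same ordered `map`, provided the worker function and its arguments can be pickled.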

@martinghunt
Member

I wouldn't care about the order of the samples in the final merged VCF file.


3 participants