Feature: Parallel merge #19

giang-nghg · 2020-04-13T12:42:17Z

Batch merging currently processes batches sequentially in order to maintain the ordering of samples in the original input. But in cases where this is not necessary, batches can be processed in parallel to speed up the entire process.

Zhicheng-Liu · 2020-04-14T11:02:15Z

This is low priority, imho. There are workarounds for users who have access to compute clusters. It would be more efficient in terms of overall time to split a large set of files to be merged into multiple chunks and merge them as separate jobs over multiple steps. If designed carefully, the ordering of samples in the original input can still be easily maintained.
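A minimal sketch of the chunked, multi-step workaround described above, in Python. The names `chunked`, `merge_in_rounds`, and the `merge` callable are hypothetical and not part of this project; `merge` stands in for whatever actually merges a list of files into one intermediate file. Because chunks are contiguous slices and intermediate results are kept in chunk order, the original sample ordering survives every round. Here the chunks within a round run one after another; on a cluster, each `merge(chunk)` call in a round could be submitted as its own job.

```python
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into contiguous chunks of at most `size`, preserving order."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def merge_in_rounds(
    files: Sequence[str],
    chunk_size: int,
    merge: Callable[[List[str]], str],
) -> str:
    """Merge files in several rounds of contiguous chunks.

    `merge` is a hypothetical stand-in: it takes an ordered list of inputs
    and returns one merged result that keeps the inputs in that order.
    """
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    if not files:
        raise ValueError("no files to merge")

    current = list(files)
    while len(current) > 1:
        # Each chunk is independent of the others within a round, so these
        # calls could run as separate jobs. The list comprehension keeps the
        # intermediate results in the same order as the chunks.
        current = [merge(chunk) for chunk in chunked(current, chunk_size)]
    return current[0]


# Toy stand-in for a real merge: join names so the final order is visible.
def toy_merge(chunk: List[str]) -> str:
    return "+".join(chunk)


result = merge_in_rounds(["a", "b", "c", "d", "e"], chunk_size=2, merge=toy_merge)
print(result)
```

With the toy merge, the printed string lists the inputs in their original order, which is the property the real merge would need to keep.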

Zhicheng-Liu · 2020-04-14T11:03:54Z

Even with batch processing in parallel, the ordering of samples can still be maintained. That's the contract we promised and we should not break it. Most users would expect that imho.
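One way this could look, as a rough Python sketch rather than this project's actual code: `merge_batch` and `parallel_merge` are hypothetical names, and `merge_batch` is a placeholder for the real per-batch work. The sketch relies on `Executor.map` from the standard library's `concurrent.futures`, which yields results in the order of the inputs regardless of which worker finishes first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def merge_batch(batch: Sequence[str]) -> List[str]:
    """Hypothetical per-batch merge: returns the batch's samples in order.

    A real implementation would merge the batch's files; this placeholder
    only passes the sample names through.
    """
    return list(batch)


def parallel_merge(batches: Sequence[Sequence[str]], max_workers: int = 4) -> List[str]:
    """Process batches concurrently, then combine them in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as `batches`,
        # even if later batches finish before earlier ones.
        per_batch = list(pool.map(merge_batch, batches))

    merged: List[str] = []
    for samples in per_batch:
        merged.extend(samples)
    return merged


ordered = parallel_merge([["s1", "s2"], ["s3"], ["s4", "s5"]])
print(ordered)
```

For CPU-bound merging, a `ProcessPoolExecutor` could be swapped in with the same `map` call, provided the per-batch function and its arguments are picklable.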

martinghunt · 2020-04-14T11:07:51Z

I wouldn't care about the order of the samples in the final merged VCF file.
