
On a large GPU cluster, DynamicBucketingSampler.__next__ spends a lot of time #1399

Open
shushanxingzhe opened this issue Oct 9, 2024 · 1 comment

shushanxingzhe commented Oct 9, 2024

@pzelasko When I use DynamicBucketingSampler on a cluster with 600 GPUs, the loop in __next__ at

for _ in range(self.world_size):

wastes a lot of time, since iterating over a world_size of 600 is slow. Could you please give me any advice on how to reduce that time?

pzelasko (Collaborator) commented Oct 9, 2024

I suggest either moving to the Lhotse Shar format (see the tutorial in the examples directory), or sharding your manifest into many small chunks and using CutSet.from_files with the random seed set to "trng", calling .repeat() on the CutSet (which makes it infinite), and then manually overriding the rank to 0 and the world size to 1 in the sampler on every GPU. Finally, you can wrap both the sampler and the dataset into IterableDatasetWrapper (although with non-Shar data it may not be needed). This makes the order of data iteration different on each dataloading worker instead of trying to deduplicate across ranks. In practice it works just as well, but you need to count training steps instead of epochs.
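
For concreteness, here is a minimal sketch of that recipe, assuming manifest shards matching a hypothetical path data/cuts/cuts-*.jsonl.gz, K2SpeechRecognitionDataset as a placeholder dataset, and that the exact keyword names (e.g. the seed argument of CutSet.from_files and the import path of IterableDatasetWrapper) match your Lhotse version:

from glob import glob

from torch.utils.data import DataLoader

from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler, K2SpeechRecognitionDataset
from lhotse.dataset.iterable_dataset import IterableDatasetWrapper

# Read many small manifest shards; seed="trng" draws a true-random seed, so each
# dataloading worker iterates the shards in a different order. .repeat() makes the
# stream infinite, so training is counted in steps rather than epochs.
cuts = CutSet.from_files(sorted(glob("data/cuts/cuts-*.jsonl.gz")), seed="trng").repeat()

sampler = DynamicBucketingSampler(
    cuts,
    max_duration=600.0,  # seconds of audio per batch; tune for your GPUs
    shuffle=True,
    # Every GPU pretends to be rank 0 of a world of size 1, so __next__ no longer
    # loops over all 600 ranks to deduplicate; dedup is replaced by randomized order.
    rank=0,
    world_size=1,
)

dataset = K2SpeechRecognitionDataset()  # placeholder; use your own dataset class

# Wrapping dataset + sampler in an iterable dataset runs the sampler inside each
# dataloader worker process (optional for non-Shar data, as noted above).
dloader = DataLoader(
    IterableDatasetWrapper(dataset=dataset, sampler=sampler),
    batch_size=None,
    num_workers=4,
)

With rank=0 and world_size=1, each GPU (and each dataloader worker) draws batches independently from its own randomly ordered stream, so no per-rank deduplication loop runs in __next__.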
