I suggest either moving to the Lhotse Shar format (see the tutorial in the examples directory), or sharding your manifest into many small chunks and using CutSet.from_files with the random seed set to "trng", calling .repeat() on the CutSet (which makes it infinite), and then manually overriding the rank to 0 and the world size to 1 in the sampler on every GPU. Finally, you can wrap both the sampler and the dataset into IterableDatasetWrapper (though with non-Shar data this may not be needed). This makes the order of data iteration different on each dataloading worker instead of trying to deduplicate across ranks. In practice it works just as well, but you need to count training steps instead of epochs. A sketch of the second option is below.
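Here's a minimal sketch of that sharded-manifest setup, assuming the CutSet.from_files / DynamicBucketingSampler usage described above; the shard paths, max_duration value, and the choice of K2SpeechRecognitionDataset are placeholders you'd adapt to your own pipeline:

```python
from pathlib import Path

from torch.utils.data import DataLoader

from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler, K2SpeechRecognitionDataset
from lhotse.dataset.iterable_dataset import IterableDatasetWrapper

# The manifest was sharded beforehand into many small chunks,
# e.g. data/shards/cuts.000000.jsonl.gz, cuts.000001.jsonl.gz, ...
shards = sorted(map(str, Path("data/shards").glob("cuts.*.jsonl.gz")))

# seed="trng" draws a true-random seed in each dataloading worker, so every
# worker/GPU reads the shards in a different order; .repeat() turns the
# CutSet into an infinite stream.
cuts = CutSet.from_files(shards, seed="trng").repeat()

# Force rank=0 / world_size=1 on every GPU: each rank consumes the full
# (differently ordered) stream instead of rank-based deduplication.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=200.0,  # placeholder: batch size in seconds of audio
    shuffle=True,
    rank=0,
    world_size=1,
)

# Wrapping the sampler + dataset moves the sampling loop into the workers.
dloader = DataLoader(
    IterableDatasetWrapper(
        dataset=K2SpeechRecognitionDataset(),
        sampler=sampler,
    ),
    batch_size=None,  # batches are formed by the sampler
    num_workers=4,
)
```

Since the stream is infinite, the training loop has to be driven by a step count (e.g. a fixed number of steps per "epoch") rather than by exhausting the dataloader.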
@pzelasko When I use DynamicBucketingSampler on a 600-GPU cluster, the code at `lhotse/lhotse/dataset/sampling/base.py`, line 297 in e2b149d, …