About approx proportional sampling for DynamicBucketingSampler #1273

t13m · 2024-01-27T01:04:15Z

t13m
Jan 27, 2024

Hello! In my experiments, the loss jumps at the end of every epoch. After some investigation, I found the cause: in DynamicBucketingSampler, the buckets for short utterances are used up more quickly than the buckets for longer ones. This seems related to #364 and #372.

It seems that using BucketingSampler (the non-dynamic version) would solve my problem, but converting the lazy manifest with to_eager() ran my server out of memory.

My question is: is there any way to use DynamicBucketingSampler together with approximate proportional sampling? Or is there any way to mitigate the memory issue when using BucketingSampler? Any help would be much appreciated!

pzelasko · 2024-01-29T18:41:15Z

pzelasko
Jan 29, 2024
Maintainer

Thank you, this is a great point that I missed before. I'll look into adding proportional sampling to the dynamic bucketing sampler.

1 reply

pzelasko Jan 31, 2024
Maintainer

As it happens, I don't have a good idea for how to incorporate proportional sampling into the dynamic bucketing sampler. The algorithm in BucketingSampler depends on knowing how much duration is left in each bucket. In the dynamic case we don't know that, and each bucket may well be infinite anyway.
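To make the dependency on "duration left" concrete, here is a minimal sketch of proportional bucket selection over fully materialized buckets. It is not the BucketingSampler implementation; the function name `proportional_batches` and the plain lists of durations are assumptions made for illustration only.

```python
import random


def proportional_batches(buckets, max_duration, seed=0):
    """Yield (bucket_index, batch) pairs from eager buckets.

    `buckets` is a list of lists of utterance durations in seconds.
    Hypothetical helper for illustration, not part of any library.
    """
    rng = random.Random(seed)
    queues = [list(b) for b in buckets]
    while True:
        # Only buckets that still hold data can be selected.
        live = [i for i, q in enumerate(queues) if q]
        if not live:
            return
        # This is the step that needs the remaining duration of every bucket,
        # which a lazy (possibly infinite) manifest cannot provide.
        weights = [sum(queues[i]) for i in live]
        idx = rng.choices(live, weights=weights, k=1)[0]
        q = queues[idx]
        batch, used = [], 0.0
        # Always take at least one item, then fill up to max_duration.
        while q and (not batch or used + q[-1] <= max_duration):
            d = q.pop()
            batch.append(d)
            used += d
        yield idx, batch
```

Because the weights shrink as a bucket is consumed, buckets tend to run out at about the same time, which is exactly what cannot be computed when the total size of each bucket is unknown.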

I tried inspecting how fast the buckets deplete on a subset of LibriSpeech, and they seem to deplete at roughly the same pace, although clearly some buckets are "left behind". It's possible that with much larger data we'd effectively observe some durations significantly less frequently than others. There is also a "catching up" peak towards the end. See the plot below, where the X axis is the mini-batch index and the Y axis is the total number of batches from a given bucket (each line is a different bucket).
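A toy simulation can produce the same kind of curves without any data. The sketch below assumes that the sampler picks uniformly at random among non-empty buckets and that every bucket starts with the same number of batches. Both are simplifying assumptions, not a description of what DynamicBucketingSampler actually does, and `simulate_depletion` is a hypothetical name.

```python
import random


def simulate_depletion(num_buckets, batches_per_bucket, seed=0):
    """Return (history, first_empty_step) for a toy bucket-depletion run.

    history[t] holds the cumulative number of batches drawn from each bucket
    after mini-batch t + 1, i.e. one curve per bucket.
    """
    rng = random.Random(seed)
    left = [batches_per_bucket] * num_buckets
    drawn = [0] * num_buckets
    history = []
    first_empty_step = None
    step = 0
    while any(left):
        # Assumption: uniform choice among buckets that still have batches.
        live = [i for i, n in enumerate(left) if n > 0]
        i = rng.choice(live)
        left[i] -= 1
        drawn[i] += 1
        step += 1
        history.append(tuple(drawn))
        if first_empty_step is None and left[i] == 0:
            first_empty_step = step
    return history, first_empty_step
```

Plotting each column of `history` against the step index gives one line per bucket, and `first_empty_step` marks the point after which the remaining buckets have to "catch up".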

If you have any suggestions, I'm open to them.

t13m · 2024-02-09T09:20:56Z

t13m
Feb 9, 2024
Author

Hi Piotr, thank you for your kind response. For my problem, I resorted to the StatelessSampler in the end, and it works like a charm, although the notion of an "epoch" disappears.

About proportional sampling for the dynamic sampler: it's true that there's no way to know the remaining duration in the dynamic scenario. This is not a well-thought-out idea, but what if we could provide some kind of duration histogram before training starts?

1 reply

pzelasko Feb 9, 2024
Maintainer

Yes, you can, via the 'duration_bins' argument. It affects the bucket allocation, which helps, but it doesn't solve the problem. As Dan mentioned in the issues you linked, in expectation the fastest-depleting bucket tends to deplete in O(sqrt(N)) steps, IIRC.
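For illustration, bin boundaries of this kind could be derived from a sample of durations roughly as follows. This is a sketch that assumes the goal is buckets of approximately equal total duration. The helper `equal_duration_bins` is hypothetical and is not claimed to match how the library estimates its bins.

```python
def equal_duration_bins(durations, num_buckets):
    """Return up to num_buckets - 1 boundaries (in seconds).

    Each boundary is the first duration seen after the running total of the
    current bucket has reached an equal share of the overall duration.
    Hypothetical helper for illustration only.
    """
    if num_buckets < 2:
        raise ValueError("num_buckets must be at least 2")
    ordered = sorted(durations)
    per_bucket = sum(ordered) / num_buckets
    bins, acc = [], 0.0
    for d in ordered:
        if acc >= per_bucket and len(bins) < num_buckets - 1:
            bins.append(d)  # start a new bucket at this duration
            acc = 0.0
        acc += d
    return bins
```

With such boundaries each bucket starts with a similar amount of audio, but the order in which buckets are drawn is still random, so one of them can still run out earlier than the others.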
