v1.24 - The World's Highest Wingsuit Jump
What's Changed
New features
Notably, there's a new optimization for the dynamic bucketing sampler in multi-GPU training: it chooses the same (or the closest possible) bucket on each DDP rank to keep the total training step times closer across ranks. The expected speedup depends on the model and the number of GPUs. We observed 8% and 13% speedups in two experiments compared to non-synchronized bucket selection. The new option is called `sync_buckets` and is enabled by default.
- Dynamic bucket selection RNG sync by @pzelasko in #1341
- Add new sampler: weighted sampler by @marcoyang1998 in #1344
- `reverb_rir`: support Cut input and in-memory data by @pzelasko in #1332
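To illustrate the idea behind synchronized bucket selection, here is a minimal, standard-library-only sketch. It is not Lhotse's implementation: the helper names `select_bucket` and `simulate` are hypothetical, and the sketch assumes the synchronization comes from every rank seeding its bucket-selection RNG identically, with a fallback to the nearest available bucket when the drawn one is not ready on a given rank.

```python
import random
from typing import List, Sequence


def select_bucket(rng: random.Random, ready: Sequence[bool]) -> int:
    """Draw a bucket index and fall back to the closest ready bucket.

    Exactly one random number is consumed per call, so RNGs that start
    from the same seed stay aligned across ranks even when a fallback
    happens on some of them.
    """
    target = rng.randrange(len(ready))
    candidates = [i for i, is_ready in enumerate(ready) if is_ready]
    if not candidates:
        raise ValueError("no bucket is ready")
    # Nearest ready bucket to the drawn index (the drawn one if it is ready).
    return min(candidates, key=lambda i: abs(i - target))


def simulate(
    num_ranks: int, num_buckets: int, steps: int, sync: bool, seed: int = 0
) -> List[List[int]]:
    """Return, for each step, the bucket index chosen on each rank."""
    # Hypothetical setup: with sync, all ranks share one seed;
    # without it, each rank gets a rank-dependent seed.
    rngs = [
        random.Random(seed if sync else seed + rank)
        for rank in range(num_ranks)
    ]
    ready = [True] * num_buckets
    return [[select_bucket(rng, ready) for rng in rngs] for _ in range(steps)]


if __name__ == "__main__":
    for step, choices in enumerate(simulate(4, 10, 5, sync=True)):
        print(f"step {step}: buckets per rank = {choices}")
```

With a shared seed, every rank draws the same bucket at every step, so all ranks process batches of similar duration and spend less time waiting for each other at the gradient synchronization point.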
Recipes
Other improvements
- Missing 'subset' parameter by @daniel-dona in #1336
- Fix describe on cuts by @keeofkoo in #1340
- Use libsndfile in recording chunk dataset by @pzelasko in #1335
- Fix librispeech manifest caching by @haerski in #1343
- Fix one-off edge case in split_lazy by @pzelasko in #1347
- Increase the start diff tolerance for feature loading by @pzelasko in #1349
- More test coverage for lhotse subset by @pzelasko in #1345
New Contributors
- @keeofkoo made their first contribution in #1340
- @haerski made their first contribution in #1343
- @Triplecq made their first contribution in #1330
Full Changelog: v1.23...v1.24