v1.22 - Sherpa's Paradise
What's Changed
New features
As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data, using some lightweight wrappers. Please refer to the new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
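To make the idea of a sampling constraint for text more concrete, here is a minimal pure-Python sketch. It does not use Lhotse and is not its API: the names TokenBudget and bucketed_batches are hypothetical, whitespace splitting stands in for a real tokenizer, and the bucketing is done eagerly over an in-memory list rather than dynamically over a stream. It only illustrates the principle that a batch is closed when a length measure (here, token count instead of audio duration) would exceed a budget.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence


@dataclass
class TokenBudget:
    """Hypothetical constraint: cap the total number of tokens in a batch."""

    max_tokens: int
    current: int = 0

    def measure(self, example: str) -> int:
        # Assumption: whitespace tokenization stands in for a real tokenizer.
        return len(example.split())

    def fits(self, example: str) -> bool:
        return self.current + self.measure(example) <= self.max_tokens

    def add(self, example: str) -> None:
        self.current += self.measure(example)

    def reset(self) -> None:
        self.current = 0


def bucketed_batches(
    examples: Iterable[str],
    constraint: TokenBudget,
    bucket_bounds: Sequence[int],
) -> Iterator[List[str]]:
    """Group examples into length buckets, then emit batches within the budget."""
    buckets: List[List[str]] = [[] for _ in range(len(bucket_bounds) + 1)]
    for ex in examples:
        n = constraint.measure(ex)
        # Bucket index = number of boundaries the example's length exceeds.
        buckets[sum(n > b for b in bucket_bounds)].append(ex)

    for bucket in buckets:
        batch: List[str] = []
        constraint.reset()
        for ex in bucket:
            # Close the batch when adding this example would exceed the budget.
            # An example larger than the whole budget still gets its own batch.
            if batch and not constraint.fits(ex):
                yield batch
                batch = []
                constraint.reset()
            batch.append(ex)
            constraint.add(ex)
        if batch:
            yield batch


texts = ["a b", "a b c d e f", "a", "a b c d e", "a b c"]
batches = list(bucketed_batches(texts, TokenBudget(max_tokens=6), bucket_bounds=[3]))
for batch in batches:
    print(batch)
```

Short and long sentences end up in separate batches, which is the same motivation as duration bucketing for audio: less padding per batch. For the actual sampler wrappers and constraint classes, the linked documentation is the authoritative source.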
- Multi-channel support improvements
Lhotse MultiCuts:
- are now exportable into Lhotse Shar format
- gained a new method, cut = cut.with_channels([0, 1, ...]), to modify the channels they refer to
- can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).
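The channel-selection behaviour described above can be illustrated with a toy stand-in. This sketch does not import Lhotse: ToyMultiCut is a hypothetical dataclass that only borrows the attribute and method names mentioned in these notes, and it stores audio as plain Python lists keyed by channel index. It is meant to show the intended semantics (a new cut that refers to a subset of channels, and a custom recording whose channels are picked through a selector key), not how Lhotse implements them.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class ToyMultiCut:
    """Toy stand-in for a multi-channel cut; not the Lhotse class."""

    channel: List[int]
    audio: Dict[int, List[float]]  # channel index -> samples
    target_recording: Optional[Dict[int, List[float]]] = None
    target_recording_channel_selector: Optional[List[int]] = None

    def with_channels(self, channels: Sequence[int]) -> "ToyMultiCut":
        # Return a copy that refers to the requested channels only.
        missing = [c for c in channels if c not in self.audio]
        if missing:
            raise ValueError(f"Channels not available: {missing}")
        return replace(self, channel=list(channels))

    def load_audio(self) -> List[List[float]]:
        # One row of samples per selected channel.
        return [self.audio[c] for c in self.channel]

    def load_target_recording(self) -> List[List[float]]:
        if self.target_recording is None:
            raise ValueError("No custom target_recording attached.")
        selector = self.target_recording_channel_selector
        if selector is None:
            # Assumption for this toy: no selector means all channels.
            selector = sorted(self.target_recording)
        return [self.target_recording[c] for c in selector]


cut = ToyMultiCut(
    channel=[0, 1, 2],
    audio={0: [0.0, 0.1], 1: [1.0, 1.1], 2: [2.0, 2.1]},
    target_recording={0: [9.0], 1: [8.0]},
    target_recording_channel_selector=[1],
)
stereo = cut.with_channels([0, 2])
print(stereo.load_audio())
print(stereo.load_target_recording())
```

Returning a modified copy rather than mutating in place mirrors the cut = cut.with_channels(...) usage shown in the notes, where the result is reassigned.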
Recipes
- Add new recipe: speechio by @yuekaizhang in #1297
- tedlium2 recipe by @JinZr in #1296
Other improvements
- Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
- Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
- Cutconcat fixed max duration by @swigls in #1292
- Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
- fix whisper for multi-channel data by @yuekaizhang in #1289
- Xfail flaky SileroVAD tests by @pzelasko in #1300
New Contributors
Full Changelog: v1.21...v1.22