Release v1.22 - Sherpa's Paradise · lhotse-speech/lhotse

What's Changed

New features

Extending Lhotse dataloading to text/multimodal data by @pzelasko in #1295

As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. This means that text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data, using some lightweight wrappers. Please refer to the new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
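To illustrate the idea of a sampling constraint measured in tokens rather than seconds of audio, here is a minimal, dependency-free sketch. It is not the Lhotse API: `TextExample` and `batch_by_token_budget` are hypothetical names made up for this example, and whitespace splitting stands in for a real tokenizer. The linked documentation describes the actual classes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class TextExample:
    # Hypothetical stand-in for a text item; not a Lhotse class.
    text: str

    @property
    def num_tokens(self) -> int:
        # Whitespace tokenization keeps the sketch dependency-free.
        return len(self.text.split())


def batch_by_token_budget(
    examples: Iterable[TextExample], max_tokens: int
) -> Iterator[List[TextExample]]:
    """Yield batches whose total token count stays within max_tokens.

    An example longer than the budget is still emitted, alone in its own batch.
    """
    batch: List[TextExample] = []
    total = 0
    for ex in examples:
        # Close the current batch if adding this example would exceed the budget.
        if batch and total + ex.num_tokens > max_tokens:
            yield batch
            batch, total = [], 0
        batch.append(ex)
        total += ex.num_tokens
    if batch:
        yield batch


examples = [
    TextExample("hello world"),
    TextExample("a b c d"),
    TextExample("one"),
    TextExample("x y z"),
]
batches = list(batch_by_token_budget(examples, max_tokens=5))
batch_sizes = [[ex.num_tokens for ex in b] for b in batches]
print(batch_sizes)
```

The same loop would batch audio if the length measure were duration instead of token count, which is the sense in which text can share the sampling machinery.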

Multi-channel support improvements
- Fix loading multi-channel custom recording fields in multi cuts by @pzelasko in #1298
- Channel selection for multi-channel custom recording fields by @pzelasko in #1299

Lhotse MultiCuts:

- are now exportable into Lhotse Shar format
- gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
- can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).
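The naming convention for the channel selector key can be pictured with a toy model. This is only a sketch of the lookup rule described above, not Lhotse's implementation: `load_custom_recording` is a hypothetical helper, the custom fields are modeled as a plain dict, and multi-channel audio is modeled as a list of per-channel sample lists.

```python
from typing import Any, Dict, List

# Toy representation: one inner list of samples per channel.
MultiChannelAudio = List[List[float]]


def load_custom_recording(custom: Dict[str, Any], name: str) -> MultiChannelAudio:
    """Return the channels of custom[name].

    If a key called "<name>_channel_selector" is present, only the listed
    channels are returned, in the listed order; otherwise all channels are.
    """
    audio = custom[name]
    selector = custom.get(f"{name}_channel_selector")
    if selector is None:
        return audio
    return [audio[ch] for ch in selector]


custom = {
    "target_recording": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "target_recording_channel_selector": [0, 2],
}
selected = load_custom_recording(custom, "target_recording")
print(selected)
```

Without the selector key, the helper returns every channel, which mirrors the idea that the selector is an optional companion to the custom field.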

Recipes

- Add new recipe: speechio by @yuekaizhang in #1297
- tedlium2 recipe by @JinZr in #1296

Other improvements

- Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
- Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
- Cutconcat fixed max duration by @swigls in #1292
- Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
- fix whisper for multi-channel data by @yuekaizhang in #1289
- Xfail flaky SileroVAD tests by @pzelasko in #1300

New Contributors

@swigls made their first contribution in #1292

Full Changelog: v1.21...v1.22
