# Introduction

This is a pseudo-labeling based semi-supervised ASR recipe for the LibriSpeech dataset. The ASR model is a Zipformer transducer. The labeled data is LibriSpeech train-clean-100. The unlabeled data is either LibriSpeech "train-clean-360 + train-other-500" (conventional semi-supervised learning) or the TedLium3 training set (unsupervised domain adaptation).

## Description of the recipe

### Preparation of data

The data required in this recipe is the same as in the LibriSpeech and TedLium3 ASR recipes, and the model is built with the LibriSpeech tokenizer, so we can reuse the `prepare.sh` scripts from those recipes.

### Supervised training for the seed ASR model

First, we perform supervised training on the LibriSpeech train-clean-100 subset to obtain the seed model for the subsequent pseudo-labeling based semi-supervised training.

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train_seed.py \
--world-size 4 \
--num-epochs 70 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp_seed \
--max-duration 1000
```

For better performance of the seed model, we average the checkpoints as follows:

```
./zipformer/generate_averaged_model.py \
--epoch 70 \
--avg 30 \
--exp-dir ./zipformer/exp_seed
```

The above command writes the final seed model to `./zipformer/exp_seed/epoch-70-avg-30.pt`.
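
Conceptually, checkpoint averaging takes the element-wise mean of the parameters of the last `--avg` checkpoints. Below is a minimal sketch of plain averaging, assuming each checkpoint stores its model state dict under a `"model"` key; the actual `generate_averaged_model.py` script may use a more sophisticated scheme:

```
# Minimal sketch of plain checkpoint averaging (Python).
import torch

def average_checkpoints(paths):
    # Element-wise mean of the "model" state dicts of several checkpoints.
    avg = None
    for path in paths:
        # Assumes each checkpoint stores the model under the "model" key.
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# Mirrors --epoch 70 --avg 30: average epoch-41.pt .. epoch-70.pt.
averaged = average_checkpoints(
    [f"zipformer/exp_seed/epoch-{e}.pt" for e in range(41, 71)]
)
```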

### Semi-supervised training for the final ASR model

Next, we perform semi-supervised training, using the seed model as the initialization. A conceptual sketch of the pseudo-labeling loop is given after the two commands below.

- Conventional semi-supervised learning setting where unlabeled data is "train-clean-360 + train-other-500":

```
./zipformer/train_pl.py \
--world-size 4 \
--num-epochs 20 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp_pl_librispeech \
--max-duration 1000 \
--seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
--unlabeled-dataset "librispeech"
```

- Unsupervised domain adaptation setting where unlabeled data is TedLium3 training set:

```
./zipformer/train_pl.py \
--world-size 4 \
--num-epochs 20 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp_pl_tedlium \
--max-duration 1000 \
--seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
--unlabeled-dataset "tedlium"
```
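
The core idea of pseudo-labeling can be summarized in a few lines. The sketch below is purely conceptual and uses hypothetical names (`teacher`, `student`, `decode`, `loss`); it is not the actual `train_pl.py` implementation:

```
# Conceptual sketch of one pseudo-labeling training step (Python;
# hypothetical names, not the actual train_pl.py implementation).
import torch

def pl_step(teacher, student, optimizer, labeled_batch, unlabeled_batch):
    # The teacher (e.g., initialized from the seed model) transcribes
    # the unlabeled audio to produce pseudo labels.
    with torch.no_grad():
        pseudo_labels = teacher.decode(unlabeled_batch["features"])
    # The student is trained on labeled data with ground-truth transcripts
    # and on unlabeled data with the teacher's pseudo labels.
    loss = (
        student.loss(labeled_batch["features"], labeled_batch["texts"])
        + student.loss(unlabeled_batch["features"], pseudo_labels)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Depending on the pseudo-labeling variant, the teacher may stay fixed throughout training or be updated (e.g., as a moving average of the student) as training progresses.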

### Decode

Finally, we run decoding with the trained models to evaluate their performance.

- Evaluate on the LibriSpeech dataset:

```
./zipformer/decode.py \
--epoch 20 \
--avg 10 \
--exp-dir ./zipformer/exp_pl_librispeech \
--max-duration 600 \
--decoding-method modified_beam_search \
--beam-size 4 \
--dataset "librispeech"
```

- Evaluate on the TedLium3 dataset:

```
./zipformer/decode.py \
--epoch 20 \
--avg 10 \
--exp-dir ./zipformer/exp_pl_tedlium \
--max-duration 600 \
--decoding-method modified_beam_search \
--beam-size 4 \
--dataset "tedlium"
```

## Results

- Conventional semi-supervised learning (LibriSpeech 100h/LibriSpeech 860h)

| Model | test-clean | test-other | comment |
|-------------------------|------------|------------|---------------------|
| supervised seed model | 5.45 | 13.7 | --epoch 70 --avg 30 |
| pseudo-labeling model | 4.33 | 9.61 | --epoch 20 --avg 10 |
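
Relative to the seed model, pseudo-labeling on the unlabeled 860h of audio gives roughly a 21% relative WER reduction on test-clean and 30% on test-other.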

- Unsupervised domain adaptation (LibriSpeech 100h/TedLium3)

| Model                   | TedLium3 dev | TedLium3 test | comment             |
|-------------------------|--------------|---------------|---------------------|
| supervised seed model   | 18.29        | 18.16         | --epoch 70 --avg 30 |
| pseudo-labeling model   | 14.97        | 14.65         | --epoch 20 --avg 10 |
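
Here, pseudo-labeling on the out-of-domain TedLium3 audio gives roughly an 18% relative WER reduction on dev and 19% on test.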


## Pre-trained models and logs

You can find the pre-trained models, training logs, tensorboard logs, decoding logs, and decoding results at <https://huggingface.co/zhu-han/icefall-pl-librispeech-zipformer-medium-2023-08-06>.