Data requirements and recommendations for training large-scaled Zipformer2 #1580

bharathraj-v · 2024-04-03T15:17:02Z

bharathraj-v
Apr 3, 2024

Hi,

We have trained normal-scaled zipformer2-transducer on around 1k hours, a subset of a dataset of 4-5k hours of telugu language data which is low quality - lossy 8k sr with 20-30kbps bitrate - and somewhat weakly supervised with an error rate of about 4-7%, and have gotten good results compared to conformer_ctc3, zipformer2_ctc on an unseen test set of 800 samples from the same dataset (in-domain) and 800 samples from google fleurs data (OOD). More details on the comparison:

Model	in-domain WER	in-domain CER	OOD WER	OOD CER
ConformerCTC3	0.41	0.20	0.60	0.24
Zipformer2 Transducer	0.36	0.20	0.61	0.26
Zipformer2 Transducer + RNN_LM (Shallow fusion)	0.34	0.18	0.48	0.26
Zipformer2 CTC	0.43	0.22	0.58	0.23

We also fine-tuned the librispeech large-scaled checkpoint for zipformer2 with the same data to compare the performance and it performed worse than training from scratch. The reason we tried that was that, in a separate experiment, NVIDIA NeMo Fastconformer-CTC en_large fine-tuned on the same data performed very well even without an LM so we figured fine-tuning could help. Could fine-tuning the gigaspeech zipformer checkpoint have gotten better performance?

Model	WER	CER	OOD WER	OOD CER
NeMo Fastconformer CTC en_large checkpoint	0.32	0.15	0.51	0.20
Zipformer2 Transducer Librispeech large-scaled checkpoint	0.44	0.23	0.61	0.26

Mainly, we are looking to train a large-scaled model of zipformer2 with the instructions from pr #1058 to get a better performance in terms of accuracy. What are the data requirements for the large-scaled model, is 4-5k hours of the data mentioned above + 600 hours of data from better sources enough for training a large-scaled model that is more robust and performs better?

Our goal is to prepare a base e2e telugu model that can be fine-tuned on domain-specific telephony data, any suggestions for that or the answers to the questions would be greatly appreciated and would be of much help!

Thank you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data requirements and recommendations for training large-scaled Zipformer2 #1580

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

Data requirements and recommendations for training large-scaled Zipformer2 #1580

bharathraj-v Apr 3, 2024

Replies: 0 comments

bharathraj-v
Apr 3, 2024