Readme

This recipe contains data preparation for the VoxPopuli dataset (pdf). Model training is not included yet.

audio per language

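| Language | Size | Hours untranscribed | Hours transcribed |
|----------|------|---------------------|-------------------|
| bg | 295G | 17.6K | - |
| cs | 308G | 18.7K | 62 |
| da | 233G | 13.6K | - |
| de | 379G | 23.2K | 282 |
| el | 305G | 17.7K | - |
| en | 382G | 24.1K | 543 |
| es | 362G | 21.4K | 166 |
| et | 179G | 10.6K | 3 |
| fi | 236G | 14.2K | 27 |
| fr | 376G | 22.8K | 211 |
| hr | 132G | 8.1K | 43 |
| hu | 297G | 17.7K | 63 |
| it | 361G | 21.9K | 91 |
| lt | 243G | 14.4K | 2 |
| lv | 217G | 13.1K | - |
| mt | 147G | 9.1K | - |
| nl | 322G | 19.0K | 53 |
| pl | 348G | 21.2K | 111 |
| pt | 300G | 17.5K | - |
| ro | 296G | 17.9K | 89 |
| sk | 201G | 12.1K | 35 |
| sl | 190G | 11.3K | 10 |
| sv | 272G | 16.3K | - |
| total | 6.3T | 384K | 1791 |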