Token scoring for other languages #2636

FOLSc · 2024-12-17T01:56:35Z

"The C++ version of Whisper can output probability scores for tokens or subtokens when transcribing English speech. Can it produce similar scores for other languages, such as French, Japanese, or Chinese? When I try to score speech in another language, I get either the probabilities of the tokens of an English translation or garbled text with scores (switching to different text encodings doesn't help). What do I need to do to get this working correctly?"
