Token scoring for other languages #2636

Open
FOLSc opened this issue Dec 17, 2024 · 0 comments

FOLSc commented Dec 17, 2024

The C++ version of Whisper can generate probability scores for tokens or subtokens in English speech. Can it produce similar scores for other languages, such as French, Japanese, or Chinese? When I tried scoring speech in another language, the output was either the probabilities of English translation tokens or corrupted text with scores; changing the encoding format did not help. What can I do to get it to work properly?
