Add string decoding benchmarks #352
base: master
// LLM-generated poems in East Asian languages.
// Hopefully a lot of surrogate pairs in here.
var str = `
As someone who doesn't speak these languages, I don't know what the content is beyond what this comment says, and I'm hoping the LLM didn't mistranslate anything.
Is there anything in the public domain (and a known quantity) that we could use instead?
I was originally looking at the Thousand Character Classic, an ancient Chinese text for teaching written Chinese, but I decided it wasn't long enough (though we could just .repeat() it, I suppose).
I did plug all of this into Google Translate. It's pretty generic, generally positive and optimistic poetry. Nothing weird that I could see.
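If the Thousand Character Classic were used, the .repeat() idea could look something like the following. This is a minimal sketch, not code from the PR: `makeBenchInput` is a hypothetical helper name, the target size is assumed to be measured in UTF-16 code units (JavaScript string length), and the seed is the classic's opening line.

```javascript
// Hypothetical helper: build a benchmark string of about `targetLength`
// UTF-16 code units by repeating a short seed text.
function makeBenchInput(seed, targetLength) {
  if (seed.length === 0) throw new RangeError("seed must be non-empty");
  const copies = Math.ceil(targetLength / seed.length);
  let out = seed.repeat(copies).slice(0, targetLength);
  // If the cut lands inside a surrogate pair, drop the dangling high surrogate
  // so the result stays well-formed UTF-16.
  const last = out.charCodeAt(out.length - 1);
  if (last >= 0xd800 && last <= 0xdbff) out = out.slice(0, -1);
  return out;
}

// Opening line of the Thousand Character Classic (10 BMP characters).
const seed = "天地玄黃，宇宙洪荒。";
const input = makeBenchInput(seed, 1000);
console.log(input.length); // 1000
```

Trimming a trailing high surrogate only matters when the seed contains supplementary-plane characters; in that case the result can be one code unit shorter than requested.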
If you're really worried, I have a friend literate in Japanese and several friends literate in Chinese. I could have them double check it. The Korean part will be harder. I don't know anyone literate in Korean who can read beyond a 2nd grade level.
I think I'll combine a few things. I'm finding some material on Wikisource. The Art of War has a ton of text we can use. I'll find something for the other languages too.
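Because the original code comment hopes for a lot of surrogate pairs, it may be worth checking any candidate text (LLM-generated or from Wikisource) before committing it. Most common CJK ideographs, Japanese kana and Hangul syllables are in the Basic Multilingual Plane and take a single UTF-16 code unit, so surrogate pairs only appear for code points above U+FFFF, such as CJK Extension B ideographs or many emoji. A minimal sketch, with `describeText` as a hypothetical helper name:

```javascript
// Hypothetical helper: report UTF-16 length, surrogate-pair count and
// UTF-8 byte length for a candidate benchmark text.
function describeText(str) {
  let surrogatePairs = 0;
  // for...of iterates by code point, so a surrogate pair arrives as one
  // string of length 2.
  for (const ch of str) {
    if (ch.length === 2) surrogatePairs++;
  }
  const utf8Bytes = new TextEncoder().encode(str).length;
  return { codeUnits: str.length, surrogatePairs, utf8Bytes };
}

// Common CJK ideographs: 4 code units, 0 pairs, 12 UTF-8 bytes.
console.log(describeText("天地玄黃"));
// U+20000 (CJK Extension B): 2 code units, 1 pair, 4 UTF-8 bytes.
console.log(describeText("\u{20000}"));
```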