Add string decoding benchmarks #352

Open: @chjj wants to merge 1 commit into master
Conversation

Contributor

@chjj chjj commented Feb 5, 2024

No description provided.


// LLM-generated poems in east asian languages.
// Hopefully a lot of surrogate pairs in here.
var str = `
Collaborator

@dcousens dcousens Feb 8, 2024

As someone who doesn't speak these languages, I don't know what the content is beyond this comment, and I'm hoping the LLM didn't mistranslate.

Is there a public-domain text (a known quantity) that we could reference instead?

Contributor Author

I was originally looking at the Thousand Character Classic, an ancient Chinese text for teaching written Chinese, but I decided it wasn't long enough (though we could just .repeat() it, I suppose).
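
For illustration, here is a rough sketch of how a short public-domain text could be stretched with .repeat() into a benchmark input. The helper name `buildInput`, the 64 Ki character target, and the use of the built-in TextEncoder/TextDecoder as the thing being timed are all assumptions made for this example; the benchmark harness in this PR may look quite different.

```javascript
// Hypothetical helper: repeat a short seed text until it reaches a minimum length.
function buildInput(seed, minLength) {
  if (seed.length === 0) throw new RangeError('seed must not be empty');
  const times = Math.ceil(minLength / seed.length);
  return seed.repeat(times);
}

// First 16 characters of the Thousand Character Classic (a stand-in for the full text).
const seed = '天地玄黃宇宙洪荒日月盈昃辰宿列張';
const str = buildInput(seed, 1 << 16);

// Encode once up front so that only decoding is timed.
const bytes = new TextEncoder().encode(str);
const decoder = new TextDecoder('utf-8');

// Assumed iteration count; a real benchmark would also warm up first.
const start = performance.now();
let decoded = '';
for (let i = 0; i < 100; i++) decoded = decoder.decode(bytes);
const elapsed = performance.now() - start;

console.log(`${bytes.length} bytes x 100 decodes: ${elapsed.toFixed(2)} ms`);
```

One caveat with this approach: a repeated input is highly regular, so it may not exercise a decoder the same way a long, varied text would.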

I did plug all of this into Google Translate. It's pretty generic, generally positive and optimistic poetry. Nothing weird that I could see.

Contributor Author

If you're really worried, I have a friend literate in Japanese and several friends literate in Chinese. I could have them double-check it. The Korean part will be harder: I don't know anyone literate in Korean who can read beyond a 2nd grade level.

Contributor Author

@chjj chjj Feb 8, 2024

I think I'll combine a few things. I'm finding some material on Wikisource. The Art of War has a ton of text we can use. I'll find something for the other languages too.
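
When picking replacement texts, it may be worth checking how many surrogate pairs each candidate actually contains. Most commonly used CJK characters sit in the Basic Multilingual Plane and are stored as a single UTF-16 code unit, so a classical text may contain few or none. The sketch below is only an illustration; `countSurrogatePairs` is a hypothetical helper, not something from this PR.

```javascript
// Hypothetical helper: count code points above U+FFFF.
// Each such code point is stored as one surrogate pair in a JavaScript string.
function countSurrogatePairs(text) {
  let pairs = 0;
  // for...of iterates by code point, so a surrogate pair arrives as one item.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xffff) pairs++;
  }
  return pairs;
}

// Phrase from the opening of The Art of War; every character is in the BMP.
const bmpOnly = '兵者國之大事';
// Two CJK Extension B characters (U+20000, U+20001) followed by one BMP character.
const withAstral = '\u{20000}\u{20001}兵';

console.log(countSurrogatePairs(bmpOnly));    // 0
console.log(countSurrogatePairs(withAstral)); // 2
```

If surrogate-pair coverage matters for the benchmark, a text with rarer characters or emoji may be needed alongside the classical sources.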

2 participants