Faster tokenizer #137

Open
wants to merge 7 commits into
base: main

LoganDark

This one is faster

Benchmark 12700 tokens...
Encode 0.361 MB/s
Decode 12700000.0 MB/s
Encode 2.292 MB/s
Decode 12700000.0 MB/s
Encode 4.277 MB/s
Decode 12700000.0 MB/s
Unit test...
All OK

Benchmark 317500 tokens...
Encode 0.359 MB/s
Decode 17.167 MB/s
Encode 2.143 MB/s
Decode 17.687 MB/s
Encode 3.477 MB/s
Decode 26.242 MB/s
Unit test...
All OK

Not bad for Python, I guess.
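For anyone wanting to reproduce numbers like these, here is a minimal sketch of a throughput measurement. It is not the benchmark script used for this PR: `throughput_mb_s` and `ToyTokenizer` are hypothetical names, and the toy tokenizer (one token per UTF-8 byte) only stands in for the real one. Repeating the call until a minimum wall-clock time has passed keeps a very fast call from being timed as near-zero, which is probably what produces the identical 12700000.0 MB/s decode figures in the small run above.

```python
import time


def throughput_mb_s(func, payload, n_bytes, min_seconds=0.2):
    """Call func(payload) repeatedly for at least min_seconds; return MB/s.

    n_bytes is the size of the text that one call processes.
    Hypothetical helper, not part of this PR.
    """
    calls = 0
    start = time.perf_counter()
    while True:
        func(payload)
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            break
    # elapsed >= min_seconds > 0, so the division is safe.
    return n_bytes * calls / elapsed / 1e6


class ToyTokenizer:
    """Hypothetical stand-in tokenizer: one token per UTF-8 byte."""

    def encode(self, text):
        return list(text.encode("utf-8"))

    def decode(self, tokens):
        return bytes(tokens).decode("utf-8")


if __name__ == "__main__":
    tok = ToyTokenizer()
    text = "The quick brown fox jumps over the lazy dog. " * 1000
    size = len(text.encode("utf-8"))
    tokens = tok.encode(text)

    print(f"Benchmark {len(tokens)} tokens...")
    print(f"Encode {throughput_mb_s(tok.encode, text, size):.3f} MB/s")
    print(f"Decode {throughput_mb_s(tok.decode, tokens, size):.3f} MB/s")
```

Swapping `ToyTokenizer` for each tokenizer under comparison gives one Encode/Decode pair per implementation, in the same shape as the output above.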

This one is faster
Sacrifice a little runtime performance (~10%) for much faster loading (~50%).
LoganDark force-pushed the fast-tokenizer branch 2 times, most recently from 20f0c2b to c0cce90 on June 5, 2023 00:57