Fast str and bytes builders #1036

Open
JukkaL opened this issue Nov 26, 2023 · 0 comments
JukkaL commented Nov 26, 2023

Right now there is no particularly fast way to construct a str or bytes object from component items, such as code points or characters. Building a list and calling join(), or using StringIO, is probably not fast enough. These are common operations in libraries and low-level code.
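For reference, the two existing str approaches mentioned above look roughly like this. The function names are made up for illustration. Both go through a temporary one-character str object per item, which is part of the overhead a native builder could avoid.

```python
import io


def build_with_join(code_points):
    # Collect one-character strings in a list, then join once at the end.
    parts = []
    for cp in code_points:
        parts.append(chr(cp))
    return "".join(parts)


def build_with_stringio(code_points):
    # Write one-character strings into an in-memory text buffer.
    buf = io.StringIO()
    for cp in code_points:
        buf.write(chr(cp))
    return buf.getvalue()
```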

We could add native str and bytes builder classes that could be quite fast. Hypothetical example with bytes:

b = BytesBuilder()
b.append(97)  # or ord('a')
b.append(98)
b.extend(b'cd')  # Can also take other iterables
bb = b.bytes()  # b'abcd'

Here are some ideas about how to make this fast:

  • Maintain a freelist of BytesBuilder objects, so we usually wouldn't need to allocate it from the heap (or somehow stack allocate it).
  • Maintain a short fixed-size internal buffer in the builder, so that we don't need to allocate a separate temporary buffer when building small bytes objects (which is likely very common). Allocate a larger buffer only when needed.
  • Inline append() and extend() calls, since we can assume these to be performance-critical.
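To make the first two ideas concrete, here is a pure-Python model of the bookkeeping. It is only a sketch of the logic, not a proposed implementation: the real thing would be native code, and the buffer size, the freelist cap, and the new() and release() methods are all assumptions made up for this example. Inlining is a compiler concern and is not modeled here.

```python
class BytesBuilder:
    """Pure-Python model of the hypothetical builder (not a real API)."""

    _INLINE_SIZE = 16   # assumed size of the short fixed internal buffer
    _FREELIST_MAX = 8   # assumed cap on recycled builders
    _freelist = []      # recycled builder instances, shared by the class

    def __init__(self):
        self._buf = bytearray(self._INLINE_SIZE)
        self._len = 0   # number of bytes actually in use

    @classmethod
    def new(cls):
        # Reuse a recycled builder if one is available (hypothetical helper).
        if cls._freelist:
            return cls._freelist.pop()
        return cls()

    def append(self, value):
        if not 0 <= value <= 255:
            raise ValueError("byte must be in range(0, 256)")
        if self._len == len(self._buf):
            self._grow(self._len + 1)
        self._buf[self._len] = value
        self._len += 1

    def extend(self, items):
        if isinstance(items, int):
            raise TypeError("extend() expects bytes-like or an iterable of ints")
        data = bytes(items)  # builtin bytes; method bodies skip class scope
        needed = self._len + len(data)
        if needed > len(self._buf):
            self._grow(needed)
        self._buf[self._len:needed] = data
        self._len = needed

    def _grow(self, needed):
        # Switch to a larger buffer only when the small one is exhausted.
        new_cap = max(needed, 2 * len(self._buf))
        self._buf.extend(bytes(new_cap - len(self._buf)))

    def bytes(self):
        return bytes(self._buf[:self._len])

    def release(self):
        # Reset and hand the builder back to the freelist (hypothetical helper).
        self._len = 0
        if len(self._buf) > self._INLINE_SIZE:
            self._buf = bytearray(self._INLINE_SIZE)
        if len(self._freelist) < self._FREELIST_MAX:
            self._freelist.append(self)
```

With this model, building a small object never touches _grow(), and a released builder is handed back by the next new() call instead of being allocated again.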

We could have a similar builder class for str objects, but it would also need to track how many bytes per character are required. It could possibly accept a hint about the maximum code point value at construction. This might resemble _PyUnicodeWriter, which is used in CPython.
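A rough pure-Python model of the width tracking could look like the following. Everything here is hypothetical (the StrBuilder name, the max_code_point parameter, the str() method), and it does not reflect how _PyUnicodeWriter is actually implemented. The model stores code points in an array whose item type is widened only when a larger code point shows up, and the constructor hint lets the caller skip that widening step.

```python
from array import array

# Storage width in bytes -> array typecode with at least that many bytes.
_TYPECODES = {1: "B", 2: "H", 4: "L"}


def _width_for(code_point):
    # 1, 2 or 4 bytes per character, depending on the largest code point.
    if code_point < 0x100:
        return 1
    if code_point < 0x10000:
        return 2
    return 4


class StrBuilder:
    """Pure-Python model of a hypothetical str builder (not a real API)."""

    def __init__(self, max_code_point=0x7F):
        if not 0 <= max_code_point <= 0x10FFFF:
            raise ValueError("code point out of range")
        # The hint selects the initial storage width.
        self.width = _width_for(max_code_point)
        self._items = array(_TYPECODES[self.width])

    def append(self, code_point):
        if not 0 <= code_point <= 0x10FFFF:
            raise ValueError("code point out of range")
        needed = _width_for(code_point)
        if needed > self.width:
            # Widen: copy the existing items into a wider array.
            self._items = array(_TYPECODES[needed], self._items.tolist())
            self.width = needed
        self._items.append(code_point)

    def extend(self, text):
        for ch in text:
            self.append(ord(ch))

    def str(self):
        return "".join(map(chr, self._items))
```

Passing max_code_point=0x10FFFF up front means no copy is ever needed, at the cost of using the widest storage from the start.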
