refactor: remove callback nesting from done token parser #1448
base: master
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1448      +/-   ##
==========================================
- Coverage   80.55%   80.08%   -0.47%
==========================================
  Files          88       88
  Lines        4716     4746      +30
  Branches      871      878       +7
==========================================
+ Hits         3799     3801       +2
- Misses        642      664      +22
- Partials      275      281       +6
Continue to review full report at Codecov.
Force-pushed from d0c2768 to ec349de
I also updated the benchmark to account for different packet length values, and the same improvement is still visible:

Before

After
This is an alternative to #1444.
It only covers the "simple" DONE token parsing code right now, but I think the approach is also applicable to the other token parsing code.
The idea here is that it is unlikely for us to actually hit the end of the parser's buffer when parsing tokens. This is because the buffer is at least the size of a single packet, so at minimum it's 512 bytes, the default is 4096 bytes, and the maximum that can be specified by the user is 32767 bytes.
So instead of making each individual read from the parser's buffer take a callback that resumes parsing in case we hit the end of the buffer, we can in many cases just be optimistic and parse the data synchronously. If we do encounter the end of the buffer, we throw a custom error, throw away all the data we have parsed so far for this token, and resume parsing once more data becomes available.
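To illustrate the pattern, here is a minimal TypeScript sketch. This is not the actual tedious implementation; `NotEnoughDataError`, `BufferReader`, `waitForChunk`, and the parser shape used below are assumptions made for this example.

```ts
// Minimal sketch of the "optimistic" parsing idea. Not the real tedious code;
// all names here are illustrative assumptions.

class NotEnoughDataError extends Error {
  constructor() {
    super('not enough data available to finish parsing the current token');
  }
}

class BufferReader {
  constructor(public buffer: Buffer, public position: number) {}

  // Every read is fully synchronous. If the buffer runs out, we throw
  // instead of suspending the parser via a callback.
  readUInt16LE(): number {
    if (this.position + 2 > this.buffer.length) {
      throw new NotEnoughDataError();
    }
    const value = this.buffer.readUInt16LE(this.position);
    this.position += 2;
    return value;
  }

  readBigUInt64LE(): bigint {
    if (this.position + 8 > this.buffer.length) {
      throw new NotEnoughDataError();
    }
    const value = this.buffer.readBigUInt64LE(this.position);
    this.position += 8;
    return value;
  }
}

// With synchronous reads, parsing a small token body (e.g. a DONE-style token
// with a status, a current command and a 64-bit row count) becomes a plain
// function.
function parseDoneToken(reader: BufferReader) {
  const status = reader.readUInt16LE();
  const curCmd = reader.readUInt16LE();
  const rowCount = reader.readBigUInt64LE();
  return { status, curCmd, rowCount };
}

// Optimistically parse; only in the (rare) case that we hit the end of the
// buffer do we throw away the partial result and wait for more data.
async function parseWithRetry<T>(
  parser: { buffer: Buffer; position: number; waitForChunk(): Promise<void> },
  parse: (reader: BufferReader) => T
): Promise<T> {
  while (true) {
    const reader = new BufferReader(parser.buffer, parser.position);
    try {
      const result = parse(reader);
      parser.position = reader.position; // commit the new offset only on success
      return result;
    } catch (err) {
      if (!(err instanceof NotEnoughDataError)) {
        throw err;
      }
      await parser.waitForChunk(); // more data arrived into parser.buffer; retry
    }
  }
}
```

Usage would then look something like `const done = await parseWithRetry(parser, parseDoneToken);` — in the common case the whole token is parsed in a single synchronous pass, with no callback per read.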
For small tokens, like DONE tokens, this means that the actual parsing logic will be fully synchronous. For more complicated tokens like the COLMETADATA token, we will want to be a bit smarter. A table can contain up to 255 columns, so restarting the token parsing from the beginning can very quickly become extremely expensive. In these cases, we can wrap each individual column parse in a try/catch block and "store" the information we parsed so far in case we reach the end of the buffer (a sketch of this follows the benchmark results below).

Here are the benchmark results for the changes made for the DONE token parser:

Before

After
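Here is a rough sketch of that per-column checkpointing idea for COLMETADATA-style tokens, building on the `BufferReader` and `NotEnoughDataError` from the previous sketch. The `ColumnMetadata` shape and the checkpoint object are illustrative assumptions, not the real tedious internals.

```ts
// Sketch of per-column "checkpointing" for larger tokens such as COLMETADATA.
// All names are illustrative assumptions made for this example.

interface ColumnMetadata {
  userType: number;
  flags: number;
  typeId: number;
  name: string;
}

interface ColMetadataCheckpoint {
  columns: ColumnMetadata[]; // columns parsed successfully so far
  position: number;          // buffer offset right after the last complete column
}

function parseColMetadataToken(
  reader: BufferReader,
  columnCount: number,
  parseColumn: (reader: BufferReader) => ColumnMetadata,
  checkpoint: ColMetadataCheckpoint
): ColumnMetadata[] {
  // Resume from the last successfully parsed column instead of restarting
  // the whole token when we run out of data.
  reader.position = checkpoint.position;

  while (checkpoint.columns.length < columnCount) {
    const columnStart = reader.position;

    try {
      checkpoint.columns.push(parseColumn(reader));
      checkpoint.position = reader.position;
    } catch (err) {
      if (err instanceof NotEnoughDataError) {
        // Roll back only to the start of the half-parsed column; everything
        // already stored in the checkpoint stays and is not re-parsed.
        reader.position = columnStart;
      }
      throw err;
    }
  }

  return checkpoint.columns;
}
```

When the `NotEnoughDataError` bubbles up, the caller waits for more data and calls `parseColMetadataToken` again with the same checkpoint object, so only the interrupted column is re-read rather than the entire token.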
These fairly simple changes yield a 20% increase in parsing speed. I can imagine that for more complicated tokens where we have deep callback nesting, the improvements will be even more noticeable.