refactor: remove callback nesting from done token parser #1448

Open · wants to merge 2 commits into master
Conversation

@arthurschreiber (Collaborator) commented Jun 19, 2022

This is an alternative to #1444.

It only focuses on the "simple" DONE token parsing code right now, but I think the approach is also applicable to the other token parsing code.

The idea here is that it is unlikely for us to actually hit the end of the parser's buffer when parsing tokens. This is because the buffer is at least the size of a single packet, so at minimum it's 512 bytes, the default is 4096 bytes, and the maximum that can be specified by the user is 32767 bytes.

So instead of making each individual read from the parser's buffer take a callback to resume parsing in case we hit the end of the buffer, we can in many cases just be optimistic and parse the data synchronously. If we do encounter the end of the buffer, we throw a custom error, throw away all the data we have parsed so far for this token, and resume parsing once more data becomes available.
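
As a rough sketch of what this looks like (the names here are illustrative, not the actual tedious API): reads become plain synchronous methods that throw a sentinel error when the buffer runs out, and a small driver catches that error, rewinds, and retries once the next packet has been buffered.

```typescript
// Sketch only -- NotEnoughDataError, BufferReader and tryParse are
// illustrative names, not the actual tedious implementation.

class NotEnoughDataError extends Error {
  constructor(public byteCount: number) {
    super(`Not enough data: need ${byteCount} more byte(s)`);
  }
}

class BufferReader {
  constructor(public buffer: Buffer, public position = 0) {}

  readUInt16LE(): number {
    if (this.position + 2 > this.buffer.length) {
      throw new NotEnoughDataError(this.position + 2 - this.buffer.length);
    }
    const value = this.buffer.readUInt16LE(this.position);
    this.position += 2;
    return value;
  }

  readBigUInt64LE(): bigint {
    if (this.position + 8 > this.buffer.length) {
      throw new NotEnoughDataError(this.position + 8 - this.buffer.length);
    }
    const value = this.buffer.readBigUInt64LE(this.position);
    this.position += 8;
    return value;
  }
}

// Optimistic driver: parse synchronously; on NotEnoughDataError, rewind
// to the token start and signal the caller to retry after the next packet.
function tryParse<T>(reader: BufferReader, parse: (reader: BufferReader) => T): T | undefined {
  const start = reader.position;
  try {
    return parse(reader);
  } catch (err) {
    if (err instanceof NotEnoughDataError) {
      reader.position = start;
      return undefined; // wait for more data, then call tryParse again
    }
    throw err;
  }
}
```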

For small tokens, like DONE tokens, this means that the actual parsing logic will be fully synchronous.
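
Under the assumptions of the sketch above, the DONE token body (status, current command, row count, per MS-TDS) becomes straight-line code:

```typescript
// Fully synchronous DONE token body parse (token type byte already
// consumed). Field layout per MS-TDS; names are illustrative.
interface DoneToken {
  status: number;
  curCmd: number;
  rowCount: number;
}

function parseDoneToken(reader: BufferReader): DoneToken {
  const status = reader.readUInt16LE();       // 2-byte status flags
  const curCmd = reader.readUInt16LE();       // 2-byte current command
  const rowCount = reader.readBigUInt64LE();  // 8-byte row count (TDS 7.2+)
  return { status, curCmd, rowCount: Number(rowCount) };
}

// Usage: const token = tryParse(reader, parseDoneToken);
// `undefined` means "buffer exhausted, retry once more data arrives".
```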

For more complicated tokens like the COLMETADATA token, we will want to be a bit smarter. A table can contain up to 255 columns, so restarting the token parsing from the beginning can quickly become extremely expensive. In these cases, we can wrap each individual column parse in a try/catch block and "store" the information we parsed so far in case we reach the end of the buffer.
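
A hypothetical sketch of that checkpointing idea (again, not the actual tedious code; `parseColumnMetadata` stands in for the real per-column parser):

```typescript
interface ColumnMetadata { name: string /* ... */ }

// Stand-in for the real per-column parser.
declare function parseColumnMetadata(reader: BufferReader): ColumnMetadata;

interface ColMetadataCheckpoint {
  count: number;
  columns: ColumnMetadata[];
}

type ColMetadataResult =
  | { done: true; columns: ColumnMetadata[] }
  | { done: false; checkpoint: ColMetadataCheckpoint };

function parseColMetadataToken(
  reader: BufferReader,
  checkpoint?: ColMetadataCheckpoint
): ColMetadataResult {
  const count = checkpoint ? checkpoint.count : reader.readUInt16LE();
  const columns = checkpoint ? checkpoint.columns : [];

  while (columns.length < count) {
    const columnStart = reader.position;
    try {
      columns.push(parseColumnMetadata(reader));
    } catch (err) {
      if (err instanceof NotEnoughDataError) {
        // Rewind only to the start of the current column and hand back
        // the progress made so far; the caller retries with this
        // checkpoint once more data is buffered, instead of re-parsing
        // the whole token from the beginning.
        reader.position = columnStart;
        return { done: false, checkpoint: { count, columns } };
      }
      throw err;
    }
  }

  return { done: true, columns };
}
```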

Here are the benchmark results for the changes made to the DONE token parser:

Before

@arthurschreiber ➜ /workspaces/tedious (arthur/parser-cleanup ✗) $ npm run build && node benchmarks/token-parser/done-token.js 

> [email protected] build /workspaces/tedious
> rimraf lib && babel src --out-dir lib --extensions .js,.ts

Successfully compiled 99 files with Babel (2280ms).
token-parser/done-token.js tokenCount=10 n=10: 3,668.7352218759343 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=100 n=10: 954.8604065700125 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=1000 n=10: 183.801226619542 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10000 n=10: 75.18276196401524 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10 n=100: 9,526.590380906426 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=100 n=100: 2,041.364953777067 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=1000 n=100: 764.0651889722318 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10000 n=100: 107.17033351123787 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10 n=1000: 20,246.137558291917 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=100 n=1000: 7,630.448759434392 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=1000 n=1000: 1,047.0970340465528 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10000 n=1000: 115.24599505942932 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)

After

@arthurschreiber ➜ /workspaces/tedious (arthur/parser-cleanup ✗) $ npm run build && node benchmarks/token-parser/done-token.js 

> [email protected] build /workspaces/tedious
> rimraf lib && babel src --out-dir lib --extensions .js,.ts

Successfully compiled 99 files with Babel (2340ms).
token-parser/done-token.js tokenCount=10 n=10: 4,436.1045678568735 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=100 n=10: 1,268.7933673572963 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=1000 n=10: 232.13994212844096 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10000 n=10: 79.77888993450648 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10 n=100: 12,205.529397749886 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=100 n=100: 2,278.372769763551 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=1000 n=100: 802.254450530632 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10000 n=100: 126.23935916922221 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10 n=1000: 23,414.78400095907 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=100 n=1000: 8,906.37150238998 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=1000 n=1000: 1,247.3904980590135 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js tokenCount=10000 n=1000: 140.0829566142033 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)

These fairly simple changes yield a roughly 20% increase in parsing speed. I can imagine that for more complicated tokens, where we have deep callback nesting, the improvements will be even more noticeable.

@codecov codecov bot commented Jun 19, 2022

Codecov Report

Merging #1448 (ec349de) into master (7ebd236) will decrease coverage by 0.46%.
The diff coverage is 60.00%.

@@            Coverage Diff             @@
##           master    #1448      +/-   ##
==========================================
- Coverage   80.55%   80.08%   -0.47%     
==========================================
  Files          88       88              
  Lines        4716     4746      +30     
  Branches      871      878       +7     
==========================================
+ Hits         3799     3801       +2     
- Misses        642      664      +22     
- Partials      275      281       +6     
| Impacted Files | Coverage Δ |
|---|---|
| src/token/done-token-parser.ts | 60.78% <60.00%> (-39.22%) ⬇️ |
| src/token/stream-parser.ts | 71.12% <0.00%> (-2.68%) ⬇️ |
| src/token/handler.ts | 54.43% <0.00%> (-1.19%) ⬇️ |
| src/connection.ts | 65.52% <0.00%> (-0.10%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@arthurschreiber (Collaborator, Author) commented

I also updated the benchmark to account for different packet length values, and the same improvement is still visible:

Before

@arthurschreiber ➜ /workspaces/tedious (arthur/parser-cleanup ✗) $ npm run build && node benchmarks/token-parser/done-token.js 

> [email protected] build /workspaces/tedious
> rimraf lib && babel src --out-dir lib --extensions .js,.ts

Successfully compiled 99 files with Babel (2309ms).
token-parser/done-token.js packetLength=512 tokenCount=10 n=100: 9,126.026871951166 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10 n=100: 9,235.429286303572 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10 n=100: 9,153.845942603555 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=100 n=100: 1,843.9692588934497 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=100 n=100: 1,810.4835106231205 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=100 n=100: 1,948.0885822372347 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=1000 n=100: 652.4302684975643 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=1000 n=100: 671.8205334099173 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=1000 n=100: 715.0766196732137 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=10000 n=100: 93.7723420810088 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10000 n=100: 92.59046112588472 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10000 n=100: 85.02996810225741 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=10 n=1000: 19,232.658099709908 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10 n=1000: 19,402.525611236793 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10 n=1000: 18,823.786655418575 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=100 n=1000: 6,533.733203903563 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=100 n=1000: 6,292.0264106550185 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=100 n=1000: 6,998.0184061180325 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=1000 n=1000: 892.2880869263913 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=1000 n=1000: 858.4501710413463 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=1000 n=1000: 982.4491115307425 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=10000 n=1000: 93.75388861538904 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10000 n=1000: 98.36781801141544 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10000 n=1000: 95.21740589231679 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)

After

@arthurschreiber ➜ /workspaces/tedious (arthur/parser-cleanup ✗) $ npm run build && node benchmarks/token-parser/done-token.js 

> [email protected] build /workspaces/tedious
> rimraf lib && babel src --out-dir lib --extensions .js,.ts

Successfully compiled 99 files with Babel (2472ms).
token-parser/done-token.js packetLength=512 tokenCount=10 n=100: 11,432.207523833009 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10 n=100: 12,580.203514984345 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10 n=100: 11,999.173976863432 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=100 n=100: 2,118.717190472527 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=100 n=100: 2,301.491787851525 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=100 n=100: 2,254.827020945088 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=1000 n=100: 652.0477304676723 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=1000 n=100: 832.4829034859198 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=1000 n=100: 851.6297468138828 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=10000 n=100: 98.5433097363429 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10000 n=100: 109.880603285959 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10000 n=100: 118.42038587806157 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=10 n=1000: 21,992.460148947455 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10 n=1000: 22,722.215485085333 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10 n=1000: 18,352.24278079202 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=100 n=1000: 6,101.542976492525 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=100 n=1000: 7,162.232670471326 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=100 n=1000: 8,827.152724298105 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=1000 n=1000: 959.5495395454244 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=1000 n=1000: 1,121.648569455864 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=1000 n=1000: 1,285.645163002097 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=512 tokenCount=10000 n=1000: 99.44789288554495 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=4096 tokenCount=10000 n=1000: 112.54060101237711 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
token-parser/done-token.js packetLength=32767 tokenCount=10000 n=1000: 128.9056101654226 (minor: 0 - 0ms, major: 0 - 0ms, incremental: 0 - 0ms)
