refactor: remove callback nesting from done token parser #1448
base: master
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1448      +/-   ##
==========================================
- Coverage   80.55%   80.08%   -0.47%
==========================================
  Files          88       88
  Lines        4716     4746      +30
  Branches      871      878       +7
==========================================
+ Hits         3799     3801       +2
- Misses        642      664      +22
- Partials      275      281       +6
Continue to review full report at Codecov.
Force-pushed from d0c2768 to ec349de
I also updated the benchmark to account for different packet length values, and the same improvement is still visible:

Before

After
This is an alternative to #1444.
It only covers the "simple" DONE token parsing code right now, but I think the approach is also applicable to the other token parsing code.
The idea here is that it is unlikely for us to actually hit the end of the parser's buffer when parsing tokens. This is because the buffer is at least the size of a single packet, so at minimum it's 512 bytes, the default is 4096 bytes, and the maximum that can be specified by the user is 32767 bytes.
So instead of making each individual read from the parser's buffer take a callback that resumes parsing in case we hit the end of the buffer, we can in many cases just be optimistic and parse the data synchronously. If we do encounter the end of the buffer, we throw a custom error, throw away all the data we have parsed so far for this token, and resume parsing once more data becomes available.
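To illustrate the pattern, here is a minimal TypeScript sketch. This is not the actual tedious implementation; `NotEnoughDataError`, `BufferReader`, `waitForChunk`, and the parser shape used below are assumptions made for this example.

```ts
// Minimal sketch of the "optimistic" parsing idea. Not the real tedious code;
// all names here are illustrative assumptions.

class NotEnoughDataError extends Error {
  constructor() {
    super('not enough data available to finish parsing the current token');
  }
}

class BufferReader {
  constructor(public buffer: Buffer, public position: number) {}

  // Every read is fully synchronous. If the buffer runs out, we throw
  // instead of suspending the parser via a callback.
  readUInt16LE(): number {
    if (this.position + 2 > this.buffer.length) {
      throw new NotEnoughDataError();
    }
    const value = this.buffer.readUInt16LE(this.position);
    this.position += 2;
    return value;
  }

  readBigUInt64LE(): bigint {
    if (this.position + 8 > this.buffer.length) {
      throw new NotEnoughDataError();
    }
    const value = this.buffer.readBigUInt64LE(this.position);
    this.position += 8;
    return value;
  }
}

// With synchronous reads, parsing a small token body (e.g. a DONE-style token
// with a status, a current command and a 64-bit row count) becomes a plain
// function.
function parseDoneToken(reader: BufferReader) {
  const status = reader.readUInt16LE();
  const curCmd = reader.readUInt16LE();
  const rowCount = reader.readBigUInt64LE();
  return { status, curCmd, rowCount };
}

// Optimistically parse; only in the (rare) case that we hit the end of the
// buffer do we throw away the partial result and wait for more data.
async function parseWithRetry<T>(
  parser: { buffer: Buffer; position: number; waitForChunk(): Promise<void> },
  parse: (reader: BufferReader) => T
): Promise<T> {
  while (true) {
    const reader = new BufferReader(parser.buffer, parser.position);
    try {
      const result = parse(reader);
      parser.position = reader.position; // commit the new offset only on success
      return result;
    } catch (err) {
      if (!(err instanceof NotEnoughDataError)) {
        throw err;
      }
      await parser.waitForChunk(); // more data arrived into parser.buffer; retry
    }
  }
}
```

Usage would then look something like `const done = await parseWithRetry(parser, parseDoneToken);` — in the common case the whole token is parsed in a single synchronous pass, with no callback per read.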
For small tokens, like DONE tokens, this means that the actual parsing logic will be fully synchronous. For more complicated tokens like the COLMETADATA token, we will want to be a bit smarter. A table can contain up to 255 columns, so restarting the token parsing from the beginning can very quickly become extremely expensive. In these cases, we can wrap each individual column parse in a try/catch block and "store" the information we parsed so far in case we reach the end of the buffer (a sketch of this follows the benchmark results below).

Here are the benchmark results for the changes made for the DONE token parser:

Before

After
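Here is a rough sketch of that per-column checkpointing idea for COLMETADATA-style tokens, building on the `BufferReader` and `NotEnoughDataError` from the previous sketch. The `ColumnMetadata` shape and the checkpoint object are illustrative assumptions, not the real tedious internals.

```ts
// Sketch of per-column "checkpointing" for larger tokens such as COLMETADATA.
// All names are illustrative assumptions made for this example.

interface ColumnMetadata {
  userType: number;
  flags: number;
  typeId: number;
  name: string;
}

interface ColMetadataCheckpoint {
  columns: ColumnMetadata[]; // columns parsed successfully so far
  position: number;          // buffer offset right after the last complete column
}

function parseColMetadataToken(
  reader: BufferReader,
  columnCount: number,
  parseColumn: (reader: BufferReader) => ColumnMetadata,
  checkpoint: ColMetadataCheckpoint
): ColumnMetadata[] {
  // Resume from the last successfully parsed column instead of restarting
  // the whole token when we run out of data.
  reader.position = checkpoint.position;

  while (checkpoint.columns.length < columnCount) {
    const columnStart = reader.position;

    try {
      checkpoint.columns.push(parseColumn(reader));
      checkpoint.position = reader.position;
    } catch (err) {
      if (err instanceof NotEnoughDataError) {
        // Roll back only to the start of the half-parsed column; everything
        // already stored in the checkpoint stays and is not re-parsed.
        reader.position = columnStart;
      }
      throw err;
    }
  }

  return checkpoint.columns;
}
```

When the `NotEnoughDataError` bubbles up, the caller waits for more data and calls `parseColMetadataToken` again with the same checkpoint object, so only the interrupted column is re-read rather than the entire token.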
These fairly simple changes yield a 20% increase in parsing speed. I can imagine that for more complicated tokens where we have deep callback nesting, the improvements will be even more noticeable.