Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: box parser #19

Closed
wants to merge 1 commit into from
Closed

Conversation

MarinPostma
Copy link
Contributor

This PR boxes the yyParser in Parser. The reason for that is that it is huge (141464 bytes!), and I encountered stack overflows because of this.

@gwenn
Copy link
Owner

gwenn commented Jan 13, 2023

I guess that the real culprit is the AST (#8)

@MarinPostma
Copy link
Contributor Author

can you point me toward where that could be fixed?

@gwenn
Copy link
Owner

gwenn commented Jan 14, 2023

Just run cargo clippy with enum-variant-size-threshold = 50.
There is also this stack:
https://github.com/gwenn/lemon-rs/blob/master/third_party/lemon/lempar.rs#L189
You can check its max size:
https://github.com/gwenn/lemon-rs/blob/master/third_party/lemon/lempar.rs#L184-L185
Or try to reduce its size:
https://github.com/gwenn/lemon-rs/blob/master/third_party/lemon/lemon.c#L4431-L4435

@MarinPostma
Copy link
Contributor Author

MarinPostma commented Jan 14, 2023

I have investigated this a bit further. Here is the list of the 15 biggest types, along with their size:

141464	parser::parse::yyParser - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2257:1: 2265:2 (#0)
  6480	[&'static str; _] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2630:1: 3036:3 (#0)
  4688	[&'static str; 293] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2328:1: 2623:3 (#0)
  4550	[u16; 2275] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1635:1: 1864:3 (#0)
  4216	[u16; 2108] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1420:1: 1632:3 (#0)
  1144	parser::Context - src/parser/mod.rs:38:1: 44:2 (#0)
  1114	[u16; 557] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1981:1: 2038:3 (#0)
  1114	[u16; 557] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1872:1: 1929:3 (#0)
  1096	parser::parse::yyStackEntry - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2244:1: 2250:2 (#0)
  1096	parser::ast::Cmd - src/parser/ast/mod.rs:117:1: 121:2 (#0)
  1088	parser::parse::YYMINORTYPE - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:100:1: 180:2 (#0)
  1088	parser::ast::Stmt - src/parser/ast/mod.rs:157:1: 273:2 (#0)
  1008	parser::ast::TriggerCmd - src/parser/ast/mod.rs:2648:1: 2669:2 (#0)
   928	parser::ast::InsertBody - src/parser/ast/mod.rs:2537:1: 2540:2 (#0)
   810	[u16; _] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:3382:1: 3788:3 (#0)

The main reason yyParser is so large, is because of the SmallVec of yyStackEntry, each of which is 1088 bytes.
Almost all of the size in the ssStackEntry comes from the YYMINORTYPE (1088 bytes) (that's the AST enum). I don't think that 1088 bytes for enum are problematic. There could be performance issues caused by its size of it, but then it would require actual benchmarking to see where to draw the line.

That basically leaves us with two options:

  • Stop using a SmallVec<yyStackEntry>, which takes a lot of space on the stack
  • Make yyStackEntry leaner. That's made easy by boxing the YYMINORTYPE inside of it.

The former is easier since it doesn't involve editing the grammar file.

@gwenn
Copy link
Owner

gwenn commented Jan 14, 2023

Thanks for your investigation.
In the original C version, you can choose between dynamic vs static stack:
https://github.com/sqlite/sqlite/blob/4e86aa86ea0ab657423f1a715c620c188c6ba879/tool/lempar.c#L215-L222
And SQLite uses a static stack but doesn't generate an AST...

We should try to rollback 8e9925e

@MarinPostma
Copy link
Contributor Author

MarinPostma commented Jan 14, 2023

@gwenn, I have changed my mind and decided to go with making the AST itself leaner, I'll keep you updated

@MarinPostma
Copy link
Contributor Author

I'm closing this pr for now, I'll get back to it when I have more time to work on optimizing the AST itself

@gwenn
Copy link
Owner

gwenn commented Jul 20, 2024

@MarinPostma For your information, SmallVec has been removed.
See #57

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants