
Parse via streaming for huge files #49

Open
AlexJWayne opened this issue Jul 2, 2019 · 2 comments

Comments

@AlexJWayne

I'm trying to parse a 1.1GB file in an electron app. However, an Invalid String Length exception is being raised on this line:

function onData(chunk) {
	dxfString += chunk;
}

https://github.com/gdsestimating/dxf-parser/blob/master/src/DxfParser.js#L84

I believe that dxfString is being built up until it exceeds the 1GB limit on string length in V8.

It seems that this could be fixed by streaming the file contents into the parser rather than crawling through a giant string.

I'm not familiar enough with the DXF format to know if it's parseable without having all of it loaded, but I think it may be?

@bzuillsmith
Member

Yes, this is a limitation of the current implementation. One option would be to try to break the input up into DXF sections so that each section gets its own string. That might be enough to make it work for this particular file, but I'm not sure.
I experimented with streaming in the past because large files cause the main thread of browsers to become unresponsive and crash. Unfortunately, streaming makes the parser much more complicated: you have to start keeping track of the parser's state, which adds a lot of complexity. I never had time to implement it.
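The "one string per section" idea could look something like the sketch below. It is an assumption-laden illustration, not dxf-parser's actual code: it relies on the DXF convention that the file is a sequence of (group code, value) line pairs, with sections delimited by `0`/`SECTION` and `0`/`ENDSEC` pairs:

```javascript
// Hypothetical sketch: cut an array of DXF lines at SECTION/ENDSEC
// markers so that no single string ever has to hold the whole file.
// Lines are consumed in (group code, value) pairs per the DXF format.
function splitIntoSections(lines) {
  const sections = [];
  let current = null; // lines of the section being collected, or null
  for (let i = 0; i + 1 < lines.length; i += 2) {
    const code = lines[i].trim();
    const value = lines[i + 1].trim();
    if (code === '0' && value === 'SECTION') {
      current = []; // start a new section
    } else if (code === '0' && value === 'ENDSEC') {
      if (current) sections.push(current.join('\n'));
      current = null;
    } else if (current) {
      current.push(code, value); // pair belongs to the current section
    }
  }
  return sections;
}
```

Each resulting string is only as large as one section, which may be enough to stay under V8's limit for many files (though a single huge ENTITIES section could still blow past it).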

@pentacular
Contributor

From a quick look at the code, it seems you only ever rewind by one step?

If so, that means that you have a small moving window of input data that you need to keep in memory.

Which means that you could make the methods async, fetch data on demand, and just keep enough data around for that single level of rewind, without affecting the data structures of the parser.

Having said that, I haven't run into size problems yet -- but if it's important to you, it might be a reasonably cheap optimization. :)
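The single-level rewind idea could be sketched as an async scanner that fetches pairs on demand and remembers only the most recent one. This is a hypothetical illustration (`AsyncScanner` and `nextPair` are invented names, not dxf-parser's API):

```javascript
// Hypothetical sketch: an async scanner that pulls (code, value) pairs
// on demand and keeps exactly one pair of history, so rewind() needs
// only a constant-size window rather than the whole input in memory.
class AsyncScanner {
  constructor(nextPair) {
    this.nextPair = nextPair; // async function yielding {code, value}
    this.last = null;         // the single pair of history we retain
    this.pushedBack = false;  // true after rewind(), until next()
  }

  async next() {
    if (this.pushedBack) {
      // Serve the rewound pair instead of fetching a new one.
      this.pushedBack = false;
      return this.last;
    }
    this.last = await this.nextPair();
    return this.last;
  }

  rewind() {
    if (this.pushedBack || this.last === null) {
      throw new Error('can only rewind one pair');
    }
    this.pushedBack = true;
  }
}
```

Because only `last` is retained, the parser's data structures are untouched; the cost is that every `next()` call site becomes async.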
