Skip to content
This repository has been archived by the owner on Oct 16, 2022. It is now read-only.

scanRanges is extremely slow on dense match input #14

Open
ChrisPenner opened this issue Dec 15, 2019 · 3 comments
Open

scanRanges is extremely slow on dense match input #14

ChrisPenner opened this issue Dec 15, 2019 · 3 comments

Comments

@ChrisPenner
Copy link

ChrisPenner commented Dec 15, 2019

Referenced in ChrisPenner/lens-regex-pcre#10 with benchmarks here: https://github.com/jamesdbrock/replace-benchmark#results

We can see that pcre-heavy is having a bit of trouble with substitutions with LOTS of matches, I did a quick profile on it and the VAST majority of the time is spent inside rawMatch in scanRanges0. On large strings; it clocks in at OVER 10 minutes, whereas the next longest time is just over one second.

After a cursory look, it seems it is serially calling out to C for each consecutive match, and just iterating on that, there's also a lot of fiddly math going on, but nothing that seems to be the "obvious" culprit. Perhaps there's some unknown copying going on too, but I didn't dig deep enough to see.

I can dig around a bit more, but if anyone has suggestions or tips on a good way to speed this up, or even ideas of where the cost centers might be I'm open to ideas!

I understand that the benchmark in question represents a sort of "perfect storm" in terms of triggering bad performance, but it would be nice if there's some way we can get this into at least the same ballpark :)

@ChrisPenner
Copy link
Author

This might be a good thing for someone to look into for #hacktoberfest

@valpackett
Copy link
Owner

I'd be surprised if anyone seriously participates in that, seems like it's more of a spam-fest..

@ChrisPenner
Copy link
Author

ChrisPenner commented Oct 3, 2020

That's an interesting article, and it's probably true for much of the github ecosystem, but I have a feeling Haskell repos are likely shielded at least a bit from poor quality submissions (Haskell doesn't even show up in the list on the main site).

A quick glance over closed hacktoberfest issues in Haskell shows a surprising amount of legitimate work happening 😄

Granted, this isn't so much of a "quick win"

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants