Filter out keys when processing objects #1489

Open
lemire opened this issue Mar 6, 2021 · 4 comments
Labels
on demand (Related to simdjson::ondemand functionality), performance

Comments

@lemire
Member

lemire commented Mar 6, 2021

The idea is to very quickly dismiss keys that are not a match when searching through an object.

Suppose that the target has at least 7 bytes.

Take a 64-bit word. Copy up to 8 characters from the target into this 64-bit word, terminating with a quote if possible. Let us call this word FILTER.

Go through the keys. Load a 64-bit word (always safe because of padding) at each key's location. XOR it with FILTER. You get zero if and only if the first 8 characters match (possibly including the quote). If so, investigate further; if not, move on.
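A minimal sketch of the filter described above, assuming the target has at least 7 bytes and that each key pointer has at least 8 readable bytes (which padding would guarantee). The names `make_filter` and `may_match` are hypothetical and are not part of simdjson.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper: pack the first 8 bytes of the target into FILTER.
// A 7-byte target gets its closing quote as the 8th byte.
// Precondition (assumed): target.size() >= 7.
inline uint64_t make_filter(std::string_view target) {
  char buf[8] = {};
  const std::size_t n = target.size() < 8 ? target.size() : 8;
  std::memcpy(buf, target.data(), n);
  if (n < 8) { buf[n] = '"'; }  // only reached when n == 7
  uint64_t filter;
  std::memcpy(&filter, buf, sizeof(filter));
  return filter;
}

// `key` points just past the opening quote of a key in the JSON input.
// Returns true when the first 8 bytes equal FILTER; the caller still has to
// compare the rest of the key when the target is longer than 8 bytes.
inline bool may_match(const char *key, uint64_t filter) {
  uint64_t word;
  std::memcpy(&word, key, sizeof(word));  // unaligned-safe 64-bit load
  return (word ^ filter) == 0;
}
```

A `false` result dismisses the key immediately. A `true` result is conclusive only for 7-byte targets; longer targets need a full comparison afterwards.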

If the target has fewer than 7 bytes, then you need to use a mask, but there are only 7 cases so you can use a lookup table or some other cleverness.
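For short targets, one way to sketch the masked comparison is to store a mask alongside the filter word. The names `short_filter`, `make_short_filter` and `may_match_short` are hypothetical. The mask is computed here for clarity; a 7-entry lookup table indexed by target length would be an equivalent alternative.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical pair: the packed target bytes plus a mask selecting
// the bytes that must match (the target and its closing quote).
struct short_filter {
  uint64_t word;
  uint64_t mask;
};

// Precondition (assumed): target.size() < 7.
inline short_filter make_short_filter(std::string_view target) {
  unsigned char word[8] = {};
  unsigned char mask[8] = {};
  const std::size_t n = target.size();
  if (n > 0) { std::memcpy(word, target.data(), n); }
  word[n] = '"';                   // closing quote right after the target
  std::memset(mask, 0xFF, n + 1);  // compare n target bytes plus the quote
  short_filter f;
  std::memcpy(&f.word, word, sizeof(f.word));
  std::memcpy(&f.mask, mask, sizeof(f.mask));
  return f;
}

// `key` points just past the opening quote; 8 readable bytes are assumed.
inline bool may_match_short(const char *key, short_filter f) {
  uint64_t w;
  std::memcpy(&w, key, sizeof(w));
  return ((w ^ f.word) & f.mask) == 0;  // ignore bytes beyond the quote
}
```

Because the closing quote is part of the masked comparison, a match here is exact for unescaped keys and needs no follow-up comparison.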

See #1485

@lemire lemire added this to the 1.0 milestone Mar 6, 2021
@lemire
Member Author

lemire commented Mar 6, 2021

@jkeiser Marking for 1.0 though it could be 0.9.

It is not hard to implement. It just requires benchmarking.

@lemire
Member Author

lemire commented Mar 8, 2021

Note that this could be extended to handle escaped content (backslashes), maybe optionally.

@jkeiser
Member

jkeiser commented Mar 8, 2021

It seems like this might also play nicely with a multi-field lookup interface ("find field x, y or z, whichever comes next").

@lemire
Member Author

lemire commented Mar 8, 2021

@jkeiser Right. So I imagine a template where you pass a "matcher" which returns true if and only if the string matches. The idea is to do more work upfront building a matcher instance. Ultimately, you could even do regex!
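A rough sketch of what such a matcher template could look like. Everything here is hypothetical: `find_first`, `any_of_matcher`, and the vector of key/value pairs standing in for a JSON object are illustrations only, not the simdjson API.

```cpp
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Stand-in for an object's fields (assumption: keys are already unescaped).
using field_list = std::vector<std::pair<std::string_view, std::string_view>>;

// Hypothetical lookup: returns the value of the first field whose key the
// matcher accepts. Matcher is any callable taking a std::string_view key
// and returning bool.
template <typename Matcher>
std::optional<std::string_view> find_first(const field_list &fields,
                                           Matcher &&matches) {
  for (const auto &[key, value] : fields) {
    if (matches(key)) { return value; }
  }
  return std::nullopt;
}

// Hypothetical matcher built upfront for "find field x, y or z".
struct any_of_matcher {
  std::vector<std::string_view> names;
  bool operator()(std::string_view key) const {
    for (std::string_view name : names) {
      if (name == key) { return true; }
    }
    return false;
  }
};
```

The upfront work would live in the matcher's construction, for example precomputing 64-bit filter words per name, while `find_first` stays unchanged.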

@jkeiser jkeiser added on demand Related to simdjson::ondemand functionality performance labels Mar 9, 2021
@lemire lemire modified the milestones: 1.0, 2.0 Jul 27, 2021
2 participants