Nested regex captures #77
Comments
This is a known restriction of the regex library that jaq uses. As suggested in the linked post, jaq could use Rust bindings to oniguruma, the regex library behind jq. While this step would increase compatibility with jq, it would require figuring out how it affects jaq's build process (given that it links to a C library), for example whether jaq could still be built for WASM. The impact on performance should also be measured. And of course, it would require porting jaq's current regex routines to oniguruma. If someone is interested in doing all this, I would consider merging a corresponding PR.
For reference, here's the link to onig, the relevant Rust crate:
FYI, there's another crate based on regex that provides the look-around feature needed for this: https://docs.rs/fancy-regex/latest/fancy_regex/. It hasn't reached 1.0 as of this comment.
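For readers unfamiliar with the term, here is a minimal sketch of what look-around means. It uses Python's `re` module purely as an illustration, since that engine supports look-around while the Rust regex crate does not; the sample string is made up, and this is not jaq, jq, or fancy-regex code.

```python
import re

# Illustrative input only; not taken from the issue.
SAMPLE = "name: jaq, lang: rust"

# Look-ahead: match a word only if a colon follows, without consuming the colon.
keys = re.findall(r"\w+(?=:)", SAMPLE)
print(keys)  # ['name', 'lang']

# Look-behind: match a word only if it is preceded by ": ".
values = re.findall(r"(?<=: )\w+", SAMPLE)
print(values)  # ['jaq', 'rust']
```

In both cases the assertion constrains where a match may occur but is not part of the matched text.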
There's something that doesn't quite add up for me. I can see that jaq is missing the look-around feature in regexes, but I don't think that's related to the given examples of behaviour mismatch. The manual defines
I believe the issue raised by @pkoppstein is actually about the definition of
$ jq -R 'match("^(([^:]+): *(.*))?")' <<< '((a):(b))'
{
"offset": 0,
"length": 9,
"string": "((a):(b))",
"captures": [
{
"offset": 0,
"length": 9,
"string": "((a):(b))",
"name": null
},
{
"offset": 0,
"length": 4,
"string": "((a)",
"name": null
},
{
"offset": 5,
"length": 4,
"string": "(b))",
"name": null
}
]
}
$ jaq -R 'match("^(([^:]+): *(.*))?")' <<< '((a):(b))'
{
"offset": 0,
"length": 9,
"string": "((a):(b))",
"captures": [
{
"offset": 0,
"length": 9,
"string": "((a):(b))"
},
{
"offset": 0,
"length": 4,
"string": "((a)"
},
{
"offset": 5,
"length": 4,
"string": "(b))"
}
]
}
Which I interpret as jq actually finding a single match in the input. Then the issue is
EDIT: I see now that jaq's
$ jaq -R 'scan("a")' <<< 'aaa'
"a"
$ jaq -R 'scan("a"; "g")' <<< 'aaa'
"a"
"a"
"a"
So that's another difference from jq, which seems to have the
jq and gojq are agreed:
But: