-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jqjq support #216
Comments
It would be fun if jaq could run jqjq 😄 some months ago i hacked around the lack of -L support (but it seem to have it now?) but then ran into lack of destructing support. $ jaq -n -L . 'include "jqjq"; eval("1+2")'
Error: expected variable
╭─[/Users/wader/src/jqjq/jqjq.jq]
│
49 │ | ([.[2:6], .[8:] | _fromhex]) as [$hi,$lo]
┆ ────┬────
┆ │
┆ ╰───── unexpected token
... |
And yes, I would love to see jaq being able to run jqjq, but lack of destructuring support is indeed a pretty large road-block. Perhaps in jaq 3.0? :) |
Ah yes i probably tested with current main branch
Would be great and a fun project to help out with
One of my favorite but sadly a bit hidden features of jq! sometimes i also dream about how some kind of pattern matching could work in jq 🤔 |
🥳 nice, i gave it a try. with this jqjq changes i can get it to lex but parsing seems to bail out in the precedence climbing some notes:
diff --git a/jqjq.jq b/jqjq.jq
index bc5e632..64d0fde 100644
--- a/jqjq.jq
+++ b/jqjq.jq
@@ -80,7 +80,7 @@ def lex:
def _re($re; f):
( . as {$remain, $string_stack}
| $remain
- | match($re; "m").string
+ | match($re; "").string
| f as $token
| { result: ($token | del(.string_stack))
, remain: $remain[length:]
@@ -153,8 +153,8 @@ def lex:
// _re("^\\)"; {rparen: ., string_stack: ($string_stack[0:-1])})
// _re("^\\["; {lsquare: .})
// _re("^\\]"; {rsquare: .})
- // _re("^{"; {lcurly: .})
- // _re("^}"; {rcurly: .})
+ // _re("^\\{"; {lcurly: .})
+ // _re("^\\}"; {rcurly: .})
// _re("^\\.\\."; {dotdot: .})
// _re("^\\."; {dot: .})
// _re("^\\?"; {qmark: .})
@@ -228,7 +228,7 @@ def parse:
else false
end;
- ( _p("query1") as [$rest, $t]
+ ( debug({prec: .}) | _p("query1") as [$rest, $t]
| $rest
| def _f($t):
( .[0] as $next # peek next
@@ -272,6 +272,7 @@ def parse:
def _scalar($type; c; f):
( . as [$first]
| _consume(c)
+ | debug({aaa:.})
| [ .
, { term:
( $first
@@ -280,6 +281,7 @@ def parse:
)
}
]
+ | debug({bbb:.})
);
# {<keyval>...} where keyval is:
@@ -1060,7 +1062,7 @@ def parse:
]
);
- ( .# debug({_p: $type})
+ ( debug({_p: $type})
| if $type == "query" then
_op_prec_climb(0; false)
elif $type == "keyval_query" then
@@ -1112,7 +1114,7 @@ def parse:
// _p("recurse") # ".."
) as [$rest, $term]
| $rest
- | _repeat(_p("suffix")) as [$rest, $suffix_list]
+ | _repeat(empty | _p("suffix")) as [$rest, $suffix_list]
| $rest
| [ .
, ( $term
@@ -1516,10 +1518,10 @@ def eval_ast($query; $path; $env; undefined_func):
( a0 as $a0
| [[null], has($a0)]
)
- elif $name == "delpaths/1" then
- ( a0 as $a0
- | [[null], delpaths($a0)]
- )
+ # elif $name == "delpaths/1" then
+ # ( a0 as $a0
+ # | [[null], delpaths($a0)]
+ # )
elif $name == "explode/0" then [[null], explode]
elif $name == "implode/0" then [[null], implode]
elif $name == "tonumber/0" then [[null], tonumber]
@@ -1536,19 +1538,19 @@ def eval_ast($query; $path; $env; undefined_func):
| error($a0)
)
elif $name == "halt_error/1" then [[null], halt_error(a0)]
- elif $name == "getpath/1" then
- ( a0 as $a0
- | [ $path+$a0
- , getpath($a0)
- ]
- )
- elif $name == "setpath/2" then
- ( a0 as $a0
- | a1 as $a1
- | [ []
- , setpath($a0; $a1)
- ]
- )
+ # elif $name == "getpath/1" then
+ # ( a0 as $a0
+ # | [ $path+$a0
+ # , getpath($a0)
+ # ]
+ # )
+ # elif $name == "setpath/2" then
+ # ( a0 as $a0
+ # | a1 as $a1
+ # | [ []
+ # , setpath($a0; $a1)
+ # ]
+ # )
elif $name == "path/1" then
( _e($args[0]; []; $query_env) as [$p, $_v]
# TODO: try/catch error
@@ -1577,7 +1579,7 @@ def eval_ast($query; $path; $env; undefined_func):
elif $name == "expm1/0" then [[null], expm1]
elif $name == "fabs/0" then [[null], fabs]
elif $name == "floor/0" then [[null], floor]
- elif $name == "gamma/0" then [[null], gamma]
+ # elif $name == "gamma/0" then [[null], gamma]
elif $name == "j0/0" then [[null], j0]
elif $name == "j1/0" then [[null], j1]
elif $name == "lgamma/0" then [[null], lgamma]
@@ -2538,9 +2540,10 @@ def eval($expr; $globals; $builtins_env):
# TODO: does not work with jq yet because issue with bind patterns
# $ gojq -cn -L . 'include "jqjq"; {} | {a:1} | eval(".a") += 1'
# {"a":2}
- | if $path | . == [] or . == [null] then $value
- else getpath($path)
- end
+ # | if $path | . == [] or . == [null] then $value
+ # else getpath($path)
+ # end
+ | $value
);
def eval($expr):
eval($expr; {}; _builtins_env);
|
@wader wrote:
Just a reminder that jq's actual behavior regarding some of the regex @01mf02 -- I noticed that in your revision of the jq manual, |
@pkoppstein true! haven't dugg any deeper what going on, here is a repro of the difference. seem to only happen when there is a $ jq 'match("^\\s+"; "m")' <<< '"a\nb\n"'
$ jaq 'match("^\\s+"; "m")' <<< '"a\nb\n"'
$ jq 'match("^\\s+"; "m")' <<< '"a\n\nb\n"'
$ jaq 'match("^\\s+"; "m")' <<< '"a\n\nb\n"'
{
"offset": 2,
"length": 1,
"string": "\n",
"captures": []
}
|
Great. Thanks for trying this out. Let us know if you need any help.
jaq enables regex's
I'm not sure whether I'm the right person to do this --- I'm not very fluent at regexes, and even when I come across a discrepancy between jq and jaq, I am not confident to decide whether jq's or jaq's behaviour is right. |
@01mf02 wrote:
Yes, but I think we could avoid that by focusing only on those cases where someone notices a discrepancy where (a) jaq and/or gojq is correct and (b) either jq 1.7 behavior is definitely wrong, or the jq manual is definitely in need of clarification in light of the correct behavior. I would certainly be glad to help, and suspect @wader would too :-) |
@pkoppstein, thanks for offering your help. I will first await the merging of the current state of the manual, before augmenting it with more content. So let us discuss the topic of regex documentation once this is done. |
@01mf02 - I'm sorry I probably can't do much to help expedite the acceptance of your excellent work on the manual. Let me know if you think otherwise. |
Spent some more time on jqjq support and think i found a bug/difference between jaq and jq when it comes to destructing. Managed to minimize it down to this: $ jq -cn '[{a:123}] | debug | .[0] as {$a} | debug, $a'
["DEBUG:",[{"a":123}]]
["DEBUG:",[{"a":123}]]
[{"a":123}]
123
$ jaq -cn '[{a:123}] | debug | .[0] as {$a} | debug, $a'
["DEBUG:", [{"a":123}]]
["DEBUG:", {"a":123}]
{"a":123}
123 It seems like the as-binding does not passthru the input unchanged in the case of |
Ran into this also, i would guess related to same issue: $ jq -n '{} as $a | $a as {$t} | debug'
["DEBUG:",null]
null
$ jaq -n '{} as $a | $a as {$t} | debug'
["DEBUG:", {}]
{} |
Perhaps it is interesting to mention at this point that jaq's destructuring follows the same rules as its indexing, which diverge from jq by failing to index any null value. That means that |
Yeap noticed :) have done work workaround for it. I pushed my current progress here https://github.com/wader/jqjq/commits/jaq/ but be aware of lots of hacks :) with this at least "1+2" seem to kind of work but other expressions fails. Will probably continue later today on it. |
Thanks for mentioning the jaq branch of jqjq.
This evaluates the filter without using any globals or builtins. For $FILTER, the following work and yield the expected output:
What does not work:
|
Concerning the lexing speed:
is indeed slower than jq right now (1.2 seconds in jaq vs. 0.4 seconds in jq). I just made some small experiment, and the performance problem seems to be related to slow regex execution. So I swapped the underlying regex library in 0840b1b to |
Hmm it seem to get stuck with
🥳
I guess some null checks etc is needed in some places, i've neem bit sloppy with using null it seems :) |
Doh, yes you correct, i build without
When i read about regex-list it seem to focus on binary size, compile time and some less features (less unicode support?) over performance? but it was faster in this case also? ... oh they mean compile time as in time compiling a regex not rust build time? :) BTW for regex heavy filters maybe a regex cache would be an interesting idea? there has been some talks in jq issues about various ways of doing, ex in jqlang/jq#2856, maybe a LRU cache etc? i've also experimented a bit with how |
Any idea what is going on with these? https://github.com/wader/jqjq/blob/master/jqjq.jq#L1945-L1946 Without any changes it seem expressions like |
Now all of those work in the jaq branch, and i think all of them were different variants of null index. Current fixes are mostly temp workarounds, should try figure something nicer. I've started to annotate with |
Oh seems eval/1 now works, not sure what fixed it, and also this 🥳 $ jaq -L . 'include "jqjq"; eval("def f: 1,8; [f,f] | map(.+105) | implode")' <<< '{"a":123}'
"jqjq" |
Any plan to add $ jq -n '$ARGS' --args a b c -- d
{
"positional": [
"a",
"b",
"c",
"d"
],
"named": {}
}
$ jq -n --args '$ARGS' a b c -- d
{
"positional": [
"a",
"b",
"c",
"d"
],
"named": {}
} So skips |
Found another bug i think, But with an ugly hack, that is now in the jaq brach, i managed to run the tests! but not sure i would trust them just yet :) i did notice the test result output look a bit wrong so needs some work. $ jaq -n -L . -rRs 'include "jqjq"; . | jqjq(["", "--run-tests"]; {})' < jqjq.test
<cut>
OK: line 1230: null | eval("eval(\"empty\")") -> null
DIFF: line 1238: null | (.a | eval(".a, .b.c")) = 123 -> {"a":{"a":123,"b":{"c":123}}}
Expected: {"a":{"a":123,"b":{"c":123}}}
Actual: 123
OK: line 1244: null | try eval("{a b}") catch "failed" -> "failed"
OK: line 1249: null | try eval(". as {$a $b} | .") catch "failed" -> "failed"
191 of 253 tests passed
null |
Hi @wader, amazing work! Thanks so much for it.
It has happened to (nearly) everyone ... ;)
I think that the statement in
I have already thought about this the last few days. And following your comment, I just implemented regex caching with an LRU cache. However, the result is underwhelming: Using Given that the performance improvement is not that large, I will probably not integrate this into jaq. For reference, here is the diff to reproduce the experiment: diff --git a/jaq-std/Cargo.toml b/jaq-std/Cargo.toml
index b9c3f84..db4d8da 100644
--- a/jaq-std/Cargo.toml
+++ b/jaq-std/Cargo.toml
@@ -29,6 +29,7 @@ libm = { version = "0.2.7", optional = true }
aho-corasick = { version = "1.0", optional = true }
base64 = { version = "0.22", optional = true }
urlencoding = { version = "2.1.3", optional = true }
+lru = "0.12.5"
[dev-dependencies]
jaq-json = { path = "../jaq-json" }
diff --git a/jaq-std/src/regex.rs b/jaq-std/src/regex.rs
index 3378306..e7a9ee5 100644
--- a/jaq-std/src/regex.rs
+++ b/jaq-std/src/regex.rs
@@ -4,7 +4,7 @@ use alloc::string::{String, ToString};
use alloc::vec::Vec;
use regex_lite::{self as regex, Error, Regex, RegexBuilder};
-#[derive(Copy, Clone, Default)]
+#[derive(Copy, Clone, PartialEq, Eq, Default, Hash)]
pub struct Flags {
// global search
g: bool,
@@ -65,6 +65,25 @@ impl Flags {
let mut builder = RegexBuilder::new(re);
self.impact(&mut builder).build()
}
+
+ #[cfg(feature = "std")]
+ pub fn regex_cached(self, re: &str) -> Result<Regex, Error> {
+ use core::cell::RefCell;
+ use core::num::NonZeroUsize;
+ use lru::LruCache;
+
+ std::thread_local! {
+ static CACHE: RefCell<LruCache<(String, Flags), Result<Regex, Error>>> =
+ RefCell::new(LruCache::new(NonZeroUsize::new(64).unwrap()));
+ }
+
+ CACHE.with(|cache| {
+ cache
+ .borrow_mut()
+ .get_or_insert((re.to_string(), self), || self.regex(re))
+ .clone()
+ })
+ }
}
/// Mapping between byte and character indices. |
Oh, oh, that looks indeed like tail recursion gone wrong. I'm looking into it, but so far, I was not able to reproduce it with a smaller example ... |
That should actually not be that hard to do with |
I noticed that # the "%" is some zsh indicator about missing new line
$ jq -Rjn '"prompt> " | ., input'
prompt> 123
123%
$ jaq -Rjn '"prompt> " | ., input'
123
prompt> 123%
$ jaq -Rjn '"prompt>\n" | ., input'
prompt>
123
123% |
|
The motivation is to correctly parse `--args`. As a nice side effect, this eliminates the dependency on clap as well. See: <#216 (comment)>
You are right. I have now rewritten the command-line argument parser, see #238. |
You are right; my brain was in Rust mode. |
Your intuition is correct; replacing fn print(writer: &mut impl Write, cli: &Cli, val: &Val) -> io::Result<()> {
let f = |f: &mut Formatter| fmt_val_root(f, cli, val);
write!(writer, "{}", FormatterFn(f))?;
writer.flush()
} This might be a good time to implement |
👍 jqjq can always reimplement for now, and yes i can understand that it's a bit weird to have just some path related functions and not all.
Ok i see, yeap used the reimplementations. But shouldn't it ideally cause some jq runtime error instead of panic? |
Yes.
Yes. Originally, I left the panic at that place because I wasn't sure whether it was possible to perform updates ( |
I corrected this in a51d8da. Thanks again for bringing this up, and for all the other issues you reported. All your feedback has really helped me to improve the quality of jaq! I am very grateful for your help. |
By the way, I am planning to release jaq 2.0 this week, probably tomorrow. I was thinking about mentioning that jaq can now run jqjq. Are there plans from your side to merge jqjq's jaq branch into its master branch, or should I give instructions in the release notes for jaq 2.0 to continue using jqjq's jaq branch? |
👍 when messing with reduce/foreach for jqjq i did notice that it's possible for them to be valid path expressions but it might not be very useful: $ jq -cn '{a:1} | path((foreach (1,2,3) as $v (.;.)).a)'
["a"]
["a"]
["a"]
$ jq -cn '{a:1} | (foreach (1,2,3) as $v (.;.)).a += 1'
{"a":4}
$ jq -cn '{a:1} | path((reduce (1,2,3) as $v (.;.)).a)'
["a"]
$ jq -cn '{a:1} | (reduce (1,2,3) as $v (.;.)).a += 1'
{"a":2} |
Nice. Looking at the jq code i wonder how this ends up working, can only see that |
🥳 good you reminded me, yes hope i can get some time tonight to cleanup and merge things to master. it's mostly the null index workarounds i've been a bit reluctant about how to fix but i can probably come up with something. |
@01mf02 jqjq master should now work with jaq 2.0. there are still some TODOs left but i think they can wait |
Oh, that's interesting! However, this seems to be quite limited; consider the following example:
|
Yeap agree, not sure how useful it is, but was a bit surprised why i wouldn't work as long as you make sure to only index into the input. Intentional or bug? $ ./jqjq -cn '[[2], 1] | path(reduce (0, 0) as $x (.; .[$x]))'
[0,0]
$ ./jqjq -cn '[[2], 1] | (reduce (0, 0) as $x (.; .[$x])) |= 3'
[[3],1]
$ ./jqjq -cn '[[2], 1] | path(foreach (0, 0) as $x (.; .[$x]))'
[0]
[0,0]
# 🤔
$ ./jqjq -cn '[[2], 1] | (foreach (0, 0) as $x (.; .[$x])) = .'
[[[[2],1],1],1] |
Oh, this works in jqjq! Just checked gojq:
That means that to my knowledge, jqjq is the only jq implementation that handles nontrivial paths in |
Thanks a lot! I just tested it and it seems to work! (Not sure whether you know it, but the "fib.bf" example from the README seems to run out of memory for both jaq and jq.) |
😅 part of implement jqjq was to learn more about those part of jq so i focused quite a lot to make it work... and found quite a lot of bugs :) it even works with |
👍 did't know fib was OOM:ing but i know jqjq via jqjq does atm, i used to work before some try/catch change i think that for some reason caused lots of more memory usage, would be nice to make it work again. |
Planning on doing a release today? let me know if i can help out with something |
I've just created the release. :) If you are motivated to do this, it would be definitely helpful to post the news about the release to some site where people could be interested about this release, e.g. Hacker News. (I'm having a bit of trouble doing this kind of "publicity".) I'll write a post on Reddit later today --- stepping out of my comfort zone for this. 😬 |
The path filter should emit error if the value at which indexing begins is different from the value on which the path filter was called. For example, in |
Will do, have already posted to jq discord and hacker news https://news.ycombinator.com/item?id=42254530 will post on mastodon and maybe some more places also. You should definitely write some posts about it! i'm very curious and hopeful that ppl will build stuff around the new data format agnostic support. |
Posted on mastodon https://fosstodon.org/@wader/113554288216205847 |
By "if the value at which indexing begins is different from the value on which the path filter was called" you mean the value that beings each foreach/reduce iteration? my thinking is that |
Thanks a lot for this! |
👍 also posted here https://lobste.rs/s/5lvjcn/jaq_2_0_released_with_lots_improvements |
https://github.com/wader/jqjq needs
-L
The text was updated successfully, but these errors were encountered: