Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jqjq support #216

Open
Freed-Wu opened this issue Oct 10, 2024 · 115 comments
Open

jqjq support #216

Freed-Wu opened this issue Oct 10, 2024 · 115 comments

Comments

@Freed-Wu
Copy link

Freed-Wu commented Oct 10, 2024

https://github.com/wader/jqjq needs -L

@wader
Copy link
Contributor

wader commented Oct 10, 2024

It would be fun if jaq could run jqjq 😄 some months ago i hacked around the lack of -L support (but it seem to have it now?) but then ran into lack of destructing support.
You can test it like this if you want:

$ jaq -n -L . 'include "jqjq"; eval("1+2")'
Error: expected variable
   ╭─[/Users/wader/src/jqjq/jqjq.jq]
   │
49 │         | ([.[2:6], .[8:] | _fromhex]) as [$hi,$lo]
   ┆                                           ────┬────
   ┆                                               │
   ┆                                               ╰───── unexpected token
...

@01mf02
Copy link
Owner

01mf02 commented Oct 11, 2024

-L will be supported in jaq 2.0.

And yes, I would love to see jaq being able to run jqjq, but lack of destructuring support is indeed a pretty large road-block. Perhaps in jaq 3.0? :)

@wader
Copy link
Contributor

wader commented Oct 11, 2024

-L will be supported in jaq 2.0.

Ah yes i probably tested with current main branch

And yes, I would love to see jaq being able to run jqjq

Would be great and a fun project to help out with

but lack of destructuring support is indeed a pretty large road-block. Perhaps in jaq 3.0? :)

One of my favorite but sadly a bit hidden features of jq! sometimes i also dream about how some kind of pattern matching could work in jq 🤔

@01mf02
Copy link
Owner

01mf02 commented Oct 14, 2024

@wader, I was actually able to implement destructuring after all in jaq relatively quickly. :)
My current prototype is in the patterns branch, if you wish to try it.
I'll try running jqjq with it later myself.

@wader
Copy link
Contributor

wader commented Oct 14, 2024

🥳 nice, i gave it a try. with this jqjq changes i can get it to lex but parsing seems to bail out in the precedence climbing
code, digging into that.

some notes:

  • match with ^...and m flag seem to match any line or something in jaq but not jq.
  • { not need to be escaped in jq regexp but in jaq
  • gamma, getpath, setpath and delpaths not available in jaq
diff --git a/jqjq.jq b/jqjq.jq
index bc5e632..64d0fde 100644
--- a/jqjq.jq
+++ b/jqjq.jq
@@ -80,7 +80,7 @@ def lex:
     def _re($re; f):
       ( . as {$remain, $string_stack}
       | $remain
-      | match($re; "m").string
+      | match($re; "").string
       | f as $token
       | { result: ($token | del(.string_stack))
         , remain: $remain[length:]
@@ -153,8 +153,8 @@ def lex:
       // _re("^\\)";    {rparen: ., string_stack: ($string_stack[0:-1])})
       // _re("^\\[";    {lsquare: .})
       // _re("^\\]";    {rsquare: .})
-      // _re("^{";      {lcurly: .})
-      // _re("^}";      {rcurly: .})
+      // _re("^\\{";    {lcurly: .})
+      // _re("^\\}";    {rcurly: .})
       // _re("^\\.\\."; {dotdot: .})
       // _re("^\\.";    {dot: .})
       // _re("^\\?";    {qmark: .})
@@ -228,7 +228,7 @@ def parse:
         else false
         end;
 
-      ( _p("query1") as [$rest, $t]
+      ( debug({prec: .}) | _p("query1") as [$rest, $t]
       | $rest
       | def _f($t):
           ( .[0] as $next # peek next
@@ -272,6 +272,7 @@ def parse:
     def _scalar($type; c; f):
       ( . as [$first]
       | _consume(c)
+      | debug({aaa:.})
       | [ .
         , { term:
               ( $first
@@ -280,6 +281,7 @@ def parse:
               )
           }
         ]
+      | debug({bbb:.})
       );
 
     # {<keyval>...} where keyval is:
@@ -1060,7 +1062,7 @@ def parse:
         ]
       );
 
-    ( .# debug({_p: $type})
+    ( debug({_p: $type})
     | if $type == "query" then
         _op_prec_climb(0; false)
       elif $type == "keyval_query" then
@@ -1112,7 +1114,7 @@ def parse:
           // _p("recurse") # ".."
           ) as [$rest, $term]
         | $rest
-        | _repeat(_p("suffix")) as [$rest, $suffix_list]
+        | _repeat(empty | _p("suffix")) as [$rest, $suffix_list]
         | $rest
         | [ .
           , ( $term
@@ -1516,10 +1518,10 @@ def eval_ast($query; $path; $env; undefined_func):
                 ( a0 as $a0
                 | [[null], has($a0)]
                 )
-              elif $name == "delpaths/1" then
-                ( a0 as $a0
-                | [[null], delpaths($a0)]
-                )
+              # elif $name == "delpaths/1" then
+              #   ( a0 as $a0
+              #   | [[null], delpaths($a0)]
+              #   )
               elif $name == "explode/0"  then [[null], explode]
               elif $name == "implode/0"  then [[null], implode]
               elif $name == "tonumber/0" then [[null], tonumber]
@@ -1536,19 +1538,19 @@ def eval_ast($query; $path; $env; undefined_func):
                 | error($a0)
                 )
               elif $name == "halt_error/1" then [[null], halt_error(a0)]
-              elif $name == "getpath/1" then
-                ( a0 as $a0
-                | [ $path+$a0
-                  , getpath($a0)
-                  ]
-                )
-              elif $name == "setpath/2" then
-                ( a0 as $a0
-                | a1 as $a1
-                | [ []
-                  , setpath($a0; $a1)
-                  ]
-                )
+              # elif $name == "getpath/1" then
+              #   ( a0 as $a0
+              #   | [ $path+$a0
+              #     , getpath($a0)
+              #     ]
+              #   )
+              # elif $name == "setpath/2" then
+              #   ( a0 as $a0
+              #   | a1 as $a1
+              #   | [ []
+              #     , setpath($a0; $a1)
+              #     ]
+              #   )
               elif $name == "path/1" then
                 ( _e($args[0]; []; $query_env) as [$p, $_v]
                 # TODO: try/catch error
@@ -1577,7 +1579,7 @@ def eval_ast($query; $path; $env; undefined_func):
               elif $name == "expm1/0"       then [[null], expm1]
               elif $name == "fabs/0"        then [[null], fabs]
               elif $name == "floor/0"       then [[null], floor]
-              elif $name == "gamma/0"       then [[null], gamma]
+              # elif $name == "gamma/0"       then [[null], gamma]
               elif $name == "j0/0"          then [[null], j0]
               elif $name == "j1/0"          then [[null], j1]
               elif $name == "lgamma/0"      then [[null], lgamma]
@@ -2538,9 +2540,10 @@ def eval($expr; $globals; $builtins_env):
   # TODO: does not work with jq yet because issue with bind patterns
   # $ gojq -cn -L . 'include "jqjq"; {} | {a:1} | eval(".a") += 1'
   # {"a":2}
-  | if $path | . == [] or . == [null] then $value
-    else getpath($path)
-    end
+  # | if $path | . == [] or . == [null] then $value
+  #   else getpath($path)
+  #   end
+  | $value
   );
 def eval($expr):
   eval($expr; {}; _builtins_env);

@pkoppstein
Copy link

@wader wrote:

match with ^...and m flag seem to match any line or something in jaq but not jq.

Just a reminder that jq's actual behavior regarding some of the regex
flags is generally not a good guide. Some details are at
jqlang/jq#2663

@01mf02 -- I noticed that in your revision of the jq manual,
you added a "Compatibility" note regarding jaq and the regex library.
I was thinking that if you come across any instances where jaq's handling of regex flags is correct
but differs from that of jq 1.7, it would be very helpful to add some details.

@wader
Copy link
Contributor

wader commented Oct 15, 2024

@pkoppstein true! haven't dugg any deeper what going on, here is a repro of the difference. seem to only happen when there is a \n\n

$ jq 'match("^\\s+"; "m")' <<< '"a\nb\n"'
$ jaq 'match("^\\s+"; "m")' <<< '"a\nb\n"'

$ jq 'match("^\\s+"; "m")' <<< '"a\n\nb\n"'
$ jaq 'match("^\\s+"; "m")' <<< '"a\n\nb\n"'
{
  "offset": 2,
  "length": 1,
  "string": "\n",
  "captures": []
}

@01mf02
Copy link
Owner

01mf02 commented Oct 15, 2024

🥳 nice, i gave it a try. with this jqjq changes i can get it to lex but parsing seems to bail out in the precedence climbing
code, digging into that.

Great. Thanks for trying this out. Let us know if you need any help.

@pkoppstein true! haven't dugg any deeper what going on, here is a repro of the difference. seem to only happen when there is a \n\n

jaq enables regex's multi_line function when m is given. The behaviour that we see here is that ^ matches the \n (by definition), and because the \n just after it is a space character, the regex matches.

@01mf02 -- I noticed that in your revision of the jq manual,
you added a "Compatibility" note regarding jaq and the regex library.
I was thinking that if you come across any instances where jaq's handling of regex flags is correct
but differs from that of jq 1.7, it would be very helpful to add some details.

I'm not sure whether I'm the right person to do this --- I'm not very fluent at regexes, and even when I come across a discrepancy between jq and jaq, I am not confident to decide whether jq's or jaq's behaviour is right.
I also think that this might be a rabbit-hole from which one will have a hard time getting out. Because I'm sure that there are tons of differences between Oniguruma and regex, and I think that the jq manual is not a good point to document these differences.
(That being said, there might be some value in documenting the most blatant differences, but it will be hard to determine where to draw the line.)

@pkoppstein
Copy link

@01mf02 wrote:

this might be a rabbit-hole

Yes, but I think we could avoid that by focusing only on those cases where someone notices a discrepancy where (a) jaq and/or gojq is correct and (b) either jq 1.7 behavior is definitely wrong, or the jq manual is definitely in need of clarification in light of the correct behavior.

I would certainly be glad to help, and suspect @wader would too :-)

@01mf02 01mf02 changed the title -L not supported jqjq support Oct 23, 2024
@01mf02
Copy link
Owner

01mf02 commented Oct 23, 2024

@pkoppstein, thanks for offering your help. I will first await the merging of the current state of the manual, before augmenting it with more content. So let us discuss the topic of regex documentation once this is done.

@pkoppstein
Copy link

@01mf02 - I'm sorry I probably can't do much to help expedite the acceptance of your excellent work on the manual. Let me know if you think otherwise.

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Spent some more time on jqjq support and think i found a bug/difference between jaq and jq when it comes to destructing. Managed to minimize it down to this:

$ jq -cn '[{a:123}] | debug | .[0] as {$a} | debug, $a'
["DEBUG:",[{"a":123}]]
["DEBUG:",[{"a":123}]]
[{"a":123}]
123

$ jaq -cn '[{a:123}] | debug | .[0] as {$a} | debug, $a'
["DEBUG:", [{"a":123}]]
["DEBUG:", {"a":123}]
{"a":123}
123

It seems like the as-binding does not passthru the input unchanged in the case of .[<index>] as {...} or [...]

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Ran into this also, i would guess related to same issue:

$ jq -n '{} as $a | $a as {$t} | debug'
["DEBUG:",null]
null

$ jaq -n '{} as $a | $a as {$t} | debug'
["DEBUG:", {}]
{}

@01mf02
Copy link
Owner

01mf02 commented Oct 28, 2024

Hi @wader, just as I was setting out to look into jqjq myself right now, I find that you have beaten me to it. :)
Thanks for uncovering this bug. I have corrected binding behaviour in 4560c86.

@01mf02
Copy link
Owner

01mf02 commented Oct 28, 2024

Perhaps it is interesting to mention at this point that jaq's destructuring follows the same rules as its indexing, which diverge from jq by failing to index any null value. That means that [] as [{$x}] | $x yields null in jq, but an error in jaq, because [] | .[0] | .x is a failure too. However, [] as [$x] | $x yields null for both jq and jaq.

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Yeap noticed :) have done work workaround for it. I pushed my current progress here https://github.com/wader/jqjq/commits/jaq/ but be aware of lots of hacks :) with this at least "1+2" seem to kind of work but other expressions fails.

Will probably continue later today on it.

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Hi @wader, just as I was setting out to look into jqjq myself right now, I find that you have beaten me to it. :) Thanks for uncovering this bug. I have corrected binding behaviour in 4560c86.

That was quick! 🥳 let me know how it goes, and i will probably continue later today also

@01mf02
Copy link
Owner

01mf02 commented Oct 28, 2024

Thanks for mentioning the jaq branch of jqjq.
I found that quite a few things already work, using:

jaq -n -L . 'include "jqjq"; eval($FILTER; {}; {})

This evaluates the filter without using any globals or builtins.

For $FILTER, the following work and yield the expected output:

  • 1+2*3 --- yay, precedences
  • 1.4e2
  • [1, 2]
  • {a: 1, b: (2, 3)}
  • def f: 1; f
  • if true then 0 else 1 end
  • reduce (1, 2) as $x (0; .+$x)
  • 1 | .+2

What does not work:

., .[], .[0], .a, 1 as $x | $x: These fail with: "cannot use null as iterable (array or object)".
For some of these, parsing seems to fail (.), for some others, execution.

@01mf02
Copy link
Owner

01mf02 commented Oct 28, 2024

Concerning the lexing speed:

jaq -L . -n 'include "jqjq"; _builtins_src | lex'

is indeed slower than jq right now (1.2 seconds in jaq vs. 0.4 seconds in jq).
If it is much slower than that for you: Did you compile jaq with cargo build --release?

I just made some small experiment, and the performance problem seems to be related to slow regex execution. So I swapped the underlying regex library in 0840b1b to regex-lite, and this brings down the execution time in jaq to 0.4 seconds as well. :)

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Thanks for mentioning the jaq branch of jqjq. I found that quite a few things already work, using:

jaq -n -L . 'include "jqjq"; eval($FILTER; {}; {})

This evaluates the filter without using any globals or builtins.

Hmm it seem to get stuck with eval/1, will have to be investigated.

For $FILTER, the following work and yield the expected output:

  • 1+2*3 --- yay, precedences
  • 1.4e2
  • [1, 2]
  • {a: 1, b: (2, 3)}
  • def f: 1; f
  • if true then 0 else 1 end
  • reduce (1, 2) as $x (0; .+$x)
  • 1 | .+2

🥳

What does not work:

., .[], .[0], .a, 1 as $x | $x: These fail with: "cannot use null as iterable (array or object)". For some of these, parsing seems to fail (.), for some others, execution.

I guess some null checks etc is needed in some places, i've neem bit sloppy with using null it seems :)

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Concerning the lexing speed:

jaq -L . -n 'include "jqjq"; _builtins_src | lex'

is indeed slower than jq right now (1.2 seconds in jaq vs. 0.4 seconds in jq). If it is much slower than that for you: Did you compile jaq with cargo build --release?

Doh, yes you correct, i build without --release. Now i've updated to latest main and built with release target, much faster 👍

I just made some small experiment, and the performance problem seems to be related to slow regex execution. So I swapped the underlying regex library in 0840b1b to regex-lite, and this brings down the execution time in jaq to 0.4 seconds as well. :)

When i read about regex-list it seem to focus on binary size, compile time and some less features (less unicode support?) over performance? but it was faster in this case also? ... oh they mean compile time as in time compiling a regex not rust build time? :)

BTW for regex heavy filters maybe a regex cache would be an interesting idea? there has been some talks in jq issues about various ways of doing, ex in jqlang/jq#2856, maybe a LRU cache etc? i've also experimented a bit with how /regex/ literals could work in jq, that would be neat and also simplify how to reuse a compiled regex.

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Any idea what is going on with these? https://github.com/wader/jqjq/blob/master/jqjq.jq#L1945-L1946

Without any changes it seem expressions like 1+2 just output 1, but if i add ... | debug or ... | . to both it seem to work. Some tail or return optimisation going wrong? 🤔

@wader
Copy link
Contributor

wader commented Oct 28, 2024

What does not work:

., .[], .[0], .a, 1 as $x | $x: These fail with: "cannot use null as iterable (array or object)". For some of these, parsing seems to fail (.), for some others, execution.

Now all of those work in the jaq branch, and i think all of them were different variants of null index. Current fixes are mostly temp workarounds, should try figure something nicer. I've started to annotate with # TODO: jaq for things that needs a revisit.

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Oh seems eval/1 now works, not sure what fixed it, and also this 🥳

$ jaq -L . 'include "jqjq"; eval("def f: 1,8; [f,f] | map(.+105) | implode")' <<< '{"a":123}'
"jqjq"

@wader
Copy link
Contributor

wader commented Oct 28, 2024

Any plan to add --args, would make it possible to use the jqjq shell wrapper with jaq, ./jqjq --jq jaq .... But beware the semantics of --args, it's a bit confusing. Let me illustrate:

$ jq -n '$ARGS' --args a b c -- d
{
  "positional": [
    "a",
    "b",
    "c",
    "d"
  ],
  "named": {}
}

$ jq -n --args '$ARGS' a b c -- d
{
  "positional": [
    "a",
    "b",
    "c",
    "d"
  ],
  "named": {}
}

So skips -- and the positional are the ones after the first non-option (the filter).

@wader
Copy link
Contributor

wader commented Oct 29, 2024

Found another bug i think, "" | test("") outputs false with jaq, was minified from test("^\\s*$") in the test parse code. I tried to debug the matches function but my rust is not good enough yet. Read the regex crate documentation about empty matches but didn't see that would make it skip them by default.

But with an ugly hack, that is now in the jaq brach, i managed to run the tests! but not sure i would trust them just yet :) i did notice the test result output look a bit wrong so needs some work.

$ jaq -n -L . -rRs 'include "jqjq"; . | jqjq(["", "--run-tests"]; {})' < jqjq.test
<cut>
OK: line 1230: null | eval("eval(\"empty\")") -> null
DIFF: line 1238: null | (.a | eval(".a, .b.c")) = 123 -> {"a":{"a":123,"b":{"c":123}}}
  Expected: {"a":{"a":123,"b":{"c":123}}}
    Actual: 123
OK: line 1244: null | try eval("{a b}") catch "failed" -> "failed"
OK: line 1249: null | try eval(". as {$a $b} | .") catch "failed" -> "failed"
191 of 253 tests passed
null

@01mf02
Copy link
Owner

01mf02 commented Oct 29, 2024

Hi @wader, amazing work! Thanks so much for it.

Doh, yes you correct, i build without --release. Now i've updated to latest main and built with release target, much faster 👍

It has happened to (nearly) everyone ... ;)

When i read about regex-list it seem to focus on binary size, compile time and some less features (less unicode support?) over performance? but it was faster in this case also? ... oh they mean compile time as in time compiling a regex not rust build time? :)

I think that the statement in regex-lite refers to Rust build time. Still, I believe that by having a reduced feature set, the time to compile a regular expression is also reduced. This is probably not advertised by regex-lite because they assume that you are usually not compiling thousands of regular expressions. :) Which brings us to your next point ...

BTW for regex heavy filters maybe a regex cache would be an interesting idea? there has been some talks in jq issues about various ways of doing, ex in jqlang/jq#2856, maybe a LRU cache etc? i've also experimented a bit with how /regex/ literals could work in jq, that would be neat and also simplify how to reuse a compiled regex.

I have already thought about this the last few days. And following your comment, I just implemented regex caching with an LRU cache. However, the result is underwhelming: Using jaq -L . -n 'include "jqjq"; _builtins_src | lex' as benchmark, we get 0.35 seconds without caching and 0.30 seconds with caching.
I also tried jaq -n 'limit(100000; repeat("--")) | match("^-="; "").string', and here, we get 0.20 seconds without caching and 0.16 seconds with caching.

Given that the performance improvement is not that large, I will probably not integrate this into jaq.

For reference, here is the diff to reproduce the experiment:

diff --git a/jaq-std/Cargo.toml b/jaq-std/Cargo.toml
index b9c3f84..db4d8da 100644
--- a/jaq-std/Cargo.toml
+++ b/jaq-std/Cargo.toml
@@ -29,6 +29,7 @@ libm = { version = "0.2.7", optional = true }
 aho-corasick = { version = "1.0", optional = true }
 base64 = { version = "0.22", optional = true }
 urlencoding = { version = "2.1.3", optional = true }
+lru = "0.12.5"
 
 [dev-dependencies]
 jaq-json = { path = "../jaq-json" }
diff --git a/jaq-std/src/regex.rs b/jaq-std/src/regex.rs
index 3378306..e7a9ee5 100644
--- a/jaq-std/src/regex.rs
+++ b/jaq-std/src/regex.rs
@@ -4,7 +4,7 @@ use alloc::string::{String, ToString};
 use alloc::vec::Vec;
 use regex_lite::{self as regex, Error, Regex, RegexBuilder};
 
-#[derive(Copy, Clone, Default)]
+#[derive(Copy, Clone, PartialEq, Eq, Default, Hash)]
 pub struct Flags {
     // global search
     g: bool,
@@ -65,6 +65,25 @@ impl Flags {
         let mut builder = RegexBuilder::new(re);
         self.impact(&mut builder).build()
     }
+
+    #[cfg(feature = "std")]
+    pub fn regex_cached(self, re: &str) -> Result<Regex, Error> {
+        use core::cell::RefCell;
+        use core::num::NonZeroUsize;
+        use lru::LruCache;
+
+        std::thread_local! {
+            static CACHE: RefCell<LruCache<(String, Flags), Result<Regex, Error>>> =
+                RefCell::new(LruCache::new(NonZeroUsize::new(64).unwrap()));
+        }
+
+        CACHE.with(|cache| {
+            cache
+                .borrow_mut()
+                .get_or_insert((re.to_string(), self), || self.regex(re))
+                .clone()
+        })
+    }
 }
 
 /// Mapping between byte and character indices.

@01mf02
Copy link
Owner

01mf02 commented Oct 29, 2024

Without any changes it seem expressions like 1+2 just output 1, but if i add ... | debug or ... | . to both it seem to work. Some tail or return optimisation going wrong? 🤔

Oh, oh, that looks indeed like tail recursion gone wrong. I'm looking into it, but so far, I was not able to reproduce it with a smaller example ...

@01mf02
Copy link
Owner

01mf02 commented Oct 29, 2024

Any plan to add --args, would make it possible to use the jqjq shell wrapper with jaq, ./jqjq --jq jaq ....

That should actually not be that hard to do with trailing_var_arg.

@wader
Copy link
Contributor

wader commented Nov 19, 2024

I noticed that --repl behaves different with jaq, i'm guessing output line buffering difference. Haven't dugg deeper into it yet.

# the "%" is some zsh indicator about missing new line
$ jq -Rjn '"prompt> " | ., input'
prompt> 123
123%

$ jaq -Rjn '"prompt> " | ., input'
123
prompt> 123%
$ jaq -Rjn '"prompt>\n" | ., input'
prompt>
123
123%

@wader
Copy link
Contributor

wader commented Nov 19, 2024

I reformatted it a bit to match the rest of jqjq and also changed sort to unique, there was an issue with deleting the same path in an array more than one time, i luckily for some reason had a test case for that, not sure why.

Is that really correct? Shouldn't you first sort, then unique? Without sort, [0, 1, 2] | delpaths([[1], [0]]) will yield [1] instead of [2], I think.

unique both removes dups and sorts so sorting should not be needed. From jq docs:

The unique function takes as input an array and produces an array of the same elements, in sorted order, with duplicates removed.

01mf02 added a commit that referenced this issue Nov 19, 2024
The motivation is to correctly parse `--args`.
As a nice side effect, this eliminates the dependency on clap as well.
See: <#216 (comment)>
@01mf02
Copy link
Owner

01mf02 commented Nov 19, 2024

After I cleaned up some bugs in jq's option parser (in jqlang/jq#3194), then ported it to jq (in wader/jqjq#19), I started porting it to Rust, since I've wanted to make my own jq implementation for a while. A clone of jq's logic wouldn't be much work. (OsStr::as_encoded_bytes would be useful.)

You are right. I have now rewritten the command-line argument parser, see #238.

@01mf02
Copy link
Owner

01mf02 commented Nov 19, 2024

unique both removes dups and sorts so sorting should not be needed.

You are right; my brain was in Rust mode.

@01mf02
Copy link
Owner

01mf02 commented Nov 19, 2024

I noticed that --repl behaves different with jaq, i'm guessing output line buffering difference. Haven't dugg deeper into it yet.

# the "%" is some zsh indicator about missing new line
$ jq -Rjn '"prompt> " | ., input'
prompt> 123
123%

$ jaq -Rjn '"prompt> " | ., input'
123
prompt> 123%
$ jaq -Rjn '"prompt>\n" | ., input'
prompt>
123
123%

Your intuition is correct; replacing print with the following function in jaq/src/main.rs, your example works:

fn print(writer: &mut impl Write, cli: &Cli, val: &Val) -> io::Result<()> {
    let f = |f: &mut Formatter| fmt_val_root(f, cli, val);
    write!(writer, "{}", FormatterFn(f))?;
    writer.flush()
}

This might be a good time to implement --unbuffered ... but not today. :)

@wader
Copy link
Contributor

wader commented Nov 19, 2024

setpath/2 and getpath/1 should both be easy to implement by definition. However, path/1 is impossible to define by
Hmmm ... I have to meditate over this a bit. I really do not like paths ...

👍 jqjq can always reimplement for now, and yes i can understand that it's a bit weird to have just some path related functions and not all.

$ jaq -cn -L . 'include "jqjq"; {a:0, b:1} | eval(".a, .b") += 1'
thread 'main' panicked at /Users/wader/src/jaq/jaq-core/src/filter.rs:369:17:
not implemented: updating with this operator is not supported yet
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

I suppose that you used your _getpath as getpath here. It is not surprising that this does not work, because getpath uses reduce, and AFAIK, in both jq and jaq, you cannot update with reduce ... (...) |= f.

Ok i see, yeap used the reimplementations. But shouldn't it ideally cause some jq runtime error instead of panic?

@01mf02
Copy link
Owner

01mf02 commented Nov 26, 2024

👍 jqjq can always reimplement for now, and yes i can understand that it's a bit weird to have just some path related functions and not all.

Yes.

But shouldn't it ideally cause some jq runtime error instead of panic?

Yes. Originally, I left the panic at that place because I wasn't sure whether it was possible to perform updates (p |= f) with certain filters (p) on the left-hand side. Now I'm rather sure that it is not possible in jaq's update model to have label $x | ... and try ... catch ... on the left-hand side of updates, whereas reduce and foreach could be possible in principle. However, because updating with reduce and foreach is not possible in jq neither, I have decided to do the same thing for all these filters, namely to return a path expression error, like jq. c41632d

@01mf02
Copy link
Owner

01mf02 commented Nov 26, 2024

I noticed that --repl behaves different with jaq, i'm guessing output line buffering difference. Haven't dugg deeper into it yet.

I corrected this in a51d8da. Thanks again for bringing this up, and for all the other issues you reported. All your feedback has really helped me to improve the quality of jaq! I am very grateful for your help.

@01mf02
Copy link
Owner

01mf02 commented Nov 26, 2024

By the way, I am planning to release jaq 2.0 this week, probably tomorrow. I was thinking about mentioning that jaq can now run jqjq. Are there plans from your side to merge jqjq's jaq branch into its master branch, or should I give instructions in the release notes for jaq 2.0 to continue using jqjq's jaq branch?

@wader
Copy link
Contributor

wader commented Nov 26, 2024

But shouldn't it ideally cause some jq runtime error instead of panic?

Yes. Originally, I left the panic at that place because I wasn't sure whether it was possible to perform updates (p |= f) with certain filters (p) on the left-hand side. Now I'm rather sure that it is not possible in jaq's update model to have label $x | ... and try ... catch ... on the left-hand side of updates, whereas reduce and foreach could be possible in principle. However, because updating with reduce and foreach is not possible in jq neither, I have decided to do the same thing for all these filters, namely to return a path expression error, like jq. c41632d

👍 when messing with reduce/foreach for jqjq i did notice that it's possible for them to be valid path expressions but it might not be very useful:

$ jq -cn '{a:1} | path((foreach (1,2,3) as $v (.;.)).a)'
["a"]
["a"]
["a"]
$ jq -cn '{a:1} | (foreach (1,2,3) as $v (.;.)).a += 1'
{"a":4}

$ jq -cn '{a:1} | path((reduce (1,2,3) as $v (.;.)).a)'
["a"]
$ jq -cn '{a:1} | (reduce (1,2,3) as $v (.;.)).a += 1'
{"a":2}

@wader
Copy link
Contributor

wader commented Nov 26, 2024

I noticed that --repl behaves different with jaq, i'm guessing output line buffering difference. Haven't dugg deeper into it yet.

I corrected this in a51d8da. Thanks again for bringing this up, and for all the other issues you reported. All your feedback has really helped me to improve the quality of jaq! I am very grateful for your help.

Nice. Looking at the jq code i wonder how this ends up working, can only see that --unbuffered flush stdout and don't see it write directly to fd:s... must be missing something

@wader
Copy link
Contributor

wader commented Nov 26, 2024

By the way, I am planning to release jaq 2.0 this week, probably tomorrow. I was thinking about mentioning that jaq can now run jqjq. Are there plans from your side to merge jqjq's jaq branch into its master branch, or should I give instructions in the release notes for jaq 2.0 to continue using jqjq's jaq branch?

🥳 good you reminded me, yes hope i can get some time tonight to cleanup and merge things to master. it's mostly the null index workarounds i've been a bit reluctant about how to fix but i can probably come up with something.

@wader
Copy link
Contributor

wader commented Nov 26, 2024

@01mf02 jqjq master should now work with jaq 2.0. there are still some TODOs left but i think they can wait

@01mf02
Copy link
Owner

01mf02 commented Nov 27, 2024

👍 when messing with reduce/foreach for jqjq i did notice that it's possible for them to be valid path expressions but it might not be very useful:

$ jq -cn '{a:1} | path((foreach (1,2,3) as $v (.;.)).a)'
["a"]
["a"]
["a"]
$ jq -cn '{a:1} | (foreach (1,2,3) as $v (.;.)).a += 1'
{"a":4}

$ jq -cn '{a:1} | path((reduce (1,2,3) as $v (.;.)).a)'
["a"]
$ jq -cn '{a:1} | (reduce (1,2,3) as $v (.;.)).a += 1'
{"a":2}

Oh, that's interesting! However, this seems to be quite limited; consider the following example:

$ jq -n '[[[2], 1], 0] | foreach (0, 0) as $x (.; .[$x]) |= .'
jq: error (at <unknown>): Invalid path expression near attempt to access element 0 of [[2],1]

@wader
Copy link
Contributor

wader commented Nov 27, 2024

Oh, that's interesting! However, this seems to be quite limited; consider the following example:

$ jq -n '[[[2], 1], 0] | foreach (0, 0) as $x (.; .[$x]) |= .'
jq: error (at <unknown>): Invalid path expression near attempt to access element 0 of [[2],1]

Yeap agree, not sure how useful it is, but was a bit surprised why i wouldn't work as long as you make sure to only index into the input. Intentional or bug?

$ ./jqjq -cn '[[2], 1] | path(reduce (0, 0) as $x (.; .[$x]))'
[0,0]
$ ./jqjq -cn '[[2], 1] | (reduce (0, 0) as $x (.; .[$x])) |= 3'
[[3],1]
$ ./jqjq -cn '[[2], 1] | path(foreach (0, 0) as $x (.; .[$x]))'
[0]
[0,0]
# 🤔
$ ./jqjq -cn '[[2], 1] | (foreach (0, 0) as $x (.; .[$x])) = .'
[[[[2],1],1],1]

@01mf02
Copy link
Owner

01mf02 commented Nov 27, 2024

Oh, this works in jqjq! Just checked gojq:

$ bin/gojq-0.12.16 -n '[[[2], 1], 0] | foreach (0, 0) as $x (.; .[$x]) |= .'
gojq: invalid path against: array ([[2],1])

That means that to my knowledge, jqjq is the only jq implementation that handles nontrivial paths in reduce/foreach. Congratulations! 🥳

@01mf02
Copy link
Owner

01mf02 commented Nov 27, 2024

@01mf02 jqjq master should now work with jaq 2.0. there are still some TODOs left but i think they can wait

Thanks a lot! I just tested it and it seems to work! (Not sure whether you know it, but the "fib.bf" example from the README seems to run out of memory for both jaq and jq.)

@wader
Copy link
Contributor

wader commented Nov 27, 2024

Oh, this works in jqjq! Just checked gojq:

$ bin/gojq-0.12.16 -n '[[[2], 1], 0] | foreach (0, 0) as $x (.; .[$x]) |= .'
gojq: invalid path against: array ([[2],1])

That means that to my knowledge, jqjq is the only jq implementation that handles nontrivial paths in reduce/foreach. Congratulations! 🥳

😅 part of implement jqjq was to learn more about those part of jq so i focused quite a lot to make it work... and found quite a lot of bugs :) it even works with eval/1 btw! 🥳 sadly via getpath as jq didn't like having majority of jqjq as a path expression 😬

@wader
Copy link
Contributor

wader commented Nov 27, 2024

@01mf02 jqjq master should now work with jaq 2.0. there are still some TODOs left but i think they can wait

Thanks a lot! I just tested it and it seems to work! (Not sure whether you know it, but the "fib.bf" example from the README seems to run out of memory for both jaq and jq.)

👍 did't know fib was OOM:ing but i know jqjq via jqjq does atm, i used to work before some try/catch change i think that for some reason caused lots of more memory usage, would be nice to make it work again.

@wader
Copy link
Contributor

wader commented Nov 27, 2024

Planning on doing a release today? let me know if i can help out with something

@01mf02
Copy link
Owner

01mf02 commented Nov 27, 2024

Planning on doing a release today? let me know if i can help out with something

I've just created the release. :) If you are motivated to do this, it would be definitely helpful to post the news about the release to some site where people could be interested about this release, e.g. Hacker News. (I'm having a bit of trouble doing this kind of "publicity".) I'll write a post on Reddit later today --- stepping out of my comfort zone for this. 😬

@itchyny
Copy link
Contributor

itchyny commented Nov 27, 2024

The path filter should emit error if the value at which indexing begins is different from the value on which the path filter was called. For example, in [{"x":1}] | path(.[0] as $x | $x | .x), the indexing .x is called on {"x":1}, which is different from [{"x":1}], so should be an error. If not, it emits the result ["x"], which is an invalid path for [{"x":1}]. The foreach example is like [[[2], 1], 0] | (.[0] as $x | $x | .[0]) = 1, which should be an error.

@wader
Copy link
Contributor

wader commented Nov 27, 2024

Planning on doing a release today? let me know if i can help out with something

I've just created the release. :) If you are motivated to do this, it would be definitely helpful to post the news about the release to some site where people could be interested about this release, e.g. Hacker News. (I'm having a bit of trouble doing this kind of "publicity".) I'll write a post on Reddit later today --- stepping out of my comfort zone for this. 😬

Will do, have already posted to jq discord and hacker news https://news.ycombinator.com/item?id=42254530 will post on mastodon and maybe some more places also. You should definitely write some posts about it! i'm very curious and hopeful that ppl will build stuff around the new data format agnostic support.

@wader
Copy link
Contributor

wader commented Nov 27, 2024

Posted on mastodon https://fosstodon.org/@wader/113554288216205847

@wader
Copy link
Contributor

wader commented Nov 27, 2024

The path filter should emit error if the value at which indexing begins is different from the value on which the path filter was called. For example, in [{"x":1}] | path(.[0] as $x | $x | .x), the indexing .x is called on {"x":1}, which is different from [{"x":1}], so should be an error. If not, it emits the result ["x"], which is an invalid path for [{"x":1}]. The foreach example is like [[[2], 1], 0] | (.[0] as $x | $x | .[0]) = 1, which should be an error.

By "if the value at which indexing begins is different from the value on which the path filter was called" you mean the value that beings each foreach/reduce iteration? my thinking is that path(reduce (0, 0) as $_ (.; .[0])) is more or less path(.[0] | .[0]).

@01mf02
Copy link
Owner

01mf02 commented Nov 29, 2024

Will do, have already posted to jq discord and hacker news https://news.ycombinator.com/item?id=42254530 will post on mastodon and maybe some more places also. You should definitely write some posts about it! i'm very curious and hopeful that ppl will build stuff around the new data format agnostic support.

Thanks a lot for this!

@wader
Copy link
Contributor

wader commented Nov 29, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants