-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add formatters. #105
Add formatters. #105
Conversation
jaq-core/src/lib.rs
Outdated
match cell { | ||
Val::Str(s) => { | ||
record.push_field(s); | ||
Ok(()) | ||
} | ||
_ => Err(Error::Str(cell.clone())), | ||
}?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could shorten this a lot by using cell.as_str()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This remark of mine is probably obsolete if you consider my comment below.
jaq-core/src/lib.rs
Outdated
box_once(match cv.1 { | ||
Val::Str(s) => Ok(Val::str(urlencoding::encode(&s).to_string())), | ||
_ => Err(Error::Str(cv.1)), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You frequently have this pattern where you check if the input value is a string, and if it is, then run some function on the string and return another string.
I would create a function in Val
that encodes this pattern, similarly to mutate_str()
. You could call it map_str
, for example. For functions that can fail during this conversion, like fromuri
, you could introduce another function like try_map_str()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This remark of mine is probably obsolete if you consider my comment below.
Thanks a lot, this is pretty cool work! |
I think I managed to follow your pointers. However clippy was complaining about naming EDIT: I also tried the |
jaq-core/src/lib.rs
Outdated
v.clone() | ||
.to_str() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use as_str() here and clone only conditionally.
jaq-core/src/lib.rs
Outdated
fn stringify(v: Val) -> String { | ||
v.clone() | ||
.to_str() | ||
.map_or_else(|_| v.to_string(), |s| s.to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would replace s.to_string() by something like s.clone() here, so it is clear that we are really just reusing the string that is there and not creating an escaped version of it.
My impression (perhaps mistaken) is that this PR strays quite widely from jq's "@foo" operators. As the jq manual says, "The @foo syntax is used to format and escape strings". This means that one can write expressions such as:
That is, |
jaq-core/src/lib.rs
Outdated
@@ -402,3 +404,82 @@ const LOG: &[(&str, usize, RunPtr, UpdatePtr)] = &[( | |||
|_, cv| box_once(Ok(debug(cv.1))), | |||
|_, cv, f| f(debug(cv.1)), | |||
)]; | |||
|
|||
#[cfg(feature = "format")] | |||
fn stringify(v: Val) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually make this a member function of Val, because it is a functionality which will be used implicitly in the interpreter. That is, "world" | "Hello, \(.)"
will use this function for interpolation.
However I would rename this function; what do you think about to_string_or_clone()? I know it's a bit longer than stringify, but at least it's a bit more descriptive and clarifies the difference with respect to to_string().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a testament to my inexperience with Rust library design, I really screwed this up w.r.t. using Rc
and cloning everywhere. I tried the Cow
route and the AsRef
route, but failed to make it "nice" when calling the function and Val::str
.
Anyway, feel free to rework it all to reduce unnecessary cloning and such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a lot of unnecessary cloning. Don't worry, I think you're doing a good job.
jaq-core/src/lib.rs
Outdated
}), | ||
("@sh", 0, |_, cv| { | ||
box_once(Ok(Val::str( | ||
shell_escape::escape(stringify(cv.1).into()).to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use into_owned() instead of to_string()?
jaq-core/src/lib.rs
Outdated
) | ||
}), | ||
("@csv", 0, |_, cv| { | ||
box_once(cv.1.as_arr().and_then(|vs| to_csv(vs, b','))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that shows what happens when the input contains commas (inside a string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know this, but apparently commas don't need to be escaped within quotes (source: https://datatracker.ietf.org/doc/html/rfc4180#section-2). The added test reflects this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
jaq-core/src/lib.rs
Outdated
.map_err(Error::from_any) | ||
.map(|s| Val::str(s.to_string())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use map_or_else() and to_ owned()?
jaq-parse/src/token.rs
Outdated
@@ -106,6 +106,8 @@ pub fn token() -> impl Parser<char, Token, Error = Simple<char>> { | |||
|
|||
let var = just('$').ignore_then(text::ident()); | |||
|
|||
let format_filter = just('@').chain(text::ident()).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use just('@').or_not()
to make this more general and then use it instead of text::ident() to define ident below?
Correct. This PR adds support for the first case only. However, I do already have an idea how this can be extended in the future in order to also support the second case, that is, string interpolation with custom formatting as prefix operator like in your example. |
@01mf02 - Thanks for your fast response, which was very welcome in several respects! Perhaps you missed my implied point, which was to highlight the tension that is created by the That is, identifiers starting with "@" are (in jq) very special in that they can be used as prefix operators. Of course, whatever potential confusion this might introduce in jaq can be mitigated by suitable documentation. |
I plan to keep this particular behaviour; that is, only filters whose name starts with an @ should be usable as string prefix operators.
I agree. |
jaq-core/src/lib.rs
Outdated
@@ -402,3 +404,76 @@ const LOG: &[(&str, usize, RunPtr, UpdatePtr)] = &[( | |||
|_, cv| box_once(Ok(debug(cv.1))), | |||
|_, cv, f| f(debug(cv.1)), | |||
)]; | |||
|
|||
#[cfg(feature = "format")] | |||
fn to_csv(vs: &[Val], delimiter: u8) -> ValR { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this function return a string result instead of a value result? In general, when a function returns only one kind of value, I return this directly instead of wrapping it in a value.
jaq-interpret/src/path.rs
Outdated
Ok(Val::str( | ||
s.chars().skip(skip).take(take).collect::<String>(), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be related to formatters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's because you modified the str function.
I think that accepting parameters implementing Into should be done quite sparingly, as it can require more type annotations, like here. May I ask you for your motivation for having changed the signature of the function?
jaq-interpret/src/val.rs
Outdated
pub fn str(s: impl Into<String>) -> Self { | ||
Self::Str(Rc::new(s.into())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, but I do not like this change because it makes the string function different from all other similar functions like arr, obj and so on. This might be useful, but only if all functions are changed accordingly, which I would however not do as part of this PR. Let's keep it small.
jaq-interpret/src/val.rs
Outdated
/// it to string. | ||
pub fn to_string_or_clone(self) -> String { | ||
match self { | ||
Self::Str(s) => s.to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I would use to_owned().
Otherwise, this will make people (like me ^^) reading this wonder whether this will escape the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to go with (*s).clone()
since to_owned()
over an Rc
yields another Rc
. I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me!
jaq-parse/src/token.rs
Outdated
let ident = just('@') | ||
.or_not() | ||
.chain::<char, String, _>(text::ident()) | ||
.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the size of this change, could you terminate the definition of ident after collect(), then redefine ident using the previously defined variable ident?
So, about |
That's nice of you to ask. :) I was actually thinking about exactly this question a few minutes ago. I think I prefer the option that produces fewer characters of code. I think that indeed, here map_err + map might be better. |
Amazingly, it took me this long to check de official definition of the
Also there's this PR in jq: jqlang/jq#2828 So possibly that one requires a heavy rewrite. Perhaps it's best to wait for jq's PR to land? |
@kklingenberg, I have a few comments regarding this PR:
|
I thank you for all the effort you put into fixing this; I expected it would be messy from the start. It probably ended up being more work to guide me through this, than to simply implement it yourself. I tried to emulate the behaviour of jq's Also about Anyway, I hereby yield control of this PR, as I don't have the required motivation to continue. Thanks again! |
Thank you for having attacked this functionality. Even if it took some of my time indeed, it is important for me that other people than me become familiar with my philosophy of working on jaq, and I hope that you learned something from working on this.
Platform-specific output is a no-go. It is also not what jq does, quoth the jq documentation: "The input is escaped suitable for use in a command-line for a POSIX shell."
That's good.
Thanks to you too! |
This PR adds the
@
filters that usually format things as strings using specific encodings, but also some that decode strings already encoded.It is the absolute laziest version of this PR. I won't disagree if you decide to close this right away, or park it forever. It is lazy because:
@csv
and@tsv
implementations are especially bad because they allocate csv::Writer for every element of the input stream.@
, because I didn't find an elegant way of limiting the@
symbol to core-filters only. So users can define filters starting with@
now, which is not possible in jq or gojq.I added a bunch of non-standard core filters that are referenced by the std implementations of the@
filters.I added a bunch of specific error variants, to capture Err variants from the various libraries that I wasn't quite sure I could just unwrap().If you still want to salvage this you can take over, or point me to ways of fixing stuff.