sqawk #6

danmbox · 2015-04-24T11:18:55Z

Have you noticed https://github.com/dbohdan/sqawk? There's a comparison out there, thechangelog/ping#132; it might make sense to adopt some features, like a column that equals the entire line (unsplit) and using regexes as field / column separators.

tobimensch · 2015-04-24T12:19:09Z

Hi,

Yes, I noticed it, as well as a few other alternatives.
I'm definitely looking forward to "stealing" useful features from those alternatives. There are also some features that I had already planned for termsql before I saw them anywhere else, so I guess it's really "multiple invention" rather than stealing. :-)

Btw, termsql should be able to perform table joins across multiple files, but it does involve an extra step: (1) write the first table to a database file with the -o option; (2) load the second table under a different name (-t) into the same -o database and perform the join. I'm looking to simplify this.
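
A minimal sketch of the two-step idea above, written against Python's sqlite3 module rather than termsql itself: two inputs are loaded as differently named tables into one database, then joined. The table names, the sample rows, and the load_table helper are made up for illustration; only the col0/col1 naming follows the thread.

```python
import sqlite3

def load_table(conn, table, lines):
    """Split each line on the first whitespace run and store it as (col0, col1).

    Hypothetical helper for illustration; not termsql's actual loader.
    """
    conn.execute(f'CREATE TABLE "{table}" (col0 TEXT, col1 TEXT)')
    rows = [tuple(line.split(None, 1)) for line in lines]
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)

# One shared database; a file path here would persist it between runs.
conn = sqlite3.connect(":memory:")

# Step 1: first input goes into its own table.
load_table(conn, "users", ["1 alice", "2 bob"])
# Step 2: second input goes into a differently named table in the same database.
load_table(conn, "logins", ["1 2015-04-24", "1 2015-04-25", "2 2015-04-25"])

# With both tables in one database, a plain SQL join works.
join_result = conn.execute(
    "SELECT users.col1, COUNT(*) FROM users "
    "JOIN logins ON users.col0 = logins.col0 "
    "GROUP BY users.col1 ORDER BY users.col1"
).fetchall()
print(join_result)
```

The point is only that the join needs both tables in the same database, which is why the second load has to target the same output file under a different table name.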

Allowing regexes and other options for splitting the input is also a useful feature that will probably end up in termsql eventually.

As for keeping the original line in the table: OK, I see how this makes sense when you name your tool sqawk and want to emulate awk, but what use case would it actually serve? (Could someone please give me an example?) This feature could also be added to termsql, but I'd first like to know why and what for.

Next up, I plan to add some further simplifications. For example, I'm thinking about changing the default column names col0, col1 to c0, c1, simply so that people need to type less. There are other nice simplifications in the roadmap or that I have in mind.

Btw, if you think you can contribute (ideas or code), you're definitely welcome.

danmbox · 2015-04-24T13:55:00Z

I'd suggest a1, a2, a3 instead of c0, c1, c2, for convergence with sqawk :)

As for a0 (= the entire line), it would be useful for sort, uniq, wc -l, cut -cM-N and similar tasks. E.g.,
select count (distinct substr(a0, 10, 3)) from a where a0 like 'WARNING: %'
counts the 3-letter warning codes following "WARNING: ". See also the examples involving a0 on the sqawk front page.
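
To make the a0 use case concrete, here is a small sketch using Python's sqlite3 module instead of sqawk: every input line is stored unsplit in a single column named a0 in a table named a (the names from the comment above), and the query counts the distinct 3-character codes after the prefix. The sample log lines are invented.

```python
import sqlite3

# Invented sample input; each element stands for one unsplit input line.
lines = [
    "WARNING: E42 disk nearly full",
    "WARNING: E42 disk nearly full",
    "WARNING: W07 clock skew",
    "INFO: started",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (a0 TEXT)")  # a0 holds the whole line
conn.executemany("INSERT INTO a VALUES (?)", [(line,) for line in lines])

# substr() is 1-based: "WARNING: " is 9 characters, so the code starts at 10.
(distinct_codes,) = conn.execute(
    "SELECT count(DISTINCT substr(a0, 10, 3)) FROM a WHERE a0 LIKE 'WARNING: %'"
).fetchone()
print(distinct_codes)
```

Because the query slices by character position, it only works on the unsplit line; after whitespace splitting the fixed offsets would be lost.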

danmbox · 2015-04-24T14:00:43Z

BTW, if you want to add multiple files/tables, you would also need a more sophisticated naming convention (like sqawk's a1, a2, b1, b2 etc)

dbohdan · 2015-04-24T20:43:30Z

Hey, everyone. I noticed this issue referenced at thechangelog/ping#132 and thought I'd drop by. :-)

@tobimensch, if you are looking for more projects to "steal" features from, you may find my list useful. I am thinking of taking --merge from termsql myself. :-)

tobimensch · 2015-04-25T04:20:10Z

@dbohdan

If you "steal" --merge, then at least do it right. It doesn't merge the last n columns; it merges all columns from the nth column to the last. The background is that filenames sometimes contain spaces, so it's unpredictable how many columns are created, but it is predictable in which column they start. After merging you should have the correct filenames in the table; see the example in the termsql manual.
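
A rough Python sketch of the behaviour described above (not termsql's actual code): everything from the nth column onward is joined back into one field. The merge_from name, the 1-based numbering, and the ls -l style sample row are assumptions made for illustration.

```python
def merge_from(fields, n, sep=" "):
    """Merge fields n..last into one field; n is assumed to be 1-based.

    Hypothetical helper; joining with a single separator does not restore
    runs of several spaces that were collapsed during splitting.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    head, tail = fields[: n - 1], fields[n - 1 :]
    # If the row has fewer than n fields there is nothing to merge.
    return head + [sep.join(tail)] if tail else head

# Invented ls -l style row: the filename starts in column 9 and contains spaces.
row = "-rw-r--r-- 1 user group 120 Apr 25 04:20 my holiday photo.jpg".split()
merged = merge_from(row, 9)
print(merged[-1])
```

The number of fields produced by splitting varies with the filename, but the start column does not, which is why the merge is anchored at n rather than counted from the end.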

dbohdan · 2015-04-25T07:06:22Z

@tobimensch Right, that is how I would implement it. I did notice that my description at thechangelog/ping#132 (comment) was wrong, however; I have corrected it.

I think --merge can be improved a bit by letting the user specify a range of columns to merge, e.g., 3-5 or 8-. In the latter case it would merge all the columns from the eighth to the last (similar to how arguments to cut(1) work on *nix).
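
One way the proposed range syntax could be interpreted, sketched in Python. The parse_range and merge_range helpers are hypothetical, and the 1-based inclusive numbering is borrowed from cut(1); neither tool is claimed to work exactly this way.

```python
def parse_range(spec, ncols):
    """Turn a 1-based, inclusive spec such as '3-5' or '8-' into a slice."""
    start, sep, end = spec.partition("-")
    if not sep or not start:
        raise ValueError(f"expected M-N or M-, got {spec!r}")
    first = int(start)
    last = int(end) if end else ncols  # open-ended: run to the last column
    if not 1 <= first <= last:
        raise ValueError(f"invalid range {spec!r} for {ncols} columns")
    return slice(first - 1, min(last, ncols))

def merge_range(fields, spec, sep=" "):
    """Replace the columns selected by spec with a single merged column."""
    s = parse_range(spec, len(fields))
    return fields[: s.start] + [sep.join(fields[s])] + fields[s.stop :]

fields = list("abcdefghij")  # ten one-letter columns
print(merge_range(fields, "3-5"))
print(merge_range(fields, "8-"))
```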

danmbox · 2015-04-25T09:16:52Z

@dbohdan if we're getting fancy, the user might want to merge all but the first 5 and last 2 columns. I remember having this problem with cut. But can't this be solved by a filter prior to the sqawk command?

dbohdan · 2015-04-25T09:43:37Z

@danmbox Good idea. Tcl's list range procedure lets you get that subrange of elements from a list with lrange $list 5 end-2; one could adopt the end-n notation for merge ranges. It may actually be better to integrate such a filter into the program itself since it would be specific to its field splitting mechanism.
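
For readers who don't use Tcl, the end-n idea can be sketched in Python. lrange indices are 0-based and inclusive, so lrange $list 5 end-2 corresponds to the Python slice list[5:-2]. The resolve_index and lrange functions below are illustrative stand-ins, not Sqawk code, and they only handle plain integers, end, and end-n.

```python
def resolve_index(token, length):
    """Resolve '5', 'end' or 'end-2' to a 0-based index."""
    if token.startswith("end"):
        offset = token[3:]
        if offset and not offset.startswith("-"):
            raise ValueError(f"unsupported index {token!r}")
        return length - 1 + (int(offset) if offset else 0)
    return int(token)

def lrange(items, first, last):
    """Inclusive sub-list, mirroring Tcl's `lrange $list first last`."""
    lo = max(resolve_index(first, len(items)), 0)
    hi = resolve_index(last, len(items))
    # An end before the start yields an empty list.
    return items[lo : max(hi + 1, lo)]

cols = list(range(10))
middle = lrange(cols, "5", "end-2")  # all but the first 5 and last 2
print(middle)
```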

danmbox · 2015-04-25T09:50:59Z

... or you might want NF, in keeping with your AWK theme :)

tobimensch · 2015-04-25T10:22:10Z

Meanwhile I stole the split by regex feature. Not updating the manual yet as I consider it still a little experimental, but from what little testing I have done it seems to work.

The fancier --merge syntax is probably a good idea; at least a 3-5 or -4 type of syntax makes sense, although I'd still like to see some concrete use cases (if only so I can update my examples list). I think I'll keep treating a bare 8 (without the -) as 8-, so that the existing syntax keeps working. It could also support a comma-separated list of merges, but that's really getting a little complicated... like -r '-2,4-6,9'

danmbox · 2015-04-25T10:35:12Z

Thanks, regex is really useful! Without it you can't even distinguish fixed-width-column and single-space-delimited formats, for example.
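
A tiny Python illustration of that distinction, using the standard re module rather than either tool: the same line splits differently depending on whether the separator is one literal space or a run of spaces. The sample line is invented.

```python
import re

line = "alice  developer"  # two spaces between the fields

# Separator is exactly one space: the double space yields an empty field.
single_space = re.split(r" ", line)

# Separator is a run of one or more spaces: padding is swallowed.
space_runs = re.split(r" +", line)

print(single_space)
print(space_runs)
```

With a fixed single-character separator only the first interpretation is available; a regex separator lets the user pick whichever matches the input format.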

It's always possible to leave enhancements for later, when somebody actually requires them. I remember having this problem with cut (need all but last N fields) but I can't remember why.

dbohdan · 2015-04-25T12:50:06Z

@danmbox Good idea about NF. I've implemented range merging in Sqawk, albeit only for number-number ranges for now.

@tobimensch Myself, I've decided to support two syntaxes: merge=1-2,3-4,5-6 and merge=1 2 3 4 5 6. The latter is the natural list format in Tcl, so the former is transformed into it if detected.
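
The normalization described above can be sketched in Python as follows. This is not Sqawk's Tcl implementation; the normalize_merge name is hypothetical, and reading the flat list as consecutive (start, end) pairs is an assumption based on the two example syntaxes.

```python
import re

def normalize_merge(value):
    """Return a flat list of ints from '1-2,3-4,5-6' or '1 2 3 4 5 6'."""
    tokens = [tok for tok in re.split(r"[\s,\-]+", value.strip()) if tok]
    numbers = [int(tok) for tok in tokens]
    if len(numbers) % 2:
        raise ValueError("merge ranges need a start and an end for each pair")
    return numbers

flat = normalize_merge("1-2,3-4,5-6")
# Assumed reading: consecutive numbers form (start, end) pairs.
pairs = list(zip(flat[::2], flat[1::2]))
print(flat, pairs)
```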

tobimensch · 2015-04-27T17:31:59Z

@dbohdan
Will you keep that list updated? I realize this is just a blog post, but people might end up referencing this list in the future. A wiki would be an ideal place for something like that.

dbohdan · 2015-04-27T19:26:05Z

@tobimensch

> A wiki would be an ideal place for something like that.

I completely agree. I made a GitHub wiki for it at https://github.com/dbohdan/structured-text-tools/wiki with the content in the post plus an update on Sqawk and termsql. You should be able to edit the wiki as long as you have a GitHub account.

tobimensch · 2015-04-27T20:30:47Z

Nice :-)

tobimensch · 2015-04-28T10:26:36Z

@danmbox
I implemented the "entire line" feature.

Comparing with sqawk examples:

sqawk -1 -OFS ' -- ' 'select a0, count(*) from a group by a0 having count(*) > 1' < file
termsql -R 'select raw,count(*) group by raw having count(*) > 1' < file
sqawk "select count (distinct substr(a0, 10, 3)) from a where a0 like 'WARNING: %'"
termsql -R "select count (distinct substr(raw, 10, 3)) where raw like 'WARNING: %'"

By the way, you could always have used the --line-as-colums feature to achieve the same thing:

termsql -l1 'select col0,count(*) group by col0 having count(*) > 1' < file

This is actually closer to sqawk -1, because it doesn't split stuff into fields, while -R/--raw is closer to the default mode of sqawk.
