❯ hyperfine -M 3 -L prog "jq,./target/release/jaq" -- "zstdcat ~/tmp/json_logs/conn_big.log.zst | {prog} -cr .proto > /dev/null"
Benchmark 1: zstdcat ~/tmp/json_logs/conn_big.log.zst | jq -cr .proto > /dev/null
  Time (mean ± σ):     25.292 s ±  0.059 s    [User: 26.299 s, System: 0.617 s]
  Range (min … max):   25.233 s … 25.351 s    3 runs

Benchmark 2: zstdcat ~/tmp/json_logs/conn_big.log.zst | ./target/release/jaq -cr .proto > /dev/null
  Time (mean ± σ):     38.434 s ±  0.123 s    [User: 37.036 s, System: 3.545 s]
  Range (min … max):   38.304 s … 38.547 s    3 runs

Summary
  'zstdcat ~/tmp/json_logs/conn_big.log.zst | jq -cr .proto > /dev/null' ran
    1.52 ± 0.01 times faster than 'zstdcat ~/tmp/json_logs/conn_big.log.zst | ./target/release/jaq -cr .proto > /dev/null'
Since extracting a single field is often the only thing I do, I have a stupid-simple tool called json-cut that does exactly this using github.com/buger/jsonparser, which shows how fast this sort of thing can be:
❯ hyperfine -M 3 -L prog "~/projects/json-cut/json-cut proto" -- "zstdcat ~/tmp/json_logs/conn_big.log.zst | {prog} > /dev/null"
Benchmark 1: zstdcat ~/tmp/json_logs/conn_big.log.zst | ~/projects/json-cut/json-cut proto > /dev/null
  Time (mean ± σ):     2.077 s ±  0.005 s    [User: 3.458 s, System: 0.358 s]
  Range (min … max):   2.071 s … 2.080 s    3 runs
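json-cut itself is built on the Go library github.com/buger/jsonparser, but the core idea (scanning each line for one key instead of parsing the whole document into a value tree) is easy to sketch. A rough, hypothetical Rust version, not the actual tool, handling only flat string fields, might look like this:

// Rough sketch, not the real json-cut: pull one top-level string field out of
// each NDJSON line with a plain substring scan instead of a full parse.
use std::io::{self, BufRead, BufWriter, Write};

fn main() -> io::Result<()> {
    // Hypothetical CLI: the field name is the only argument, e.g. `json-cut proto`.
    let key = std::env::args().nth(1).expect("usage: json-cut <key>");
    let needle = format!("\"{key}\":\"");
    let stdin = io::stdin().lock();
    // Block-buffer stdout so each match does not become its own write syscall.
    let mut out = BufWriter::new(io::stdout().lock());
    for line in stdin.lines() {
        let line = line?;
        // Naive scan: find `"key":"` and copy up to the next quote.
        // Fine for flat log lines; it is not a general JSON parser
        // (no escape handling, no nested keys, string values only).
        if let Some(start) = line.find(&needle) {
            let rest = &line[start + needle.len()..];
            if let Some(end) = rest.find('"') {
                writeln!(out, "{}", &rest[..end])?;
            }
        }
    }
    out.flush()
}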
I notice the high sys time in jaq's case appears to come from not using buffered I/O on stdout: Rust's stdout is only line-buffered, so every output line ends up as its own write syscall. Applying this small change makes it 8% faster overall:
diff --git a/jaq/src/main.rs b/jaq/src/main.rs
index 02b2390..2ec3e0b 100644
--- a/jaq/src/main.rs
+++ b/jaq/src/main.rs
@@ -3,6 +3,7 @@ use jaq_interpret::{Ctx, Filter, FilterT, ParseCtx, RcIter, Val};
 use std::io::{self, BufRead, Write};
 use std::path::PathBuf;
 use std::process::{ExitCode, Termination};
+use std::io::BufWriter;
 
 #[cfg(feature = "mimalloc")]
 #[global_allocator]
@@ -455,10 +456,11 @@ fn print(cli: &Cli, val: Val, writer: &mut impl Write) -> io::Result<()> {
     Ok(())
 }
 
-fn with_stdout<T>(f: impl FnOnce(&mut io::StdoutLock) -> Result<T, Error>) -> Result<T, Error> {
-    let mut stdout = io::stdout().lock();
-    let y = f(&mut stdout)?;
-    stdout.flush()?;
+fn with_stdout<T>(f: impl FnOnce(&mut BufWriter<io::StdoutLock>) -> Result<T, Error>) -> Result<T, Error> {
+    let stdout = io::stdout().lock();
+    let mut buffered = BufWriter::new(stdout);
+    let y = f(&mut buffered)?;
+    buffered.flush()?;
     Ok(y)
 }
That should probably only buffer stdout when stdout is not a TTY, though.
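A minimal sketch of that, assuming the closure is loosened to take &mut dyn Write, that Error is jaq's existing CLI error type, and using std::io::IsTerminal (stable since Rust 1.70):

// Hypothetical variant of the patch above: keep the default line-buffered
// stdout when it is a terminal, and only add a BufWriter when output is
// piped or redirected.
use std::io::{self, BufWriter, IsTerminal, Write};

fn with_stdout<T>(f: impl FnOnce(&mut dyn Write) -> Result<T, Error>) -> Result<T, Error> {
    let mut stdout = io::stdout().lock();
    if stdout.is_terminal() {
        // Interactive use: keep line buffering so output shows up immediately.
        let y = f(&mut stdout)?;
        stdout.flush()?;
        Ok(y)
    } else {
        // Piped output: batch writes into larger chunks to cut syscalls.
        let mut buffered = BufWriter::new(stdout);
        let y = f(&mut buffered)?;
        buffered.flush()?;
        Ok(y)
    }
}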
❯ hyperfine -M 3 -L prog "./target/release/jaq.orig,./target/release/jaq" -- "zstdcat ~/tmp/json_logs/conn_big.log.zst | {prog} -cr .proto > /dev/null"
Benchmark 1: zstdcat ~/tmp/json_logs/conn_big.log.zst | ./target/release/jaq.orig -cr .proto > /dev/null
  Time (mean ± σ):     38.410 s ±  0.136 s    [User: 36.983 s, System: 3.557 s]
  Range (min … max):   38.259 s … 38.523 s    3 runs

Benchmark 2: zstdcat ~/tmp/json_logs/conn_big.log.zst | ./target/release/jaq -cr .proto > /dev/null
  Time (mean ± σ):     35.687 s ±  0.202 s    [User: 36.368 s, System: 1.301 s]
  Range (min … max):   35.497 s … 35.899 s    3 runs

Summary
  'zstdcat ~/tmp/json_logs/conn_big.log.zst | ./target/release/jaq -cr .proto > /dev/null' ran
    1.08 ± 0.01 times faster than 'zstdcat ~/tmp/json_logs/conn_big.log.zst | ./target/release/jaq.orig -cr .proto > /dev/null'
conn_big.log.zst is a JSON log containing 7,620,480 rows like