Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip intermediate coord representation #49

Merged
merged 1 commit into from
May 11, 2024
Merged

Conversation

michaelkirk
Copy link
Member

We can just build up the coords directly rather than building up arrays and then converting them to Coords.

One more little perf boost. I tried this change previously, but it was causing a surprising regression in the encode benchmarks (even though this only touches a decode code path).

My suspicion is that it was the same perf cliff in #48. I really don't know, but I'm imaging some incidental code layout thing that I was on the edge of. Now that that we've dived off that cliff anyway, this seems like strictly a (small) improvement.

$ cargo bench --bench="*" --  --baseline=sha-6872788
   Compiling polyline v0.10.2 (/Users/mkirk/src/georust/polyline)
    Finished `bench` profile [optimized] target(s) in 0.94s
     Running benches/benchmarks.rs
(target/release/deps/benchmarks-8b91c88197106af3)
encode 10_000 coordinates at precision 1e-5
                        time:   [180.56 µs 180.85 µs 181.16 µs]
                        change: [-1.3886% -1.0790% -0.7773%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

encode 10_000 coordinates at precision 1e-6
                        time:   [219.72 µs 220.82 µs 221.84 µs]
                        change: [+0.5301% +1.0508% +1.5979%] (p = 0.00 < 0.05)
                        Change within noise threshold.

decode 10_000 coordinates at precision 1e-5
                        time:   [80.937 µs 81.828 µs 82.713 µs]
                        change: [-4.9284% -3.9764% -2.9753%] (p = 0.00 < 0.05)
                        Performance has improved.

decode 10_000 coordinates at precision 1e-6
                        time:   [97.706 µs 98.786 µs 99.825 µs]
                        change: [-3.1947% -2.2649% -1.3617%] (p = 0.00 < 0.05)
                        Performance has improved.

We can just build up the coords directly rather than building up arrays
and then converting them to Coords.

```
$ cargo bench --bench="*" --  --baseline=sha-6872788
   Compiling polyline v0.10.2 (/Users/mkirk/src/georust/polyline)
    Finished `bench` profile [optimized] target(s) in 0.94s
     Running benches/benchmarks.rs
(target/release/deps/benchmarks-8b91c88197106af3)
encode 10_000 coordinates at precision 1e-5
                        time:   [180.56 µs 180.85 µs 181.16 µs]
                        change: [-1.3886% -1.0790% -0.7773%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

encode 10_000 coordinates at precision 1e-6
                        time:   [219.72 µs 220.82 µs 221.84 µs]
                        change: [+0.5301% +1.0508% +1.5979%] (p = 0.00 < 0.05)
                        Change within noise threshold.

decode 10_000 coordinates at precision 1e-5
                        time:   [80.937 µs 81.828 µs 82.713 µs]
                        change: [-4.9284% -3.9764% -2.9753%] (p = 0.00 < 0.05)
                        Performance has improved.

decode 10_000 coordinates at precision 1e-6
                        time:   [97.706 µs 98.786 µs 99.825 µs]
                        change: [-3.1947% -2.2649% -1.3617%] (p = 0.00 < 0.05)
                        Performance has improved.
```
@michaelkirk michaelkirk requested a review from urschrei May 11, 2024 20:53
@urschrei
Copy link
Member

urschrei commented May 11, 2024

critcmp main decode (decode being this PR):

group                                          decode                                 main
-----                                          ------                                 ----
decode 10_000 coordinates at precision 1e-5    1.01     71.0±1.74µs        ? ?/sec    1.00     70.6±2.11µs        ? ?/sec
decode 10_000 coordinates at precision 1e-6    1.00     86.3±1.87µs        ? ?/sec    1.01     86.8±2.23µs        ? ?/sec
encode 10_000 coordinates at precision 1e-5    1.00     83.8±1.58µs        ? ?/sec    1.00     83.5±0.70µs        ? ?/sec
encode 10_000 coordinates at precision 1e-6    1.05    100.9±1.26µs        ? ?/sec    1.00     96.4±2.14µs        ? ?/sec

So a 5 % hit on my machine on M2 arm64 on the last benchmark (which is NBD, to be clear)

@michaelkirk
Copy link
Member Author

Huh wild that editing the decode path would slow down the encode benchmarks, but that seems to be the name of the game.

I just ran the benchmarks on my x86_64 linux machine and see no change good or bad.

So, I'm inclined to say this is "probably no different" from a performance perspective, and slightly clearer from a code readability perspective, so I'm still inclined to merge it if you're OK with it.

@michaelkirk michaelkirk added this pull request to the merge queue May 11, 2024
Merged via the queue into main with commit b3a3c56 May 11, 2024
1 check passed
This was referenced May 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants