flexpolyline bench #46

Merged (3 commits), May 13, 2024


@michaelkirk (Member) commented May 8, 2024

Update: merged! Based on #45, so merge that first. (Sorry for the long chain of PRs.)

FIXES #35

(Actually, it just demonstrates that #35 has already been addressed by previous work from me and @mattiZed.)

flexpolyline is about 20% slower for encode:

encode 10_000 coordinates at precision 1e-5
                        time:   [105.71 µs 105.83 µs 105.97 µs]

encode 10_000 coordinates at precision 1e-5 (flexpolyline)
                        time:   [126.90 µs 127.64 µs 128.37 µs]
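To make "precision 1e-5" concrete: the polyline crate implements Google's Encoded Polyline Algorithm, where each coordinate is multiplied by 10^5, rounded to an integer, delta-encoded against the previous point, and written out as 5-bit chunks of printable ASCII. The following is a minimal std-only sketch of the encode side, not the crate's actual code; it assumes plain (lat, lng) tuples rather than the crate's own coordinate types, and the function names are hypothetical.

```rust
/// Append one signed delta using the Encoded Polyline Algorithm:
/// fold the sign into the lowest bit, then emit 5-bit chunks
/// (least-significant first), setting 0x20 on every chunk but the last
/// and offsetting each by 63 so it lands in printable ASCII.
fn encode_value(delta: i64, out: &mut String) {
    // Left-shift by one; for negative values also invert all bits.
    let mut v: u64 = if delta < 0 {
        (!(delta << 1)) as u64
    } else {
        (delta << 1) as u64
    };
    while v >= 0x20 {
        out.push(char::from((((v & 0x1f) | 0x20) + 63) as u8));
        v >>= 5;
    }
    out.push(char::from((v + 63) as u8));
}

/// Encode (lat, lng) pairs at `precision` decimal digits (5 => 1e-5).
/// Each coordinate is scaled, rounded, and stored as a delta from the
/// previous point, which keeps the output short for dense lines.
fn encode(coords: &[(f64, f64)], precision: u32) -> String {
    let factor = 10f64.powi(precision as i32);
    let mut out = String::new();
    let (mut prev_lat, mut prev_lng) = (0i64, 0i64);
    for &(lat, lng) in coords {
        let lat_i = (lat * factor).round() as i64;
        let lng_i = (lng * factor).round() as i64;
        encode_value(lat_i - prev_lat, &mut out);
        encode_value(lng_i - prev_lng, &mut out);
        prev_lat = lat_i;
        prev_lng = lng_i;
    }
    out
}
```

With the three example points from Google's format documentation, encode(&[(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)], 5) yields "_p~iF~ps|U_ulLnnqC_mqNvxq`@". The inner loop is a handful of shifts, masks, and pushes per coordinate, which is why small changes to it can move a 10_000-coordinate benchmark by tens of percent.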

flexpolyline is about 12% slower for decode:

decode 10_000 coordinates at precision 1e-5
                        time:   [88.922 µs 89.823 µs 90.648 µs]

decode 10_000 coordinates at precision 1e-5 (flexpolyline)
                        time:   [99.103 µs 100.57 µs 102.29 µs]
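The decode benchmark runs the same steps in reverse. A minimal std-only sketch of that direction, again assuming (lat, lng) tuples and hypothetical function names rather than the crate's real API, returning None on truncated or malformed input:

```rust
/// Read one sign-folded, 5-bit-chunked value starting at `bytes[*i]`,
/// advancing `*i`. Returns `None` on truncated or over-long input.
fn next_value(bytes: &[u8], i: &mut usize) -> Option<i64> {
    let mut result: u64 = 0;
    let mut shift = 0u32;
    loop {
        // Undo the +63 offset; characters below '?' (63) are invalid.
        let b = u64::from(*bytes.get(*i)?).checked_sub(63)?;
        *i += 1;
        result |= (b & 0x1f) << shift;
        if b < 0x20 {
            break; // no continuation bit: this was the last chunk
        }
        shift += 5;
        if shift > 60 {
            return None; // more chunks than a 64-bit value can hold
        }
    }
    // Unfold the sign stored in the lowest bit.
    Some(if result & 1 == 1 {
        (!(result >> 1)) as i64
    } else {
        (result >> 1) as i64
    })
}

/// Decode a polyline into (lat, lng) pairs at `precision` decimal digits.
fn decode(s: &str, precision: u32) -> Option<Vec<(f64, f64)>> {
    let factor = 10f64.powi(precision as i32);
    let bytes = s.as_bytes();
    let (mut i, mut lat, mut lng) = (0usize, 0i64, 0i64);
    let mut out = Vec::new();
    while i < bytes.len() {
        // Each point is a pair of deltas from the previous point.
        lat = lat.checked_add(next_value(bytes, &mut i)?)?;
        lng = lng.checked_add(next_value(bytes, &mut i)?)?;
        out.push((lat as f64 / factor, lng as f64 / factor));
    }
    Some(out)
}
```

Decoding "_p~iF~ps|U_ulLnnqC_mqNvxq`@" at precision 5 should give back the three points (38.5, -120.2), (40.7, -120.95), (43.252, -126.453).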

These aren't very rigorous benchmarks. In fact, I'd be OK with not merging them at all, but they might be worth keeping around in case we break something and become 10x slower than flexpolyline.
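The percentages above are relative differences between Criterion's point estimates (the middle value in each time: [lower estimate upper] line). A minimal sketch of that arithmetic, plus a hypothetical "10x slower than flexpolyline" check of the kind described; neither helper exists in the repo, and the 10x factor is just the threshold mentioned here:

```rust
/// Relative slowdown of `candidate` versus `baseline`, in percent.
/// Inputs are Criterion point estimates in any consistent unit (e.g. µs).
fn percent_slower(baseline: f64, candidate: f64) -> f64 {
    (candidate / baseline - 1.0) * 100.0
}

/// Hypothetical guard: true if `ours` is more than `max_factor` times
/// slower than `reference` (e.g. 10.0 for "10x slower than flexpolyline").
fn regressed(ours: f64, reference: f64, max_factor: f64) -> bool {
    ours > reference * max_factor
}
```

For the encode numbers, percent_slower(105.83, 127.64) is roughly 20.6; for decode, percent_slower(89.823, 100.57) is roughly 12.0.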

@michaelkirk marked this pull request as draft May 8, 2024 22:17
@michaelkirk force-pushed the mkirk/flexpolyline-bench branch 2 times, most recently from e92c541 to a3a4028, May 9, 2024 15:48
@michaelkirk marked this pull request as ready for review May 9, 2024 15:57
@michaelkirk (Member, Author) commented May 11, 2024

@urschrei, this might be an interesting test case.

Before #48, polyline encode was faster for me than flexpolyline. Now it's slower:

Edit: this is on an M1 (aarch64).

encode 10_000 coordinates at precision 1e-5
                        time:   [180.62 µs 181.02 µs 181.46 µs]
encode 10_000 coordinates at precision 1e-5 (flexpolyline)
                        time:   [126.00 µs 127.00 µs 127.99 µs]

Decode is still faster (actually even faster than it was before!)

decode 10_000 coordinates at precision 1e-5
                        time:   [82.541 µs 83.465 µs 84.310 µs]
decode 10_000 coordinates at precision 1e-5 (flexpolyline)
                        time:   [103.50 µs 105.19 µs 106.91 µs]

Edit: on x86_64, polyline is still a little faster than flexpolyline for both encode and decode:

encode 10_000 coordinates at precision 1e-5
                        time:   [195.25 µs 195.73 µs 196.28 µs]
encode 10_000 coordinates at precision 1e-5 (flexpolyline)
                        time:   [215.19 µs 215.63 µs 216.15 µs]

decode 10_000 coordinates at precision 1e-5
                        time:   [132.47 µs 132.91 µs 133.44 µs]
decode 10_000 coordinates at precision 1e-5 (flexpolyline)
                        time:   [172.45 µs 174.28 µs 176.26 µs]

@urschrei (Member)

flexbase (before #48) vs. flex (this PR rebased against #48):

group                                                         flex                                   flexbase
-----                                                         ----                                   --------
decode 10_000 coordinates at precision 1e-5                   1.00     71.1±2.01µs        ? ?/sec    1.15     81.5±1.90µs        ? ?/sec
decode 10_000 coordinates at precision 1e-5 (flexpolyline)    1.01     89.8±2.64µs        ? ?/sec    1.00     89.3±1.47µs        ? ?/sec
decode 10_000 coordinates at precision 1e-6                   1.00     88.6±2.11µs        ? ?/sec    1.13    100.5±1.55µs        ? ?/sec
encode 10_000 coordinates at precision 1e-5                   1.00     84.1±1.62µs        ? ?/sec    1.17     98.2±0.99µs        ? ?/sec
encode 10_000 coordinates at precision 1e-5 (flexpolyline)    1.03    119.0±4.42µs        ? ?/sec    1.00    115.6±1.72µs        ? ?/sec
encode 10_000 coordinates at precision 1e-6                   1.00    100.6±4.71µs        ? ?/sec    1.18    118.4±1.87µs        ? ?/sec

So, if I'm reading this correctly, #48 is still giving me across-the-board improvements, we're faster than flexpolyline, and the flexpolyline benchmarks show some run-to-run noise (up to 3%).
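The ratio columns in the table are consistent with each time being divided by the faster of the two baselines in its row (for example 81.5 / 71.1 ≈ 1.15). Since the flexpolyline code is the same in both baselines, its 1.01 and 1.03 ratios are a rough gauge of noise. A minimal sketch of that calculation; the function name is hypothetical and this is not critcmp's source:

```rust
/// Critcmp-style ratio: each baseline's time divided by the fastest
/// time in the row, so the fastest column reads 1.00.
fn ratios(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().cloned().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}
```

For the first decode row, ratios(&[71.1, 81.5]) gives 1.00 and about 1.15; for the flexpolyline encode row, ratios(&[119.0, 115.6]) gives about 1.03 and 1.00.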

@michaelkirk (Member, Author)

It sounds like you live in the kind of sane world I would like to live in.

Not sure if you saw my edit above, but I saw behavior similar to yours on my x86_64 machine.

I'm becoming more and more convinced that I am inhabiting some weird M1 edge case, and we shouldn't spend more time trying to figure it out.

@urschrei (Member)

And after rebasing against #49:

group                                                         flex                                   flexbase
-----                                                         ----                                   --------
decode 10_000 coordinates at precision 1e-5                   1.00     69.9±2.34µs        ? ?/sec    1.17     81.5±1.90µs        ? ?/sec
decode 10_000 coordinates at precision 1e-5 (flexpolyline)    1.00     89.1±1.54µs        ? ?/sec    1.00     89.3±1.47µs        ? ?/sec
decode 10_000 coordinates at precision 1e-6                   1.00     86.8±2.68µs        ? ?/sec    1.16    100.5±1.55µs        ? ?/sec
encode 10_000 coordinates at precision 1e-5                   1.00     83.9±0.79µs        ? ?/sec    1.17     98.2±0.99µs        ? ?/sec
encode 10_000 coordinates at precision 1e-5 (flexpolyline)    1.00    114.8±0.80µs        ? ?/sec    1.01    115.6±1.72µs        ? ?/sec
encode 10_000 coordinates at precision 1e-6                   1.00     96.3±2.02µs        ? ?/sec    1.23    118.4±1.87µs        ? ?/sec

Another small improvement on my machine.

@urschrei (Member)

I am highly confused. I think we should merge this (because I think the flexpolyline comparisons are valuable), and solicit benchmarking runs from some other people with M1/2/3 machines to see whether anything emerges.
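Criterion remains the right tool for the real numbers, but since the results above split by architecture (aarch64 vs. x86_64), it may help if every solicited run is labeled with the machine it came from. A hypothetical std-only helper sketching that idea: it takes the median of repeated runs, uses std::hint::black_box to keep the optimizer from discarding the work, and tags the result with std::env::consts::ARCH:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` `iters` times and return the median
/// per-iteration time, labeled with the target architecture
/// (e.g. "aarch64" on Apple Silicon, "x86_64" on Intel/AMD).
fn median_time<R, F: FnMut() -> R>(label: &str, iters: usize, mut f: F) -> (String, Duration) {
    assert!(iters > 0, "need at least one sample");
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        // black_box keeps the result "used" so the call is not optimized away.
        black_box(f());
        samples.push(start.elapsed());
    }
    samples.sort();
    let median = samples[samples.len() / 2];
    (format!("{label} [{}]", std::env::consts::ARCH), median)
}
```

The median is less sensitive than the mean to the occasional slow outlier, which matters when comparing a few percent of difference across machines.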

@urschrei self-requested a review May 11, 2024 21:25
@michaelkirk added this pull request to the merge queue May 13, 2024
Merged via the queue into georust:main with commit bb76082 May 13, 2024
1 check passed
Successfully merging this pull request may close these issues: perf gap with flexpolyline