Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize frame stats collection #151

Merged
merged 5 commits into from
Jul 31, 2024
Merged

Optimize frame stats collection #151

merged 5 commits into from
Jul 31, 2024

Conversation

andreiltd
Copy link
Member

@andreiltd andreiltd commented Jun 14, 2023

Checklist

  • I have read the Contributor Guide
  • I have read and agree to the Code of Conduct
  • I have added a description of my changes and why I'd like them included in the section below

Description of Changes

Optimize UI performance by calculating frame collection statistics dynamically. Rather than repeatedly traversing each frame, we now update the stats upon adding or removing frames. This change mitigates the significant overhead caused by constantly accessing frame data protected by RwLock.

Also, this update modifies the data containers used for storing frames to adopt binary tree structures. As a result, all frames are sorted at the time of insertion, eliminating the need for subsequent sorting during vector collection.

Moreover, instead of returning vectors, the binary tree structure enables iteration over elements in a sorted order, providing iterators rather than vectors. This functionality gives the API caller the flexibility to either clone frames or simply inspect them in an efficient manner.

Before:
before

After:
after

@TimonPost
Copy link
Member

TimonPost commented Jun 15, 2023

Awesome work! Can you perhaps share before/after of the puffin_viewer and the puffin_egui ui being used in our project? The flamegraph is great but find it hard to compare it in concrete numbers. Would like to see some before after milliseconds per call comparison over maybe 50 frames or something.

@andreiltd
Copy link
Member Author

andreiltd commented Jun 16, 2023

Sure! Here is the diff at about 10k frames collected over all visible frames (~130 frames)

Before:
puffin-before-at-10k-frames

After:
after-after

@andreiltd andreiltd marked this pull request as draft June 16, 2023 10:04
@andreiltd andreiltd force-pushed the optimize-stats-computation branch 2 times, most recently from 11b0832 to d1c6697 Compare June 19, 2023 18:08
@andreiltd andreiltd marked this pull request as ready for review June 19, 2023 18:19
@andreiltd andreiltd marked this pull request as draft June 21, 2023 07:21
@andreiltd andreiltd force-pushed the optimize-stats-computation branch 2 times, most recently from 79cf6c0 to da309d6 Compare June 21, 2023 16:56
@andreiltd
Copy link
Member Author

andreiltd commented Jun 22, 2023

Bench suite results:

     Running benches/benchmark.rs (target/release/deps/benchmark-901adb2054ed4cfe)
Gnuplot not found, using plotters backend
profile_function        time:   [94.230 ns 94.448 ns 94.693 ns]
                        change: [-2.6028% -2.2195% -1.8438%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

profile_function_data   time:   [94.077 ns 94.285 ns 94.511 ns]
                        change: [-3.3842% -3.0679% -2.7380%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

profile_scope           time:   [55.632 ns 55.707 ns 55.784 ns]
                        change: [-0.5181% -0.3122% -0.0932%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

profile_scope_data      time:   [55.965 ns 56.044 ns 56.124 ns]
                        change: [-0.4706% -0.0480% +0.2720%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

flush_frames            time:   [780.93 ns 782.89 ns 784.95 ns]
                        change: [-1.0638% -0.4959% +0.0945%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

profile_function_off    time:   [1.0552 ns 1.0552 ns 1.0553 ns]
                        change: [-0.2772% -0.2522% -0.2273%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

profile_function_data_off
                        time:   [1.2785 ns 1.2855 ns 1.2919 ns]
                        change: [+2.1986% +3.4223% +4.6692%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  4 (4.00%) low mild

profile_scope_off       time:   [1.0552 ns 1.0553 ns 1.0553 ns]
                        change: [-0.0085% +0.0226% +0.0564%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

profile_scope_data_off  time:   [1.0552 ns 1.0553 ns 1.0553 ns]
                        change: [-23.214% -22.525% -21.869%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

@andreiltd andreiltd marked this pull request as ready for review June 22, 2023 06:59
Optimize UI performance by calculating frame collection statistics
dynamically. Rather than repeatedly traversing each frame, we now update
the stats upon adding or removing frames. This change mitigates the
significant overhead caused by constantly accessing frame data protected
by RwLock.
This update modifies the data containers used for storing frames to
adopt binary tree structures. As a result, all frames are sorted at the
time of insertion, eliminating the need for subsequent sorting during
vector collection.

Moreover, instead of returning vectors, the binary tree structure
enables iteration over elements in a sorted order, providing iterators
rather than vectors. This functionality gives the API caller the
flexibility to either clone frames or simply inspect them in an
efficient manner.
@andreiltd
Copy link
Member Author

The only relevant benchmark for the changes is flush_frames. We could expect flush_frames performance to regress due to added complexity, specifically caching stats on add_frame calls but the benches show no change in sending frames performance.

@TimonPost
Copy link
Member

Cool, thanks for doing the bencharks and posting the stats. If the benchmark proves there is not to much overhead I'm happy to merge the code! Think its not a disaster if we make the flush frame slightly longer as at least its predictable. One can argue tho that viewing statistics is less important then profiler performance.

This commit consolidates the packed and unpacked data from frame_data
into a single enum. The goal of this change is to minimize the number of
locks required when retrieving information about a frame.

Previously, the process of reading the packing information for stats
required four separate locks:
 1. One to check 'has_unpacked'
 2. Another to verify 'has_unpacked' within 'unpacked_size'
 3. One to retrieve 'unpacked_size'
 4. And finally, one to retrieve 'packed_size'

With this optimization, the lock count is reduced to a single one,
achieved through invoking the 'packing_info' function.
Copy link
Member

@TimonPost TimonPost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good now!

Copy link
Contributor

@Hoodad Hoodad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, can you please add information to the release notes about the changes before merging?

@kondrak
Copy link
Member

kondrak commented Jul 8, 2024

@andreiltd what happened here? :) Not relevant anymore after pivot?

@andreiltd
Copy link
Member Author

andreiltd commented Jul 8, 2024

@kondrak I will add an entry to changelog as requested by @Hoodad and it should be ready to go.

Edit: Done ✔️

@andreiltd andreiltd requested a review from Hoodad July 8, 2024 11:32
@emilk emilk merged commit 40f2579 into main Jul 31, 2024
6 checks passed
@emilk emilk deleted the optimize-stats-computation branch July 31, 2024 09:18
rib pushed a commit that referenced this pull request Aug 21, 2024
### Checklist

* [x] I have read the [Contributor Guide](../CONTRIBUTING.md)
* [x] I have read and agree to the [Code of
Conduct](../CODE_OF_CONDUCT.md)
* [x] I have added a description of my changes and why I'd like them
included in the section below

### Description of Changes

Optimize UI performance by calculating frame collection statistics
dynamically. Rather than repeatedly traversing each frame, we now update
the stats upon adding or removing frames. This change mitigates the
significant overhead caused by constantly accessing frame data protected
by `RwLock`.

Also, this update modifies the data containers used for storing frames
to adopt binary tree structures. As a result, all frames are sorted at
the time of insertion, eliminating the need for subsequent sorting
during vector collection.

Moreover, instead of returning vectors, the binary tree structure
enables iteration over elements in a sorted order, providing iterators
rather than vectors. This functionality gives the API caller the
flexibility to either clone frames or simply inspect them in an
efficient manner.

Before:

![before](https://github.com/EmbarkStudios/puffin/assets/7009786/e1472c1e-77e8-4845-beff-2d5a9bca0e1a)

After:

![after](https://github.com/EmbarkStudios/puffin/assets/7009786/ea100b5a-ad21-4b7d-a0bd-45d918c656f5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants