Optimize frame stats collection #151

andreiltd · 2023-06-14T13:57:29Z

Checklist

I have read the Contributor Guide
I have read and agree to the Code of Conduct
I have added a description of my changes and why I'd like them included in the section below

Description of Changes

Optimize UI performance by maintaining frame collection statistics incrementally. Rather than traversing every frame each time the stats are needed, we now update the stats whenever frames are added or removed. This removes the significant overhead of repeatedly accessing frame data protected by RwLock.
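A minimal sketch of the incremental approach follows. It assumes frames are held as `Arc<RwLock<...>>` and that a frame's size does not change after it is added. The types `FrameData`, `FrameStats`, and `FrameCollection` and their fields are hypothetical, not puffin's actual API. Each frame's lock is taken once when the frame is added or removed, and the UI reads the cached totals without touching any lock.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, RwLock};

/// Hypothetical frame payload; only its size matters for the stats.
struct FrameData {
    bytes: usize,
}

/// Hypothetical running totals, kept up to date on insert/remove.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct FrameStats {
    frame_count: usize,
    total_bytes: usize,
}

impl FrameStats {
    fn add(&mut self, frame: &FrameData) {
        self.frame_count += 1;
        self.total_bytes += frame.bytes;
    }

    fn remove(&mut self, frame: &FrameData) {
        self.frame_count -= 1;
        self.total_bytes -= frame.bytes;
    }
}

/// Hypothetical container; assumes a frame's size is fixed once it is added.
struct FrameCollection {
    frames: VecDeque<Arc<RwLock<FrameData>>>,
    stats: FrameStats,
}

impl FrameCollection {
    fn new() -> Self {
        Self { frames: VecDeque::new(), stats: FrameStats::default() }
    }

    fn add_frame(&mut self, frame: Arc<RwLock<FrameData>>) {
        // Take the frame's lock once, at insertion time, to account for it.
        self.stats.add(&frame.read().unwrap());
        self.frames.push_back(frame);
    }

    fn remove_oldest(&mut self) -> Option<Arc<RwLock<FrameData>>> {
        let frame = self.frames.pop_front()?;
        // Subtract the same contribution that add_frame recorded.
        self.stats.remove(&frame.read().unwrap());
        Some(frame)
    }

    /// Cheap for the UI: returns cached totals without locking any frame.
    fn stats(&self) -> FrameStats {
        self.stats
    }
}

fn main() {
    let mut collection = FrameCollection::new();
    for bytes in [100, 250, 50] {
        collection.add_frame(Arc::new(RwLock::new(FrameData { bytes })));
    }
    collection.remove_oldest();
    let stats = collection.stats();
    println!("{} frames, {} bytes", stats.frame_count, stats.total_bytes);
}
```

The trade-off is a little extra work on every insert and removal in exchange for constant-time, lock-free reads from the UI.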

Also, this update switches the containers used to store frames to sorted tree-based structures. As a result, frames are kept in order at insertion time, so they no longer need to be sorted when collected into a vector.

Moreover, the tree structure allows iterating over the frames in sorted order, so the API now returns iterators instead of vectors. This gives the caller the flexibility to either clone the frames or simply inspect them efficiently.
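The sketch below illustrates the idea with the standard library's `BTreeMap`, assuming frames are keyed by a frame index. `Frame`, `FrameStore`, and `FrameIndex` are hypothetical names, not puffin's types. Entries stay sorted by key regardless of insertion order, and returning `impl Iterator` lets the caller choose between borrowing and cloning.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical frame key.
type FrameIndex = u64;

/// Hypothetical frame type for illustration.
#[derive(Debug)]
struct Frame {
    index: FrameIndex,
    duration_ns: u64,
}

#[derive(Default)]
struct FrameStore {
    // BTreeMap keeps entries ordered by key, so no sort is needed later.
    frames: BTreeMap<FrameIndex, Arc<Frame>>,
}

impl FrameStore {
    fn insert(&mut self, frame: Arc<Frame>) {
        self.frames.insert(frame.index, frame);
    }

    /// Iterates frames in ascending index order without allocating a Vec.
    fn frames(&self) -> impl Iterator<Item = &Arc<Frame>> + '_ {
        self.frames.values()
    }
}

fn main() {
    let mut store = FrameStore::default();
    // Insert out of order; iteration still yields 1, 2, 3.
    for index in [3, 1, 2] {
        store.insert(Arc::new(Frame { index, duration_ns: index * 1_000 }));
    }

    // Inspect without cloning: borrow each frame in sorted order.
    let total_ns: u64 = store.frames().map(|f| f.duration_ns).sum();
    // Or take ownership when needed; cloning an Arc only bumps a refcount.
    let owned: Vec<Arc<Frame>> = store.frames().cloned().collect();

    println!("total = {total_ns} ns, first = {}", owned[0].index);
}
```

Because the frames are stored behind `Arc`, callers that need ownership can clone through `.cloned()` without deep-copying any frame data.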

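Before:

![before](https://github.com/EmbarkStudios/puffin/assets/7009786/e1472c1e-77e8-4845-beff-2d5a9bca0e1a)

After:

![after](https://github.com/EmbarkStudios/puffin/assets/7009786/ea100b5a-ad21-4b7d-a0bd-45d918c656f5)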

TimonPost · 2023-06-15T17:33:42Z

Awesome work! Could you share before/after results of the puffin_viewer and the puffin_egui UI as used in our project? The flamegraph is great, but I find it hard to compare in concrete numbers. I'd like to see a before/after comparison of milliseconds per call over, say, 50 frames.

andreiltd · 2023-06-16T09:49:15Z

Sure! Here is the difference with about 10k frames collected, measured over all visible frames (~130 frames):

Before:

After:

andreiltd · 2023-06-22T06:56:12Z

Bench suite results:

     Running benches/benchmark.rs (target/release/deps/benchmark-901adb2054ed4cfe)
Gnuplot not found, using plotters backend
profile_function        time:   [94.230 ns 94.448 ns 94.693 ns]
                        change: [-2.6028% -2.2195% -1.8438%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

profile_function_data   time:   [94.077 ns 94.285 ns 94.511 ns]
                        change: [-3.3842% -3.0679% -2.7380%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

profile_scope           time:   [55.632 ns 55.707 ns 55.784 ns]
                        change: [-0.5181% -0.3122% -0.0932%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

profile_scope_data      time:   [55.965 ns 56.044 ns 56.124 ns]
                        change: [-0.4706% -0.0480% +0.2720%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

flush_frames            time:   [780.93 ns 782.89 ns 784.95 ns]
                        change: [-1.0638% -0.4959% +0.0945%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

profile_function_off    time:   [1.0552 ns 1.0552 ns 1.0553 ns]
                        change: [-0.2772% -0.2522% -0.2273%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

profile_function_data_off
                        time:   [1.2785 ns 1.2855 ns 1.2919 ns]
                        change: [+2.1986% +3.4223% +4.6692%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  4 (4.00%) low mild

profile_scope_off       time:   [1.0552 ns 1.0553 ns 1.0553 ns]
                        change: [-0.0085% +0.0226% +0.0564%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

profile_scope_data_off  time:   [1.0552 ns 1.0553 ns 1.0553 ns]
                        change: [-23.214% -22.525% -21.869%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe


andreiltd · 2023-06-23T12:01:50Z

The only benchmark relevant to these changes is flush_frames. One could expect flush_frames to regress because of the added work (specifically, caching the stats on every add_frame call), but the benchmarks show no change in frame-sending performance.

TimonPost · 2023-06-26T07:38:34Z

Cool, thanks for running the benchmarks and posting the stats. If the benchmarks show there isn't too much overhead, I'm happy to merge the code! I think it's not a disaster if flush_frames gets slightly longer, since at least it's predictable. One could argue, though, that viewing statistics is less important than profiler performance.

This commit consolidates the packed and unpacked data in frame_data into a single enum. The goal is to minimize the number of locks required when retrieving information about a frame. Previously, reading the packing information for the stats required four separate locks:

1. one to check 'has_unpacked';
2. another to check 'has_unpacked' again inside 'unpacked_size';
3. one to retrieve 'unpacked_size';
4. and finally, one to retrieve 'packed_size'.

With this optimization, the lock count is reduced to a single one, acquired by calling the 'packing_info' function.
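A rough sketch of the single-enum idea follows, assuming a frame holds its data either unpacked or packed at any given time (a simplification). The names `FrameDataState` and `PackingInfo`, and the byte-vector payloads, are hypothetical; only the `packing_info` function name comes from the commit message. Every size query is answered under one read lock.

```rust
use std::sync::RwLock;

/// Hypothetical storage state: the data is held in exactly one form at a time.
enum FrameDataState {
    Unpacked(Vec<u8>),
    Packed(Vec<u8>),
}

/// Hypothetical summary of a frame's sizes.
#[derive(Debug, PartialEq)]
struct PackingInfo {
    unpacked_size: Option<usize>,
    packed_size: Option<usize>,
}

struct FrameData {
    // A single lock guards the whole state, instead of one lock per field.
    state: RwLock<FrameDataState>,
}

impl FrameData {
    /// Answers every size question under one read lock.
    fn packing_info(&self) -> PackingInfo {
        let state = self.state.read().unwrap();
        match &*state {
            FrameDataState::Unpacked(bytes) => PackingInfo {
                unpacked_size: Some(bytes.len()),
                packed_size: None,
            },
            FrameDataState::Packed(bytes) => PackingInfo {
                unpacked_size: None,
                packed_size: Some(bytes.len()),
            },
        }
    }
}

fn main() {
    let frame = FrameData {
        state: RwLock::new(FrameDataState::Unpacked(vec![0; 1024])),
    };
    println!("{:?}", frame.packing_info());
}
```

Matching on one enum also makes the "has unpacked data" check implicit, which is what removes the repeated 'has_unpacked' lookups.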

TimonPost

looks good now!

Hoodad

This looks good, can you please add information to the release notes about the changes before merging?

kondrak · 2024-07-08T11:18:34Z

@andreiltd what happened here? :) Not relevant anymore after pivot?

andreiltd · 2024-07-08T11:27:17Z

@kondrak I will add an entry to changelog as requested by @Hoodad and it should be ready to go.

Edit: Done ✔️


andreiltd requested review from emilk and TimonPost as code owners June 14, 2023 13:57

andreiltd force-pushed the optimize-stats-computation branch from 9247a73 to a22a92a on June 15, 2023 06:45

andreiltd marked this pull request as draft June 16, 2023 10:04

andreiltd force-pushed the optimize-stats-computation branch 2 times, most recently from 11b0832 to d1c6697 on June 19, 2023 18:08

andreiltd marked this pull request as ready for review June 19, 2023 18:19

andreiltd marked this pull request as draft June 21, 2023 07:21

andreiltd force-pushed the optimize-stats-computation branch 2 times, most recently from 79cf6c0 to da309d6 on June 21, 2023 16:56

andreiltd marked this pull request as ready for review June 22, 2023 06:59

andreiltd force-pushed the optimize-stats-computation branch from da309d6 to 327ac9c on June 22, 2023 07:17

andreiltd force-pushed the optimize-stats-computation branch from 327ac9c to 7eacaf9 on June 22, 2023 12:26

andreiltd force-pushed the optimize-stats-computation branch from 7eacaf9 to 2f8f16f on June 22, 2023 14:07

andreiltd force-pushed the optimize-stats-computation branch from 888ac87 to 73ad24c on June 23, 2023 12:13

andreiltd force-pushed the optimize-stats-computation branch from 73ad24c to c497844 on June 26, 2023 11:59

TimonPost approved these changes Jun 26, 2023

andreiltd force-pushed the optimize-stats-computation branch from 751538c to d9d38fe on June 28, 2024 09:44

Merge branch 'main' into optimize-stats-computation

0831946

andreiltd force-pushed the optimize-stats-computation branch from d9d38fe to 0831946 on June 28, 2024 09:56

Hoodad requested changes Jul 3, 2024

Build SelectedFrames from iterator

c38b8ae

andreiltd force-pushed the optimize-stats-computation branch from 7d24cbf to c38b8ae on July 8, 2024 11:32

andreiltd requested a review from Hoodad July 8, 2024 11:32

emilk merged commit 40f2579 into main Jul 31, 2024
6 checks passed

emilk deleted the optimize-stats-computation branch July 31, 2024 09:18
