-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize frame stats collection #151
Conversation
9247a73
to
a22a92a
Compare
Awesome work! Can you perhaps share before/after of the puffin_viewer and the puffin_egui ui being used in our project? The flamegraph is great but find it hard to compare it in concrete numbers. Would like to see some before after milliseconds per call comparison over maybe 50 frames or something. |
11b0832
to
d1c6697
Compare
79cf6c0
to
da309d6
Compare
Bench suite results:
|
da309d6
to
327ac9c
Compare
Optimize UI performance by calculating frame collection statistics dynamically. Rather than repeatedly traversing each frame, we now update the stats upon adding or removing frames. This change mitigates the significant overhead caused by constantly accessing frame data protected by RwLock.
327ac9c
to
7eacaf9
Compare
This update modifies the data containers used for storing frames to adopt binary tree structures. As a result, all frames are sorted at the time of insertion, eliminating the need for subsequent sorting during vector collection. Moreover, instead of returning vectors, the binary tree structure enables iteration over elements in a sorted order, providing iterators rather than vectors. This functionality gives the API caller the flexibility to either clone frames or simply inspect them in an efficient manner.
7eacaf9
to
2f8f16f
Compare
The only relevant benchmark for the changes is |
888ac87
to
73ad24c
Compare
Cool, thanks for doing the bencharks and posting the stats. If the benchmark proves there is not to much overhead I'm happy to merge the code! Think its not a disaster if we make the flush frame slightly longer as at least its predictable. One can argue tho that viewing statistics is less important then profiler performance. |
This commit consolidates the packed and unpacked data from frame_data into a single enum. The goal of this change is to minimize the number of locks required when retrieving information about a frame. Previously, the process of reading the packing information for stats required four separate locks: 1. One to check 'has_unpacked' 2. Another to verify 'has_unpacked' within 'unpacked_size' 3. One to retrieve 'unpacked_size' 4. And finally, one to retrieve 'packed_size' With this optimization, the lock count is reduced to a single one, achieved through invoking the 'packing_info' function.
73ad24c
to
c497844
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good now!
751538c
to
d9d38fe
Compare
d9d38fe
to
0831946
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, can you please add information to the release notes about the changes before merging?
@andreiltd what happened here? :) Not relevant anymore after pivot? |
7d24cbf
to
c38b8ae
Compare
### Checklist * [x] I have read the [Contributor Guide](../CONTRIBUTING.md) * [x] I have read and agree to the [Code of Conduct](../CODE_OF_CONDUCT.md) * [x] I have added a description of my changes and why I'd like them included in the section below ### Description of Changes Optimize UI performance by calculating frame collection statistics dynamically. Rather than repeatedly traversing each frame, we now update the stats upon adding or removing frames. This change mitigates the significant overhead caused by constantly accessing frame data protected by `RwLock`. Also, this update modifies the data containers used for storing frames to adopt binary tree structures. As a result, all frames are sorted at the time of insertion, eliminating the need for subsequent sorting during vector collection. Moreover, instead of returning vectors, the binary tree structure enables iteration over elements in a sorted order, providing iterators rather than vectors. This functionality gives the API caller the flexibility to either clone frames or simply inspect them in an efficient manner. Before: ![before](https://github.com/EmbarkStudios/puffin/assets/7009786/e1472c1e-77e8-4845-beff-2d5a9bca0e1a) After: ![after](https://github.com/EmbarkStudios/puffin/assets/7009786/ea100b5a-ad21-4b7d-a0bd-45d918c656f5)
Checklist
Description of Changes
Optimize UI performance by calculating frame collection statistics dynamically. Rather than repeatedly traversing each frame, we now update the stats upon adding or removing frames. This change mitigates the significant overhead caused by constantly accessing frame data protected by
RwLock
.Also, this update modifies the data containers used for storing frames to adopt binary tree structures. As a result, all frames are sorted at the time of insertion, eliminating the need for subsequent sorting during vector collection.
Moreover, instead of returning vectors, the binary tree structure enables iteration over elements in a sorted order, providing iterators rather than vectors. This functionality gives the API caller the flexibility to either clone frames or simply inspect them in an efficient manner.
Before:
After: