perf(profiling): Tune the sample profile generation code for performance #1694
Here are some benchmarks I collected comparing before and after, using a few arbitrary profiles.
From this testing, the new code looks consistently faster than the old, and I have no reason to believe it is slower in any case.
Side note: The "daphne" errors above in the Django tests were fixed in master, so if you update your branches they should be gone.
We noticed that generating the sample format at the end of a profile can get rather slow. This PR aims to improve what we can with minimal changes. A few things we took advantage of to accomplish this:
- Turning the extracted stack into a tuple so it is hashable and can be used as a dictionary key. This lets us check whether the stack is already indexed and, if so, skip indexing its frames again. This is especially effective in profiles where the code is blocked on a network request, for example, since there will be many identical stacks.
- Using the hash of the stack as the dictionary key. Hashing the entire stack can be an expensive operation since a stack can have up to 128 frames. Using the stack itself as a dictionary key means it needs to be rehashed on each lookup. To avoid this, we pre-hash the stack and use the hash as the dictionary key, which is more efficient.
- Converting numbers to strings ahead of time when we know we have to. Values like the tid and the elapsed since start ns need to be sent as strings. However, many samples share the same value for them, and we were doing the conversion each time. Instead, we convert them to strings up front and reuse them as needed to minimize unnecessary calculations.
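The three ideas above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `build_sample_profile`, the input shape, the frame tuple layout, and the output field names are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical frame shape for this sketch: (function, filename, lineno).
Frame = Tuple[str, str, int]
# One tick of the sampler: (elapsed_ns, [(tid, stack), ...]), stack is a tuple of frames.
RawSample = Tuple[int, List[Tuple[int, Tuple[Frame, ...]]]]


def build_sample_profile(raw_samples: List[RawSample]) -> dict:
    frames: List[Frame] = []
    frame_index: Dict[Frame, int] = {}
    stacks: List[List[int]] = []
    stack_index: Dict[int, int] = {}  # keyed by the pre-computed stack hash
    tid_strings: Dict[int, str] = {}  # tid -> str(tid), converted once
    samples: List[dict] = []

    for elapsed_ns, per_thread in raw_samples:
        # Every thread sampled in this tick shares the timestamp: convert once.
        elapsed_str = str(elapsed_ns)

        for tid, stack in per_thread:
            # Hash the (hashable) tuple once; later lookups use the small int key.
            stack_hash = hash(stack)
            stack_id = stack_index.get(stack_hash)

            if stack_id is None:
                # First time this stack is seen: index its frames.
                frame_ids = []
                for frame in stack:
                    frame_id = frame_index.get(frame)
                    if frame_id is None:
                        frame_id = len(frames)
                        frame_index[frame] = frame_id
                        frames.append(frame)
                    frame_ids.append(frame_id)
                stack_id = len(stacks)
                stack_index[stack_hash] = stack_id
                stacks.append(frame_ids)

            tid_str = tid_strings.get(tid)
            if tid_str is None:
                tid_str = tid_strings[tid] = str(tid)

            samples.append(
                {
                    "elapsed_since_start_ns": elapsed_str,
                    "thread_id": tid_str,
                    "stack_id": stack_id,
                }
            )

    return {"frames": frames, "stacks": stacks, "samples": samples}
```

Note that keying on the hash alone assumes two distinct stacks never produce the same hash value. A version that needed strict correctness would also compare the stacks on a hit.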
fba25ec to 3b316c2
nice numbers 🎉
…ion-code-for-performance