Transport causes CPU Spikes on Larger Envelopes #2384

Closed
mitsuhiko opened this issue Sep 20, 2023 · 5 comments · Fixed by #2449

Comments

@mitsuhiko
Member

mitsuhiko commented Sep 20, 2023

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.31

Steps to Reproduce

Sending this to Sentry causes the transport to create CPU spikes that are clearly visible in our Kubernetes pods.

```python
from sentry_sdk import capture_message, push_scope

for _ in range(50):
    with push_scope() as scope:
        scope.add_attachment(bytes=b"Hello World! " * 10000, filename="attachment.txt")
        capture_message('Hello, hello.')
```

It also happens if gzip compression is disabled. It appears to be related to how `capture_envelope` works within the transport.

Expected Result

I would not expect the CPU usage to spike.

Actual Result

It spikes.

@sl0thentr0py
Member

Possibly urllib3-related: psf/requests#1910

@antonpirker
Member

antonpirker commented Oct 10, 2023

I did some profiling today. (Because sending is done in a worker thread, I profiled the workers on my local machine; in Sentry itself we would not have enough samples.)

Profiling setup:

I used `profiles_sample_rate=1.0` in `sentry_sdk.init()`. Additionally, I patched the background worker thread's `.run()` method to profile it and dump the profile data on `flush()`: https://github.com/getsentry/sentry-python/pull/2439/files
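
A minimal sketch of that kind of patch (this is not the code from the linked PR; the queue-based loop below is a stand-in for the SDK's internal background worker):

```python
# Sketch: profile a worker thread's run loop with cProfile and dump the
# stats when the worker shuts down, so the profile can be inspected later.
import cProfile
import pstats
import queue
import threading

task_queue = queue.Queue()

def worker_run():
    # Stand-in for the SDK's background worker loop (hypothetical, not SDK code).
    while True:
        item = task_queue.get()
        if item is None:
            break
        item()  # e.g. send one envelope

def profiled_run():
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        worker_run()
    finally:
        profiler.disable()
        profiler.dump_stats("worker.prof")
        pstats.Stats("worker.prof").sort_stats("cumulative").print_stats(20)

thread = threading.Thread(target=profiled_run, daemon=True)
thread.start()
task_queue.put(None)  # shut the worker down for this demo
thread.join()
```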

a) Send 10 envelopes with big attachments (10000 "Hello World!"s):
https://sentry-sdks.sentry.io/profiling/profile/sentry-python/6b7fc1ca0e30427cbf9d76ff2e1998dd/flamegraph/

[flamegraph screenshot: 10_10000]

b) Send 50 envelopes with big attachments (10000 "Hello World!"s):
https://sentry-sdks.sentry.io/profiling/profile/sentry-python/d446df68dd324870a5d0fb416a29e29b/flamegraph/

[flamegraph screenshot: 50_10000]

c) Send 50 envelopes with small attachments (10 "Hello World!"s):
https://sentry-sdks.sentry.io/profiling/profile/sentry-python/ac9a8a1ae4f648a582ce375fe6e1e474/flamegraph/

[flamegraph screenshot: 50_10]

So nothing jumps out. Most of the time is spent in the HTTP library (which was kind of our guess). I tried disabling SSL certificate verification, but that made no difference at all.

One thing of note: sending 10 envelopes takes around 350 ms and sending 50 envelopes takes around 2500 ms (no matter what size the attachments are), so the growth is not linear.
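
For completeness, roughly how such a measurement can be done (a sketch, not my exact script; the DSN is a placeholder and `sentry_sdk.flush()` is used to wait until the transport queue is drained):

```python
# Timing sketch: send N envelopes with attachments and measure how long it
# takes until the transport has drained its queue. The DSN is a placeholder.
import time

import sentry_sdk
from sentry_sdk import capture_message, push_scope

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")

def send_envelopes(count, attachment_repeats):
    start = time.monotonic()
    for _ in range(count):
        with push_scope() as scope:
            scope.add_attachment(
                bytes=b"Hello World! " * attachment_repeats,
                filename="attachment.txt",
            )
            capture_message("Hello, hello.")
    sentry_sdk.flush()  # block until the background worker has sent everything
    return time.monotonic() - start

for count in (10, 50):
    print(count, "envelopes:", send_envelopes(count, 10000), "seconds")
```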

We could create an AsyncHttpTransport (swapping out urllib3 for aiohttp) to see if the result changes drastically. But that is not done in a day.
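
A very rough sketch of the idea, just to illustrate what such a transport would do differently (the endpoint URL, auth header, and envelope bytes are placeholders; a real transport would also have to serialize envelopes, honor rate limits, and hook into the SDK's worker and flush logic):

```python
# Conceptual sketch only: POST several pre-serialized envelopes concurrently
# with aiohttp instead of urllib3. URL and headers are placeholders.
import asyncio
import gzip

import aiohttp

ENVELOPE_URL = "https://oXXXX.ingest.sentry.io/api/YYYY/envelope/"  # placeholder
AUTH_HEADER = "Sentry sentry_key=..., sentry_version=7"             # placeholder

async def send_envelope(session, body: bytes) -> int:
    async with session.post(
        ENVELOPE_URL,
        data=gzip.compress(body),
        headers={"Content-Encoding": "gzip", "X-Sentry-Auth": AUTH_HEADER},
    ) as response:
        return response.status

async def send_all(envelopes):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(send_envelope(session, body) for body in envelopes)
        )

# asyncio.run(send_all([b"...serialized envelope bytes..."] * 50))
```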

@antonpirker
Member

Another note: I also tracked the CPU utilization of my test process. It maxed out at 54% (10 small envelopes), 56% (10 large envelopes), and 61% (50 large envelopes), so I could not reproduce a CPU utilization spike.

@antonpirker
Member

OK, after some debugging I did not find the root cause, so I also consulted the admins of the urllib3 Discord. One thing they immediately told us was that we should increase the pool size. Another thing was that decoding TLS traffic uses CPU, but there is a feature to turn it off. This is what I did.

It did not get rid of all the CPU spikes in my test setup, but instead of spiking in 1 out of 3-4 runs, the CPU now only spikes in 1 out of 7-8 runs. So it is better now.
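
For illustration, on the urllib3 side the pool-size advice amounts to something like this (the numbers are arbitrary examples, not the values used in the actual fix):

```python
# Sketch: a urllib3 PoolManager with a larger connection pool, so concurrent
# envelope submissions do not have to discard and re-open connections.
import urllib3

http = urllib3.PoolManager(
    num_pools=2,  # number of distinct hosts to keep pools for
    maxsize=20,   # connections kept per pool (the per-pool default is small)
)

response = http.request("GET", "https://sentry.io/")
print(response.status)
```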

@antonpirker
Member

Another thing of note: OpenSSL 3 has performance problems: python/cpython#95031
Our server still has OpenSSL 1.1.1, so we are not affected, but when we upgrade we will be. (And other customers might already use OpenSSL 3 and be affected by those performance problems.)
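
To check which OpenSSL build a given Python is linked against:

```python
# Print the OpenSSL version Python's ssl module was compiled against.
import ssl

print(ssl.OPENSSL_VERSION)       # e.g. "OpenSSL 1.1.1k  25 Mar 2021"
print(ssl.OPENSSL_VERSION_INFO)  # e.g. (1, 1, 1, 11, 15)
```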
