
Profiles will not work with a traces_sampler instead of traces_sample_rate #1813

Closed · Daverball opened this issue Dec 29, 2022 · 28 comments

@Daverball
Contributor

Daverball commented Dec 29, 2022

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.12.1

Steps to Reproduce

  1. Pass a TracesSampler instead of setting traces_sample_rate
  2. Set _experiments['profiles_sample_rate'] to 1.0

Expected Result

Every request with a trace also has a corresponding profile

Actual Result

No profiles are being recorded

P.S. From glancing at the code it doesn't look like there is a hard dependency on the more "dumb" traces_sample_rate over a TracesSampler. It's just that init checks for traces_sample_rate instead of doing the more thorough check of has_tracing_enabled which will check either option.

As a workaround I seem to be able to just pass a dummy fixed sampling rate, since the TracesSampler always takes priority.
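For reference, the setup looks roughly like this (a minimal sketch; the DSN is elided, and the traces_sample_rate line is the dummy-value workaround):

import sentry_sdk

def traces_sampler(sampling_context):
    # illustrative only: trace every request
    return 1.0

sentry_sdk.init(
    dsn='...',                      # your project DSN
    traces_sampler=traces_sampler,  # sampler instead of a plain rate
    traces_sample_rate=1.0,         # dummy fixed rate (workaround attempt)
    _experiments={'profiles_sample_rate': 1.0},
)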

P.P.S. Looks like the workaround might not actually work, since I'm still seeing no profiles for traced transactions. Unless they take a while to process. But I will try some alternative sampling methods for the profiler before giving up.

@Daverball
Contributor Author

As a side note: I am using gevent, which has caused issues with the default HTTPTransport in the past, so I'm currently using a manually patched HTTPTransport that does work. (It's possible that this workaround is no longer necessary, but since traces and exceptions are still being sent to the API just fine, I'm leaving it as it is for now.)

It's possible the workaround does not work for me due to gevent, but seeing as the code Sentry's profiler was based on was originally written to be used with gevent, I hope that's not the case.

@Daverball
Contributor Author

I didn't manage to make the workaround work using any of the timing methods. Some debug logging statements (like with tracing) would be helpful to know whether or not profiling has been attempted at all.

Also on a more general note: Have you considered integrating with an existing profiler that is more feature-complete and imposes less of a performance overhead? I specifically have scalene in mind, which can also do memory debugging, can measure how much time is spent in native code as opposed to the Python interpreter (e.g. for C extensions), and will work with multiprocessing (i.e. most of the profiling work could then run on a separate CPU thread, without affecting the main application).

It might be a little bit more difficult to achieve though, seeing as scalene has to hook into the application very early on, before anything else gets loaded to achieve its low overhead. But making the profiler as accurate and feature complete as possible should be a win for everybody.

@sl0thentr0py
Member

@Daverball profiling is currently only enabled for WSGI; ASGI support has an open pull request. What framework/server are you using?

@Daverball
Contributor Author

@sl0thentr0py I'm using my own framework (unfortunately I currently cannot share the source code), but it is WSGI and uses the SentryWsgiMiddleware along with a couple of custom hooks to extract information from the request and any SQL queries.

@Zylphrex
Member

Zylphrex commented Jan 3, 2023

As a side note: I am using gevent, which has caused issues with the default HTTPTransport in the past, so I'm currently using a manually patched HTTPTransport that does work. (It's possible that this workaround is no longer necessary, but since traces and exceptions are still being sent to the API just fine, I'm leaving it as it is for now.)

It's possible the workaround does not work for me due to gevent, but seeing as the code Sentry's profiler was based on was originally written to be used with gevent, I hope that's not the case.

@Daverball There are some gevent-specific changes that need to be ported for the profiler to collect samples correctly; that is being looked into, but the profile should still be sent to Sentry. Are you not seeing any profiles being sent at all? You can confirm this by enabling debug mode and looking for a debug log like:

 [sentry] DEBUG: Sending envelope [envelope with 2 items (profile, transaction)] project:XXX host:oXXX.ingest.sentry.io
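To enable debug mode, passing debug=True to init should be enough (a minimal sketch, DSN elided):

import sentry_sdk

sentry_sdk.init(
    dsn='...',   # your project DSN
    debug=True,  # prints SDK debug logs, including the 'Sending envelope' lines
    traces_sample_rate=1.0,
    _experiments={'profiles_sample_rate': 1.0},
)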

If there are envelopes being sent with 2 items and one of them being a profile, it is correctly being sent to Sentry. After that, you should check the stats page from the sidebar in Sentry and look at the profiles category. It's possible that the profiles you're sending are being dropped, potentially due to them containing only a single sample. For context, if a profile being sent contains only a single sample (taken at ~10ms intervals), it will be rejected.

I didn't manage to make the workaround work using any of the timing methods. Some debug logging statements (like with tracing) would be helpful to know whether or not profiling has been attempted at all.

Yes, this is something that will be added to help with debugging.

Also on a more general note: Have you considered integrating with an existing profiler that is more feature-complete and imposes less of a performance overhead? I specifically have scalene in mind, which can also do memory debugging, can measure how much time is spent in native code as opposed to the Python interpreter (e.g. for C extensions), and will work with multiprocessing (i.e. most of the profiling work could then run on a separate CPU thread, without affecting the main application).

It might be a little bit more difficult to achieve though, seeing as scalene has to hook into the application very early on, before anything else gets loaded to achieve its low overhead. But making the profiler as accurate and feature complete as possible should be a win for everybody.

We have looked into existing profilers, but many of them are not very suitable for production profiling as they induce a rather large overhead. Scalene seems to use an approach similar to the one we use here. The current implementation of the profiler has deviated quite significantly from what we based it on, and looking at Scalene, it seems that they are actually similar in the way CPU profiling is done (I'll need to look at it in more detail to see how they differ). Our implementation is also tuned to better suit the way Sentry processes profiles server-side, so we likely will not be integrating existing profilers directly into the SDK. In terms of overhead, we are aiming for ~5%-10% in most cases.

@Daverball
Contributor Author

It's possible that the profiles you're sending are being dropped, potentially due to them containing only a single sample. For context, if a profile being sent contains only a single sample (taken at ~10ms intervals), it will be rejected.
@Zylphrex That sounds like that's probably why then, because I was testing on a view that takes <20ms. So even if the samples were collected correctly, there would only be one sample. I don't think I was seeing envelopes with 2 items though, but I will try it again when I find the time (probably this weekend at the earliest).

10ms as a sampling interval seems a bit high; I was expecting something in the neighbourhood of 1-2ms. I suppose part of the theory is that aggregating a bunch of profiles taken at a low sampling rate will, given enough requests, eventually yield a profile that is just as detailed and accurate as one taken at a higher sampling rate. I'm not fully convinced that still holds once you go below a certain sampling rate, since the likelihood of missing certain parts of the code completely increases the larger the gap between samples is, although in most cases those parts probably wouldn't contribute to the total runtime in any significant way anyways. I will have to play around with it myself and compare the results with scalene's to find out.

@Zylphrex
Member

Zylphrex commented Jan 3, 2023

although in most cases those parts probably wouldn't contribute to the total runtime in any significant way anyways

This is precisely the reason why we started with a 10ms sampling interval. At the current 10ms interval, we've already seen success with identifying bottlenecks down to lines of code.

It's definitely possible that some of the shorter calls will be missed, but sampling at 1-2ms intervals in production would naturally increase the overhead. So it's not something we'll be changing in the near future, though that may change eventually.

@github-actions

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@Daverball
Contributor Author

I managed to get one profile to show up as a complete fluke. The issue definitely is that the scheduler doesn't work correctly with gevent though, so for pretty much any view it just does not get enough chances to record a sample. I will have to upgrade to 1.14 and retest, now that gevent is supported properly.

That being said I think the originally reported issue has been fixed and the workaround should no longer be necessary. But I will test with 1.14 (as soon as I have the time) before closing the issue.

@Daverball
Contributor Author

Daverball commented Jan 27, 2023

@Zylphrex Looks like even with 1.14 I'm not getting any profiles to show up. It is always sending an envelope with a profile, but I guess I'm just not getting enough samples. Potentially the reason is the use of time.sleep in the GeventScheduler, which may have been monkey-patched to gevent.sleep.

One potential problem with that is that the profiler thread won't release the GIL when it sleeps using gevent.sleep; only the native time.sleep releases the GIL. Now, it's possible that gevent internally would release the GIL if all of the greenlets are just waiting on IO, but I'm not sure whether that is the case. Either way, you should use gevent.monkey.get_original to retrieve the unpatched time.sleep if you want the GeventScheduler to behave the same way as the ThreadScheduler.
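Something along these lines should give the scheduler the real, GIL-releasing sleep (a small sketch; the variable name is just illustrative):

from gevent.monkey import get_original

# Fetch the unpatched, blocking time.sleep even when gevent has patched the time module.
real_sleep = get_original('time', 'sleep')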

That being said, this doesn't really explain why I'm getting too few samples; even with gevent the interpreter should release the GIL every 5ms, so the sampling thread should get a chance to run as often as it takes samples. It's probably also a bit better to use the existing gevent thread pool rather than creating a new instance, since I believe there are some caveats to using the ThreadPool that are taken care of automatically if you use the existing instance: gevent.get_hub().threadpool
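For example, instead of constructing a new ThreadPool instance, the sampling loop could be handed to the hub's existing pool (a sketch; sampler_loop is a stand-in for the scheduler's run method):

import gevent

def sampler_loop():
    # stand-in for the scheduler's actual sampling loop
    pass

# Run the loop on a real OS thread managed by gevent's own pool.
gevent.get_hub().threadpool.spawn(sampler_loop)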

Somewhat unrelated question: Do you do anything to ensure that overlapping transactions and any other scheduled tasks running in other greenlets don't show up in the profile? You could do the same thing you do with threads and just track the one greenlet that was active when the transaction was started. You can still get at that specific greenlet's frame even when it's not active at the moment you're sampling: Greenlet.gr_frame. That would mean, however, that there's some jitter in the greenlet samples, since the frame may be from as far as 10ms back (or further, but you would be able to detect that, since it would be the same frame you saw last time). Technically the same is true for threads if there are more threads running beyond the sampler and the thread you're profiling (although context switches would probably be less frequent). But I still think it's better to track the one greenlet if you want to match the behavior without gevent.
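A rough sketch of what reading another greenlet's frame could look like (a hypothetical helper, not SDK code):

import sys
import greenlet

def frame_for(target):
    # Hypothetical helper: return the frame to sample for `target`.
    if target is greenlet.getcurrent():
        return sys._getframe()  # the running greenlet has no gr_frame
    return target.gr_frame      # suspended greenlet: the frame it will resume from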

@Zylphrex
Member

Good point about the time.sleep; I'll get that fixed up.

I can take a look at using the existing threadpool, but that threadpool implementation can be user-defined (see the docs), and we would permanently take up one of its threads by default, so that may have other unintended side effects.

Do you do anything to ensure that overlapping transactions and any other scheduled tasks running in other greenlets don't show up in the profile?

Not at this time, one of the reasons being exactly what you're describing. It is definitely something to investigate more and improve on.

Zylphrex added a commit that referenced this issue Jan 27, 2023
As pointed out in #1813 (comment),
gevent patches the `time` module and `time.sleep` will only release the GIL if
there are no other greenlets ready to run. This ensures that we always use the
builtin `time.sleep` and not the patched version provided by `gevent`.
Zylphrex added a commit that referenced this issue Jan 30, 2023
As pointed out in #1813 (comment),
gevent patches the `time` module and `time.sleep` will only release the GIL if
there are no other greenlets ready to run. This ensures that we always use the
builtin `time.sleep` and not the patched version provided by `gevent`.
@Daverball
Contributor Author

Did some quick testing again with 1.15.0. Looks like it still will not work, but at least it's being honest about it now, since it won't include the payload unless there are enough samples, so it's quicker for me to see whether it works.

I've noticed that for testing the gevent code path you don't really seem to be patching anything. Ideally the whole test suite would run monkey-patched (there's pytest-gevent, which provides a patched pytest runner, but you can also just use the monkey utility from gevent directly to invoke the pytest module in a fully patched interpreter). Otherwise you are not really testing much of anything, other than the imports not crashing and the functions doing what they advertise.
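For example, something like this should work (assuming a gevent version whose monkey-patch CLI supports the --module flag; pytest-gevent achieves the same thing):

python -m gevent.monkey --module pytest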

If I find some time this weekend or next weekend I will probably make a local fork of the codebase to see if I can get it to work myself with a few tweaks. Would also be interesting to see if I can prototype something, so that all the greenlets get tracked like individual threads. Not sure whether it's possible without using the greenlet C API though.

It's probably also worth noting that I'm using uwsgi, which uses the C API directly from its own C extension to spawn the greenlets, which means the parent frame doesn't really exist for those greenlets. I saw a little comment about that in the gevent codebase, in some optional instrumentation code that tracks the greenlet call tree.

@Zylphrex
Member

I wonder if your issue is actually with uwsgi and not gevent. I have noticed the profiler does not work well when there's a lot of native code in the application and uwsgi is mostly native code.

@Daverball
Contributor Author

If anything it would be the combination of the two. We use uwsgi at work without gevent and the profiling works just fine there from what I can tell. I have seen the occasional profile that looked a little bit suspicious, but the vast majority look completely reasonable to me.

@Daverball
Contributor Author

@Zylphrex Looks like something very strange is happening with the new_profiles deque. I've added some additional debug logging statements locally to get a better idea of what's happening, and it looks like even after the current profile is added to the deque, the profiling thread keeps thinking that it is empty for the entire duration of the profile. The profiler does get run periodically before the profiling stops again, but it can never pop the profile off the queue. I've tried replacing the deque with a Queue to see if there's some synchronization strangeness involved, but even with that, the Queue is always reporting empty inside the sampler thread.

I'm not really sure how this is possible. The best idea I could come up with is that the instance of the scheduler that is running the sampler thread is not the same one that is used to start profiling. But id returns the same integer for both instances, so there goes that theory, unless uwsgi does something really funky internally, like forking the interpreter once the application is set up, so that not all the greenlets are running in the same interpreter instance as the sampler thread.
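One quick way to check that theory would be to log, from inside a request, which process it runs in and which threads actually exist there (a hypothetical debugging aid):

import os
import threading

# If the sampler thread was started in the uwsgi master before forking,
# it won't show up in the worker process that handles the request.
print(os.getpid(), [t.name for t in threading.enumerate()])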

@Zylphrex
Member

Are you able to create a reproduction that you can share? I know you mentioned you use your own framework, but perhaps you can reproduce it with uwsgi + gevent somehow? That would help me a lot in debugging this.

@Daverball
Contributor Author

Daverball commented Feb 12, 2023

repro.py

from gevent import monkey; monkey.patch_all()
import sentry_sdk
from sentry_sdk.integrations.wsgi import SentryWsgiMiddleware
from time import sleep

SENTRY_DSN = 'https://{}.ingest.sentry.io/{}'
sentry_sdk.init(
    dsn=SENTRY_DSN,
    environment='development',
    traces_sample_rate=1.0,
    _experiments={
        'profiles_sample_rate': 1.0,
    }
)


# Main WSGI application
def application(env, start_response):
    response = b'Hello world'
    # Sleep long enough (~1.1s total) for the profiler to collect multiple samples
    for _ in response:
        sleep(0.1)

    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(response)))
    ])
    return [response]


application = SentryWsgiMiddleware(application)

repro.ini

[uwsgi]
http = :9090
wsgi-file = repro.py
gevent = 100

uwsgi --ini repro.ini

@Daverball
Contributor Author

Daverball commented Feb 12, 2023

Although I'm not sure sentry-sdk will work at all in this setup: the default transport complains about disabled threading in uwsgi, but you are not allowed to turn on uwsgi threads if you enable gevent. I've implemented my own transport that works with uwsgi and gevent; this, however, does not change whether or not Python threads are supported.

@Daverball
Contributor Author

Daverball commented Feb 12, 2023

Actually, it looks like the minimal repro does not reproduce the problem and does record samples periodically, like it's supposed to, although the profiles are completely useless because the frame only shows what the Hub greenlet is doing. But it's at least good to know that I no longer need my custom transport; the default transport seems to work just fine now with my setup.

@Daverball
Contributor Author

Daverball commented Feb 12, 2023

@Zylphrex Alright I think it might be related to uwsgi's builtin static mapping and/or cache. If I enable them for the minimal repro case, then profiles stop working as well:

repro.ini

[uwsgi]
http = :9090
wsgi-file = repro.py
gevent = 100
enable-threads = True
cache2 = name=default,items=1000,purge_lru=True
static-map = /static=data/static
static-gzip-all = True
static-cache-paths = 60

Yes, if I remove the static mapping and cache from my application, then profiles start working there as well. I assume the way uwsgi forks the processes happens a bit differently with static maps.

@Daverball
Contributor Author

@Zylphrex I've made a little proof-of-concept for sampling all the currently active greenlets. It's still a little bit messy, but it works. Cleaning it up properly will probably require some backend support for an optional second identifier in addition to the thread_id, for a fiber-like construct (in this case a greenlet) within a thread. Right now I'm just concatenating the two numbers into a string: if I add a hyphen or dot to separate the two ids, the backend will reject the profile, and if I try to generate a unique integer from the two integers instead of a string, there also seem to be some problems on the backend. So right now there is some use of Union for get_current_thread_id and related members/functions that is not very pretty.
Daverball@e7d7dcd
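The gist of the id handling is roughly this (a simplified sketch of the PoC, not the actual diff):

import threading
from greenlet import getcurrent

def get_current_thread_id():
    # Concatenate thread ident and greenlet id without a separator,
    # since the backend currently only accepts integer-like ids.
    return '{}{}'.format(threading.get_ident(), id(getcurrent()))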

@Zylphrex
Member

@Zylphrex I've made a little proof-of-concept for sampling all the currently active greenlets. It's still a little bit messy, but it works. Cleaning it up properly will probably require some backend support for an optional second identifier in addition to the thread_id, for a fiber-like construct (in this case a greenlet) within a thread. Right now I'm just concatenating the two numbers into a string: if I add a hyphen or dot to separate the two ids, the backend will reject the profile, and if I try to generate a unique integer from the two integers instead of a string, there also seem to be some problems on the backend. So right now there is some use of Union for get_current_thread_id and related members/functions that is not very pretty.
Daverball@e7d7dcd

@Daverball Thanks for this and the repro! I'll take a look at this soon.

@Zylphrex
Member

Hey @Daverball, I'm still looking into a more permanent solution to this issue, but I think I've narrowed down what's happening.

When running uwsgi in master mode, which your cache options implicitly enable, threads that were started in the master process do not run in the forked worker processes. This means that the sampler only runs in the master process, while the profiles are being pushed onto the queue in the worker process, so no samples are ever taken.

As a short term solution, you can potentially try enabling lazy-apps in uwsgi (see uwsgi docs) which will delay the loading of the app and correctly start the profiler thread on the worker processes.
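For the repro.ini above, that would look something like this (only the last line is new):

[uwsgi]
http = :9090
wsgi-file = repro.py
gevent = 100
enable-threads = True
cache2 = name=default,items=1000,purge_lru=True
static-map = /static=data/static
static-gzip-all = True
static-cache-paths = 60
lazy-apps = true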

I'll look into a more permanent fix for this issue. Let me know if that works for you in the meantime. And I'll also be looking into your PoC with the greenlets soon!

@Zylphrex
Member

@Daverball if you are able to test out this branch to see if it solves your problem, that'd be much appreciated 😄

@Daverball
Contributor Author

@Zylphrex Looks like delaying the start of the sampler thread does indeed do the trick.

@github-actions

github-actions bot commented Mar 9, 2023

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@Daverball
Contributor Author

@Zylphrex Since the original issue has been resolved I will close the issue. I may open another one for greenlet tracking improvements.

I might take a look at the backend code myself to see what's needed there to support tracking fibers in addition to threads.

@Zylphrex
Member

Yes, do open a separate issue and we can discuss how we can improve greenlet tracking there 👍
