-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(backend): add e2e useful tracing #2037
Conversation
3f53aae
to
7aa257d
Compare
7aa257d
to
76ec6b1
Compare
76ec6b1
to
b139f40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great improvement, well done
It's probably nothing but just in case .. Looking at open-telemetry/opentelemetry-python-contrib#1835 (comment) it looks like we need to add an ENTRYPOINT but it looks like this is not part of this PR ... is that on purpose or should we add it ?
Nice :) I'll take a closer look but first a comment and a question. Type hintsWe get a type hint error due to the endpoint, we already have the same thing in infrahub/server.py but that file is ignored. I think this should be fixed on the config side i.e. infrahub/config.py so that either the endpoint is required or that we use a property that returns a string. (This could also be handled as a separate PR as it's not related to your work here) AsyncioAs you mention in the issue text there might be some async/sync concerns. When setting up this tracing from the beginning we left it off by default in the config, I have some memory that the library used for this wasn't using asyncio and that it needed to export traces to an external URL. I haven't looked into the libraries used for this but can at least see that we're now executing the messages within a synchronous context manager, would this mean that we'd be sending any synchronous requests when we do that? We already had a sync context manager for the db queries but that part didn't touch the network it only allowed information to be gathered from the API server (or http endpoint on the git-agents). If we could abstract |
b139f40
to
333d166
Compare
This is required to apply the monkeypatch through Poetry but I couldn't make it work. |
Thanks Patrick :)
Yes, this is exactly my point. The otel library cannot use async context managers yet so as you mentioned, traces seem exported synchronously that would lead to performance drop. I'll take a look and maybe implement what you proposed if we can't make much progress. At least the feature would be available for debugging purposes even if performance is worse when enabled. |
Some inputs for later: WIP PR to have the async decorator upstream (so we may remove the third party otel_extensions lib): open-telemetry/opentelemetry-python#3633 For the sync context issue, this might be the solution but still have to confirm perfs ourselves: open-telemetry/opentelemetry-python#62 (comment) |
333d166
to
c5fd503
Compare
c5fd503
to
037dadb
Compare
2abd296
to
8e109ae
Compare
I managed to remove the dependency to the I don't see any performance issue in the different test suites we have: we make use of We can either wait for this last feature to be done upstream, or merge this as-is... I don't have a strong opinion on this one, but at least the feature is ready. I have also enabled tracing for the CI e2e tests. And the message bus RPCs are now auto instrumented (less custom code) thanks to 501a1f8 |
8e109ae
to
30a3f1c
Compare
@fatih-acar, is the tooling around this in a state where you'd want to merge it now? I missed that it was marked for review again. :( As it's semi related could be nice with the cleanup required to remove this from pyproject.toml (not required just nice :) ) [[tool.mypy.overrides]]
module = "infrahub.server"
ignore_errors = true |
30a3f1c
to
2422bd2
Compare
CodSpeed Performance ReportMerging #2037 will not alter performanceComparing Summary
|
@ogenstad yes as I have stated in my previous comment, I think we can merge this now since I've removed the 3rd party lib I previously used. There's still this async exporter issue upstream that won't be resolved anytime soon I guess... but batching should help avoid perf issues. I have removed the mypy ignore statement, tests are still passing :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, sorry for the delay.
2422bd2
to
309ea43
Compare
fa756d9
to
ace1aae
Compare
Enabled tracing component inside git agent in order to have end to end spans for a single request. Added spans at most useful places such as message bus' execute_message and database execute_query. Signed-off-by: Fatih Acar <[email protected]>
Signed-off-by: Fatih Acar <[email protected]>
ace1aae
to
44cbe7b
Compare
Enabled tracing component inside git agent in order to have end to end spans for a single request.
Added spans at most useful places such as database execute_query.
You can use the jaeger all-in-one container (or grafana tempo, as you wish :)) in order to try it out, don't forget to enable tracing in the config.
Note the custom span I added as an example on BranchCreate mutation. You can either use the decorator or use the longer "start_as_current_span" context method if you want custom dynamic attributes.
Note: I will leave this as draft since the internet says the opentracing lib slows down async apps, I'd like to ensure there is no perf regression with this one...
This is how it renders on jaeger: