-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add telemetry #298
feat: add telemetry #298
Conversation
Current results from the telemetry+prometheus companion livebook:
Some validation from users would be nice, but it looks like we have most if not full retrocompatibility, aside from the unsupported summary metric |
To minimize the size of this PR, I'll add client events to the livebook in a follow-up PR :) |
@beligante @wingyplus reviews are welcome (for anyone else reading this as well!) :) |
@polvalente Great Job! Code LGTM, tho, I have some considerations. You may know this behavior, but server and client codes behave differently for streams. ServerFor
ClientFor clients, the behavior for streams, in any case, is different from the server. Before, I mentioned that the server needs a blocking code. On the other hand, clients have non-blocking/async behavior. Dealing with streams on the client side is transparent because the interactions happen inside the adapters asynchronously or are wrapped inside an I want to ensure that this behavior is okay with you. Otherwise, we could add a to-do to improve that. |
lib/grpc/stub.ex
Outdated
) | ||
|
||
# re-raise accordingly | ||
case kind do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@beligante regarding the duration for clients, I think that one could aggregate events based on the stream metadata somehow to provide a meaningful number. I need to check what's the current behavior with |
lib/grpc/server.ex
Outdated
duration = System.monotonic_time() - t0 | ||
|
||
:ok = | ||
GRPC.Telemetry.server_rpc_exception( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't :telemetry.span
suitable for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:telemetry.span
doesn't treat rescue
separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I tested locally that raise
can be catch
-ed as an :error
, so it seems that it's indeed equivalent, though I have to check if the only metadata that changes is what span handles :) I'll keep this unresolved til I confirm this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some changes needed anyway, but I went with span because it decouples a bit of the code and leaves basically the formatting and error-case handling to us.
@@ -0,0 +1,282 @@ | |||
# Telemetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow. We can now run grpc server inside Livebook? o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! You need to have the rights to bind the port you'll run the server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
I believe you can in theory run anything inside livebook if you try hard enough haha
In this case it was luckily not hard at all, and so the livebook both shows the way to use telemetry and works as a manual validation file for developing this PR
@@ -0,0 +1,282 @@ | |||
# Telemetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! You need to have the rights to bind the port you'll run the server
Got it! Makes Sense! |
Merging this, with #315 left as a pendency. |
closes #229
related to #299
"minimal event set" here means the set of events needed to publish the same Prometheus data that
is published by
:grpc_prometheus
, which is being deprecated due to incompatibilities with Elixir 1.14and in favor of adding the more portable
:telemetry
events.It's worth noting that we aren't measuring exactly the same times because the previous approach would actually depend on where the interceptors were placed in the pipeline.
That can still be mimicked by a user by placing their own interceptor + telemetry event there, following the same ideas that this PR introduces. However, for better documentation and clarity of expectations, we're not providing optional interceptors in favor of having these events always be published.