-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding correlation IDs to reconcile loops #1460
Conversation
Welcome @arschles! |
Hi @arschles. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/kind feature |
/ok-to-test Welcome @arschles! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about changing the way that tele.Tracer.Start()
works such that anytime start is called and the context does not contain a correlationID, one is generated and added to the value bag. This could be done as a best effort kind of thing, so the error could be ignored, or a default value substituted if a new UUID can not be created?
Also, pkg/trace/corr.go
requires a file header comment. You will see them littered through the other code files. If you run make test
, it will generate files, lint, and run the local tests.
@devigned I think that's a great idea! I'm working on both the |
/retest |
pkg/trace/span.go
Outdated
// } | ||
// defer newSpan.End() | ||
// doSomething(ctx) | ||
func StartSpan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of something like this:
In ./util/tele.go
, we wrap the otel.Tracer
with our own struct that implements trace.Tracer
. We then write our own Start(...)
func which would then delegate back to otel.Tracer().Start(...)
. That way the interface can stay the same.
package tele
type tracer struct {
otel.Tracer
}
func (t *tracer) Start(ctx, opName string, opt ...otel.SpanOption) (context.Context, otel.Span) {
// add correlationID to ctx and continue as normal
// return the inner implementation of Start
return t.Tracer.Start(ctx, opName, opt...)
}
Then either in the same package or elsewhere, add context helpers to set / fetch the correlationID.
package corr // or something else...
// FromCtx or some better name you think of
func FromCtx(ctx context.Context) (ctx, string) {
// create new context.WithValue containing correlationID and return both if it does not exist, else return context and correlationID
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, happy to do that! A few questions before I get started, just to make sure I'm on the same page
- I'm assuming that you'd want to still call
tele.Tracer().Start(...)
in all the reconcilers, correct? - Are you ok with creating our own tracer instance (the wrapper you mentioned) and then passing that -- instead of the default one -- to
otel.SetTracerProvider
inmain.go
? - Do we need
FromCtx
to be exported at all? It seems like it would only be used inside the same package as the custom tracer, but not sure if you're thinking something more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that you'd want to still call tele.Tracer().Start(...) in all the reconcilers, correct?
If possible, that would be great.
Are you ok with creating our own tracer instance (the wrapper you mentioned) and then passing that -- instead of the default one -- to otel.SetTracerProvider in main.go?
💯
Do we need FromCtx to be exported at all? It seems like it would only be used inside the same package as the custom tracer, but not sure if you're thinking something more
I was adding that because I saw you were returning the correlationID in your StartSpan
func. This led me to think you wanted a way to access it from the context. I think one might need to access the correlationID from the context to be able to add that as a HTTP header for the Azure SDK for Go clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect - I'll make these changes.
regarding FromCtx
- just in case someone wants to access the correlation ID, I'll make FromCtx
exported like you said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devigned just an update, so you know I haven't forgotten about this! I'm looking through the implementation & docs trying to figure out a way to create a new TracerProvider
that we can pass to the SetTracerProvider
function in main.go
. That would need to create a new Tracer
which wraps the implementation you referenced above. I think that's going to require a lot of plumbing, which we could make unnecessary if we called a custom function to get either:
- get a new tracer that which wraps the underlying tracer implementation, like what you showed here
- create a new span, which is what's already in this PR
I'll keep digging, but let me know about going with either of the above options in case there isn't a good way to wrap everything we need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to create a custom TracerProvider
that wraps the jaeger one. Let me know your thoughts on that too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, @arschles. I was thinking something more like this: master...devigned:corr
The above diff should add the correlationID to the context without needing to manipulate the signature of the Tracer. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @devigned I see what you mean now! Thanks, I'll implement that.
@devigned got it, I've removed most of the formatting changes made earlier - there are a few still here in which I moved the creation of cancellable contexts below the creation of the spans. that is so the new contexts can be based on the contexts that were created by the new span. I do think it's valuable - but not critical - to do that so the cancellable context has the new correlation IDs in it. let me know what you think of that. also to address your and @CecileRobertMichon's point about percolating these correlation IDs up to the Azure API - I had originally planned on doing that in a follow-up PR when this PR was large, to draw a clear line between the change to add the IDs and the change to send them up to Azure. now that this is smaller, would you prefer I do that all in this PR? I'm happy to do it either way, whatever makes it easier for you all |
separate PR is fine with me, up to you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @CecileRobertMichon
Thanks @CecileRobertMichon - I'll submit a follow-up PR with those changes |
8fb96e3
to
ac73cf6
Compare
/lgtm |
defer span.End() | ||
ctx, cancel := context.WithTimeout(ctx, reconciler.DefaultedLoopTimeout(ampr.ReconcileTimeout)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the change in order here but not in every controller? azuremachine and azurecluster still have the reverse order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this context creation down to below the creation of the span so that the timeout context includes the correlation ID value. IIRC the others had this change but I reverted them by accident. Would it make more sense to save all these changes for a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either way, as long as the changes are consistent (make the changes here to all the controllers or none of them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, going to take this change out then because it's not strictly necessary
Signed-off-by: Aaron Schlesinger <[email protected]>
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CecileRobertMichon are you ok with this the the other similar format fixes?
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: CecileRobertMichon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-cluster-api-provider-azure-e2e |
Signed-off-by: Aaron Schlesinger [email protected]
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds shared correlation IDs to
context.Context
s passed to reconcile loops, so that the API requests made to Azure get logged and tracked properly. That feature is especially useful for debugging.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #1310
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: