Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(otel): ensure OTEL sends final span when -o opt is used #6505

Merged
merged 3 commits into from
Oct 7, 2024

Conversation

eysi09
Copy link
Collaborator

@eysi09 eysi09 commented Oct 3, 2024

Before this fix, when the -o/--output option was used the last OTEL span wouldn't be sent.

Fixing this surfaced some other issues and the code changes might look odd out of context so I'll try and explain the motivations below (also explained in the individual commit messages):

First commit

First off, the Cli.run() method would either return the results to the callee which would then do the clean up OR, if a certain combination of flags was passed, it would trigger a process exit itself (via the shutdown function) and the cleanup logic in the callee would never run. This was one of the reasons for the OTEL bug.

This meant that the behaviour of Cli.run() was pretty unpredictable and hard to track and the cleanup logic in the callee wasn't guaranteed to run. This was fixed in the first commit and now the Cli.run() method always returns the results in a consistent manner and cleanup is done by the callee.

Second commit

After this change, it became obvious that the exitOnAbort flag didn't really do anything so it was removed in the second commit.

Third commit

Finally, fixing the OTEL bug with the Cli.run() changes exposed another bug where calling otelSdk.shutdown() would hang in those cases where an exporter hadn't been configured. This happens when the Garden class isn't initialised AND no OTEL exporter set via the OTEL_* env var. For example when running garden help. This was fixed by checking whether an exporter was set in the first place before calling otelSdk.shutdown(). It's a little ugly but the other alternatives I explored didn't feel any better, it's kind of a function of how we use OTEL.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #6482

Special notes for your reviewer:

@eysi09 eysi09 force-pushed the fix-otel-shutdown branch 5 times, most recently from 52d5d23 to ccdaf2c Compare October 4, 2024 13:06
@eysi09 eysi09 marked this pull request as ready for review October 4, 2024 13:19
@eysi09 eysi09 requested review from twelvemo and removed request for stefreak and thsig October 4, 2024 14:07
@eysi09 eysi09 changed the title fix(otel): ensure OTEL sends final span when -o opt is used (WIP) fix(otel): ensure OTEL sends final span when -o opt is used Oct 7, 2024
@vvagaytsev
Copy link
Collaborator

Great! 💯
I left a minor comment, otherwise looks good! 👍

Before this fix, when the -o/--output option was used, Garden would call
the `shutdown` function early which prevented the final OTEL span to be
emitted.

Now we only run the shutdown logic and (wait for output flush) at the
very end of the command run.

This also means that the `Cli.run()` method no longer has
hard-to-predict side effects, depending on what flags were passed.
This option didn't appear to have any real meaning after we moved the
call to the `shutdown` function to the parent caller (i.e. the
`runCli()` function) as opposed to the `Cli.run()` method so we're
removing it here.
Before this fix, attempting to shutdown the OTEL SDK would hang if
called before a target exporter is set. The target exporter is set in
the Garden class so this would e.g. happen when running 'garden help'
and cause a delay before the attempt to shutdown OTEL times out.

With this commit we now check whether an exporter is configured, either
via env vars or the ReconfigurableExporter, and only attempt to shut down
if that's the case.

Note that we need to handle this in the enclosing context as opposed to
having the ReconfigurableExporter handle it itself because the exporter
doesn't "know" if a target exporter will be set later or not.
@eysi09 eysi09 requested a review from vvagaytsev October 7, 2024 15:31
@eysi09 eysi09 added this pull request to the merge queue Oct 7, 2024
Merged via the queue into main with commit 7b96bdb Oct 7, 2024
40 checks passed
@eysi09 eysi09 deleted the fix-otel-shutdown branch October 7, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Opentelemetry missing span when using "-o yaml" option
2 participants