JSON agent logs buffered on startup #13015
Comments
Hi @wjordan! The reasoning behind the buffering is that Nomad's setup involves a bunch of concurrent goroutines all spinning up various bits of work, but the log output is intended to start with the "banner message" once the configuration has been validated. If we didn't buffer, that banner would be interleaved with the rest of the logs. But obviously that's not desirable behavior when you're trying to debug the startup itself. It looks like it'd be possible to make this feature configurable in the logging config, as that's read early enough that we could probably set everything up the same way and then just flush the gate right away. I'll mark this for roadmapping, but if you're up for opening a PR for it, I'm sure we'd also be happy to review it.
Writes to logger implementations are typically protected by mutexes to prevent any such interleaving from occurring. However, one exception is Nomad's use of Lines 94 to 98 in be7ec8d
Is this the underlying issue you're referring to? In this case, a better solution would be to use the
Yes, and the "obvious" fix here if we'd used So that's why I think we want this to be part of the logging configuration.
Ohh, you're not referring to character-level interleaving on an underlying log stream, you're just referring to the ordering of log-lines in the agent startup, which is a very different story. I was completely confused by what you were saying about the buffering preventing the banner from being interleaved, until I finally discovered through further experimentation that the
Here's what the non-json log outputs:
The json case buffers the logs until the agent finishes starting up and also still interleaves the written log lines, so the buffering is doing nothing other than delaying the output. So one narrow way to fix this issue for json logging would be to just add a nomad/command/agent/command.go Lines 703 to 706 in 88e8c22
Does this approach sound reasonable?
Yes, sounds good to me! Tag me for review on the PR and I'll be happy to get that merged. Thanks!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Issue
I recently discovered that the Nomad agent buffers all of its logs on startup until it reaches a certain point in the initialization process:
nomad/command/agent/command.go
Lines 775 to 779 in bab219a
This behavior made a recent attempt to debug issues with a slow-loading custom task driver plugin confusing and difficult. I wasn't able to view the logs until after the initialization was completed, and log lines were tagged with incorrect timestamps (reflecting when the buffer was flushed instead of when the log lines were originally recorded).
I couldn't find any mention of this behavior in the documentation, and only figured out what was happening by reading through the source code. (In hindsight, it seems that the `Log data will stream in below` message hints at the buffer being flushed, but this meaning is not clear enough from the text alone.)
The original commit introducing this `logGate`/`GatedWriter` (2165576) was very early in the project, and there is no commit message or discussion around this behavior, so I still don't understand why this log buffering was implemented in the first place or what purpose it continues to serve. I can't think of any scenario where this kind of buffering behavior would actually be helpful.
Could this buffering behavior simply be removed, or at least made configurable, so that agent logs (including logs generated by task-driver plugins) can be flushed immediately? If there is a reason why this buffering can't or shouldn't be removed, could some additional documentation / comments detailing the reason for this behavior be added to the project?