-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Promtail: Fix deadlock on tailer shutdown. #2717
Conversation
…ads the Lines channel from the tailer and it will exit when the tailer closes the channel.
Codecov Report
@@ Coverage Diff @@
## master #2717 +/- ##
==========================================
- Coverage 61.40% 61.36% -0.05%
==========================================
Files 173 173
Lines 13446 13463 +17
==========================================
+ Hits 8257 8261 +4
- Misses 4431 4447 +16
+ Partials 758 755 -3
|
… position information of the file gracefully fail if the file no longer exists.
… shutdown and errors are handled.
// If the file no longer exists, no need to save position information | ||
if err == os.ErrNotExist { | ||
level.Info(t.logger).Log("msg", "skipping update of position for a file which does not currently exist") | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really the only important change in this function, the diff is screwy because I swapped the order of Tell() and Size() calls.
This is a more graceful way to handle ignoring the size/position for a file which doesn't exist, it leverages the change in the HP tail lib which returns the os.ErrNotExist error
|
||
return nil | ||
} | ||
|
||
func (t *tailer) stop(removed bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a change I made recently, I added this bool to do the conditional position update below but that isn't necessary (and was also not implemented correctly), it can be removed, calling the markPositionAndSize is basically a NOOP now if the file does not exist
pkg/promtail/targets/file/tailer.go
Outdated
level.Error(t.logger).Log("msg", "error marking file position when stopping tailer", "path", t.path, "error", err) | ||
} | ||
|
||
err = t.tail.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stop the tail here, these leaves the main goroutine running still which will consume the tail.Lines channel until it's closed
pkg/promtail/targets/file/tailer.go
Outdated
level.Error(t.logger).Log("msg", "position timer: error getting tail position and/or size, stopping tailer", "path", t.path, "error", err) | ||
// To prevent a deadlock on stopping the tailer we need to launch a thread to consume any unread lines | ||
go func() { | ||
for range t.tail.Lines {} | ||
}() | ||
t.tail.Stop() | ||
if err != nil { | ||
level.Error(t.logger).Log("msg", "position timer: error stopping tailer", "path", t.path, "error", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to help recover from weird errors we have seen in some environments, if we fail to update the position information on the timer we want to close the tailer so it can be re-opened by the upper level sync function.
Adding the goroutine here to consume t.tail.Lines makes sure we dont' deadlock on lines being in the channel when we tel the tail to stop
case <-t.quit: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the quit channel and instead we will quit when the underlying tailer closes the t.tail.Lines channel above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I just have small nits, but I'm concerned about the vendor update. There's a few more cases where tail.file
is being modified or referenced without holding the lock on the new mutex. I think we should play it safe and protect all accesses to it.
// If the file no longer exists, no need to save position information | ||
if err == os.ErrNotExist { | ||
level.Info(t.logger).Log("msg", "skipping update of position for a file which does not currently exist") | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: can you make this an if/else if?
size, err := t.tail.Size()
if err == os.ErrNotExist {
// ...
return nil
} else if err != nil {
return err
}
… a little more clear and remove some special case logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTMBWN! (Looks Good To Me But With Nits)
Thanks for making the changes, I think this is a lot easier to follow. Now that I think about it, doing it this way also guarantees we'll finish writing all lines to Loki, which I'm not sure would've happened with your first attempt at this.
to avoid a deadlock on shutdown, leave the goroutine running which reads the Lines channel from the tailer and it will exit when the tailer closes the channel.
Also updated the underlying hp tail fork we have to return an os.ErrNotExist when you call the Size or Tell methods and the tailer is not currently tailing a file (happens when a file is deleted before we have a chance to remove the tailer or on file rotation)