Promtail: Fix deadlock on tailer shutdown. #2717

Merged
merged 12 commits into from
Oct 4, 2020

Conversation

slim-bean
Collaborator

@slim-bean slim-bean commented Oct 3, 2020

To avoid a deadlock on shutdown, leave the goroutine that reads the tailer's Lines channel running; it exits when the tailer closes the channel.

Also updated our fork of the underlying hpcloud/tail library to return os.ErrNotExist when the Size or Tell method is called while the tailer is not currently tailing a file. This happens when a file is deleted before we have a chance to remove the tailer, or during file rotation.
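
For readers who want to see the shape of that guard, here is a minimal sketch. It is not the vendored code: `fileTail` and `markSize` are hypothetical names, the mutex reflects the locking discussed in review, and `errors.Is` is used here instead of the `==` comparison in this PR purely as a matter of taste.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
)

// fileTail is a hypothetical stand-in for the vendored tail type. Only the
// parts needed to illustrate the nil-file guard are modelled.
type fileTail struct {
	mu   sync.Mutex
	file *os.File // nil while no file is open (deleted, or mid-rotation)
}

// Size reports the size of the tailed file, or os.ErrNotExist when nothing
// is currently open.
func (t *fileTail) Size() (int64, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.file == nil {
		return 0, os.ErrNotExist
	}
	fi, err := t.file.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

// markSize mimics the caller: a missing file is skipped rather than treated
// as a failure. It reports the size and whether it was recorded.
func markSize(t *fileTail) (size int64, recorded bool, err error) {
	size, err = t.Size()
	if errors.Is(err, os.ErrNotExist) {
		return 0, false, nil // nothing to save for a file that is gone
	} else if err != nil {
		return 0, false, err
	}
	return size, true, nil
}

func main() {
	// A tail with no open file: the update is skipped and no error surfaces.
	_, recorded, err := markSize(&fileTail{})
	fmt.Println(recorded, err)
}
```

With no file open, the caller sees a skipped update and a nil error, which is what makes the position update effectively a no-op for deleted files.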

…ads the Lines channel from the tailer and it will exit when the tailer closes the channel.
@codecov-commenter

codecov-commenter commented Oct 3, 2020

Codecov Report

Merging #2717 into master will decrease coverage by 0.04%.
The diff coverage is 74.41%.


@@            Coverage Diff             @@
##           master    #2717      +/-   ##
==========================================
- Coverage   61.40%   61.36%   -0.05%     
==========================================
  Files         173      173              
  Lines       13446    13463      +17     
==========================================
+ Hits         8257     8261       +4     
- Misses       4431     4447      +16     
+ Partials      758      755       -3     
Impacted Files                              Coverage Δ
pkg/promtail/targets/file/tailer.go         70.09% <72.50%> (+1.20%) ⬆️
pkg/promtail/targets/file/filetarget.go     64.08% <100.00%> (ø)
pkg/promtail/positions/positions.go         46.80% <0.00%> (-11.71%) ⬇️
pkg/querier/queryrange/downstreamer.go      97.64% <0.00%> (+2.35%) ⬆️

… position information of the file gracefully fail if the file no longer exists.
@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 3, 2020
Comment on lines +141 to +145
// If the file no longer exists, no need to save position information
if err == os.ErrNotExist {
	level.Info(t.logger).Log("msg", "skipping update of position for a file which does not currently exist")
	return nil
}
Collaborator Author

This is really the only important change in this function; the diff looks messy because I swapped the order of the Tell() and Size() calls.
This is a more graceful way to skip the size/position update for a file which doesn't exist. It relies on the change in the hpcloud/tail library, which now returns os.ErrNotExist.


return nil
}

func (t *tailer) stop(removed bool) {
Collaborator Author

In a recent change I added this bool to make the position update below conditional, but that isn't necessary (and it was not implemented correctly either), so it can be removed. Calling markPositionAndSize is now basically a no-op if the file does not exist.

level.Error(t.logger).Log("msg", "error marking file position when stopping tailer", "path", t.path, "error", err)
}

err = t.tail.Stop()
Collaborator Author

Stop the tail here. This leaves the main goroutine running, and it will keep consuming the tail.Lines channel until it's closed.

Comment on lines 101 to 109
level.Error(t.logger).Log("msg", "position timer: error getting tail position and/or size, stopping tailer", "path", t.path, "error", err)
// To prevent a deadlock on stopping the tailer we need to launch a thread to consume any unread lines
go func() {
	for range t.tail.Lines {
	}
}()
err = t.tail.Stop()
if err != nil {
	level.Error(t.logger).Log("msg", "position timer: error stopping tailer", "path", t.path, "error", err)
}
Collaborator Author

This is to help recover from odd errors we have seen in some environments: if we fail to update the position information on the timer, we want to close the tailer so it can be re-opened by the upper-level sync function.

Adding the goroutine here to consume t.tail.Lines makes sure we don't deadlock on lines still being in the channel when we tell the tail to stop.
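
To make the failure mode concrete, here is a small self-contained sketch of the pattern. `lineTail`, `startTail` and `stopSafely` are hypothetical names, not the real tail library; the sketch only assumes a producer that blocks on an unbuffered Lines channel and a Stop that waits for that producer to exit.

```go
package main

import (
	"fmt"
	"sync"
)

// lineTail is a hypothetical producer with the same shape as the problem:
// an unbuffered Lines channel and a Stop that waits for the producer.
type lineTail struct {
	Lines chan string
	quit  chan struct{}
	done  sync.WaitGroup
}

// startTail launches a producer that sends every input line, then waits to
// be told to quit, and finally closes Lines.
func startTail(input []string) *lineTail {
	t := &lineTail{Lines: make(chan string), quit: make(chan struct{})}
	t.done.Add(1)
	go func() {
		defer t.done.Done()
		defer close(t.Lines)
		for _, l := range input {
			t.Lines <- l // blocks until someone receives
		}
		<-t.quit
	}()
	return t
}

// Stop signals the producer and waits for it to exit. If the producer is
// parked on a send and nobody receives, Stop never returns.
func (t *lineTail) Stop() {
	close(t.quit)
	t.done.Wait()
}

// stopSafely starts a drainer before calling Stop so the producer can always
// make progress. It returns the number of lines that were drained.
func stopSafely(t *lineTail) int {
	drained := make(chan int)
	go func() {
		n := 0
		for range t.Lines {
			n++
		}
		drained <- n
	}()
	t.Stop()
	return <-drained
}

func main() {
	t := startTail([]string{"a", "b", "c"})
	fmt.Println("drained lines:", stopSafely(t))
}
```

Calling `t.Stop()` directly in `main`, without the drainer, would hang in this sketch because the producer is still blocked sending the first line.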

Comment on lines -128 to -129
case <-t.quit:
return
Collaborator Author

Removed the quit channel. Instead, the goroutine exits when the underlying tailer closes the t.tail.Lines channel above.
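
A rough sketch of what such a read loop looks like without a quit case, for illustration only. `consume` and `run` are hypothetical helpers and the ticker merely stands in for the position timer; the point is that the loop's only exit is the closed channel, so lines already sent are handled before it returns.

```go
package main

import (
	"fmt"
	"time"
)

// consume is a hypothetical read loop with no quit case. It returns only when
// the producer closes lines, so every line already sent is handled first.
func consume(lines <-chan string, tick <-chan time.Time, handle func(string), done chan<- struct{}) {
	defer close(done)
	for {
		select {
		case <-tick:
			// Periodic work (for example saving a position) would go here.
		case line, ok := <-lines:
			if !ok {
				return // channel closed by the producer: clean exit
			}
			handle(line)
		}
	}
}

// run feeds input through consume and returns what the handler saw.
func run(input []string) []string {
	lines := make(chan string)
	done := make(chan struct{})
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	var got []string
	go consume(lines, ticker.C, func(l string) { got = append(got, l) }, done)

	for _, l := range input {
		lines <- l
	}
	close(lines) // stands in for the tailer closing its Lines channel
	<-done       // safe to read got after this point
	return got
}

func main() {
	fmt.Println(run([]string{"first", "second", "third"}))
}
```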

@slim-bean slim-bean requested a review from rfratto October 3, 2020 19:57
Member

@rfratto rfratto left a comment

LGTM! I just have small nits, but I'm concerned about the vendor update. There are a few more places where tail.file is modified or referenced without holding the new mutex. I think we should play it safe and protect all accesses to it.

pkg/promtail/targets/file/tailer.go Outdated Show resolved Hide resolved
Comment on lines +141 to +145
// If the file no longer exists, no need to save position information
if err == os.ErrNotExist {
	level.Info(t.logger).Log("msg", "skipping update of position for a file which does not currently exist")
	return nil
}
Member

Small nit: can you make this an if/else if?

size, err := t.tail.Size()
if err == os.ErrNotExist {
  // ...
  return nil
} else if err != nil {
  return err
} 

@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 4, 2020
Member

@rfratto rfratto left a comment

LGTMBWN! (Looks Good To Me But With Nits)

Thanks for making the changes, I think this is a lot easier to follow. Now that I think about it, doing it this way also guarantees we'll finish writing all lines to Loki, which I'm not sure would've happened with your first attempt at this.

pkg/promtail/targets/file/tailer.go Outdated Show resolved Hide resolved
vendor/github.com/hpcloud/tail/tail.go Outdated Show resolved Hide resolved
@slim-bean slim-bean merged commit 8ea6c38 into master Oct 4, 2020
@slim-bean slim-bean deleted the promtail-tailer-clean-exit branch October 4, 2020 21:12