Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fluent-bit: Fix fluent-bit exit callback when buffering is enabled #2365

Merged
merged 1 commit into from
Jul 17, 2020

Conversation

dojci
Copy link
Contributor

@dojci dojci commented Jul 16, 2020

What this PR does / why we need it:
Fluent Bit loki output plugin cannot handle exit callback properly while dque buffering (#2142) is in use. Fluent Bit will remain in the blocking state forever and never stops normally because of loki output plugin.

Which issue(s) this PR fixes:

When Fluent Bit will stop using the instance of the output plugin (on shutdown), it will trigger the exit callback FLBPluginExit and this in turn will call Stop() func with shutdown procedure:

loki/cmd/fluent-bit/dque.go

Lines 119 to 122 in b27f92e

func (c *dqueClient) Stop() {
c.once.Do(func() { close(c.quit) })
c.loki.Stop()
c.wg.Wait()

Dequeuer goroutine listen for a stop signal (non-blocking) in endless loop where next func DequeueBlock() is blocking call until new log is available:
for {
select {
case <-c.quit:
return
default:
}
// Dequeue the next item in the queue
entry, err := c.queue.DequeueBlock()

In case where Dequeuer waits for a next log and Fluent Bit triggers the exit callback at the same time, the next log never arrives to unblock Dequeuer to be able to check quit channel in the next loop iteration. Fluent-bit will wait forever for exit callback to return but this will never happen because Dequeuer is blocked.

This PR fixes unwanted Dequeuer gorutine stop behavior by replacing quit channel and synchronization with Close() which is part of dque package. Close() func unblocks waiting Dequeuer goroutine and return ErrQueueClosed in DequeueBlock().

@codecov-commenter
Copy link

Codecov Report

Merging #2365 into master will increase coverage by 0.05%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2365      +/-   ##
==========================================
+ Coverage   61.50%   61.55%   +0.05%     
==========================================
  Files         160      160              
  Lines       13534    13527       -7     
==========================================
+ Hits         8324     8327       +3     
+ Misses       4585     4576       -9     
+ Partials      625      624       -1     
Impacted Files Coverage Δ
cmd/fluent-bit/dque.go 0.00% <0.00%> (ø)
pkg/promtail/targets/file/filetarget.go 70.48% <0.00%> (+1.80%) ⬆️

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Small remark, some tests here would make working on this much easier for others, if you have time to write some of them feels free to :)

@cyriltovena cyriltovena merged commit ca94b2c into grafana:master Jul 17, 2020
@dojci
Copy link
Contributor Author

dojci commented Jul 28, 2020

LGTM.

Small remark, some tests here would make working on this much easier for others, if you have time to write some of them feels free to :)

Thank you. I will definitely do it.

@jcdauchy-moodys
Copy link

jcdauchy-moodys commented Aug 10, 2020

@dojci does your feature fixes the issue with "out of order" of loki fluent-bit plugin described here : https://github.com/fluent/fluent-bit/issues/1746 ?

Basically does it have the ability to reorder the data based on time before sending the data to loki ?

Thanks
JC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants