Fix concurrent harvesters #2541
Conversation
ok := p.outlet.OnEvent(event)
if !ok {
	logp.Info("Prospector outlet closed")
	return
}
There is a change in logic here: this return used to stop the worker. With the return moved into updateState, the worker is no longer stopped and may keep publishing incoming events. Instead, the worker should be dropped, report to the prospector that it needs to shut down, and drain the harvesterChan, so no harvester will be blocked.
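The draining the reviewer suggests can be sketched as follows. This is a minimal, hypothetical illustration, not filebeat's actual code: the channel name harvesterChan is taken from the comment above, while the drain helper and the string event type are invented for the example. The point is that on shutdown, an unbuffered channel must keep being read from, or every harvester blocked on a send stays stuck.

```go
package main

import (
	"fmt"
	"sync"
)

// drain discards everything left on ch until it is closed and returns the
// number of events dropped; any sender blocked on ch is released.
func drain(ch <-chan string) int {
	n := 0
	for range ch {
		n++
	}
	return n
}

func main() {
	harvesterChan := make(chan string) // unbuffered: each send blocks until read

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) { // harvesters still trying to publish during shutdown
			defer wg.Done()
			harvesterChan <- fmt.Sprintf("event-%d", id)
		}(i)
	}
	go func() { // close once every harvester has handed off its event
		wg.Wait()
		close(harvesterChan)
	}()

	fmt.Println("drained events:", drain(harvesterChan))
}
```

Without the drain loop, the three goroutines above would block forever on their sends, which is exactly the "harvester blocked" scenario the comment warns about.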
I added a bool return value to updateState so the loop can check whether the event was sent and return accordingly. The same is needed for the other callers of updateState.
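The pattern described above can be sketched like this. It is a hedged illustration, not filebeat's real types: outlet, prospector, and the string event are stand-ins, and only the OnEvent/updateState names come from the diff. updateState reports via a bool whether the event reached the outlet, so the caller's loop can stop instead of publishing further events after the outlet closed.

```go
package main

import "fmt"

// outlet is a stand-in for the prospector's output channel.
type outlet struct{ open bool }

// OnEvent returns false once the outlet has been closed.
func (o *outlet) OnEvent(event string) bool { return o.open }

type prospector struct{ out *outlet }

// updateState forwards the event and returns whether it was accepted,
// so callers can stop their loop when the outlet is closed.
func (p *prospector) updateState(event string) bool {
	ok := p.out.OnEvent(event)
	if !ok {
		fmt.Println("Prospector outlet closed")
	}
	return ok
}

func main() {
	p := &prospector{out: &outlet{open: true}}
	for _, e := range []string{"a", "b", "c"} {
		if !p.updateState(e) {
			return // stop the worker once the outlet is closed
		}
		fmt.Println("sent", e)
		p.out.open = e != "b" // simulate the outlet closing after "b"
	}
}
```

Every call site of updateState needs this check; a caller that ignores the bool reintroduces the original bug of publishing after close.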
LGTM
In case newly started harvesters did not persist their first state before the next scan started, it could have happened that multiple harvesters were started for the same file. This could have been caused by a large number of files or by the output blocking.
The problem is solved by making the Setup step of the Harvester synchronous and blocking the scan. Part of this is also updating the first state as part of the prospector.
The side effect of this change is that a scan now blocks in case the channel is blocked, which means the output is probably not responding. If the output is not responding, scans will not continue and new files will not be discovered until the output is available again.
The code can be further simplified in the future by merging create/startHarvester. This will be done in a second step to keep the backport commit to a minimum.
See also elastic#2539
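The race and its fix can be sketched as follows. This is a simplified, hypothetical model, not filebeat's implementation: the states map and startHarvester helper are invented for illustration. The essential idea from the commit message is that the file's first state is persisted synchronously, as part of the scan itself, so a subsequent scan sees the state and cannot start a second harvester for the same file.

```go
package main

import "fmt"

// prospector tracks which files already have a harvester. In this sketch
// the state registry is just a map keyed by file path.
type prospector struct {
	states map[string]bool
}

// startHarvester models the now-synchronous Setup step: the first state is
// recorded before the scan continues, so duplicate starts are impossible.
func (p *prospector) startHarvester(path string) bool {
	if p.states[path] {
		return false // a harvester already owns this file
	}
	p.states[path] = true // first state persisted as part of the scan
	return true
}

func main() {
	p := &prospector{states: map[string]bool{}}
	// Two consecutive scans discover the same file; only the first scan
	// starts a harvester.
	for scan := 1; scan <= 2; scan++ {
		started := p.startHarvester("/var/log/app.log")
		fmt.Printf("scan %d started harvester: %v\n", scan, started)
	}
}
```

The trade-off described above follows directly: because state persistence is now part of the scan, a blocked output blocks the scan too, and new files are not discovered until the output responds again.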