Known Issue: Prospector reloading unfinished files #3546
There are two options for stopping a harvester or a prospector: either the harvester and prospector finish sending all events and stop themselves, or they are killed because the output is blocking. When shutting down Filebeat without `shutdown_timeout`, Filebeat is expected to shut down as fast as possible. This means channels are closed directly and the events are not passed through to the registry. With dynamic prospector reloading, prospectors and harvesters must be stopped properly, as otherwise no new harvester can be started for the same file. To make this possible, the following changes were made:

* Introduce harvester tracking in the prospector to better control and manage the harvesters. The implementation is based on a harvester registry which starts and stops the harvesters.
* Use an outlet to send events from the harvester to the prospector. This outlet has an additional signal, giving two options for when the outlet should be finished: it can be stopped by the harvester itself or globally through closing beatDone.
* Introduce more done channels in the prospector to make shutdown more fine grained.
* Add system tests to verify the new behaviour.

Closes elastic#3546
Filebeat 6.2.3
filebeat config:
Errors are only generated for files under /var/log/nginx/*.log. Why does it only fail for the nginx logs? Note: I've installed Filebeat on all client servers and found no errors, but when Filebeat is installed on the same ELK server, I get the errors mentioned.
Prospector reloading was introduced in 5.3. This issue describes a known problem with the implementation.
This bug affects all reloading that reloads a prospector with a file that was harvested before. If a prospector is started with new files, this should not have any effect. In general, the recommendation is not to use reloading to update settings like `fields` or `paths` in a prospector, but to add new prospectors with new paths and to remove old ones.

**Example: Working**
subconfig.yml before
subconfig.yml after
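The original config snippets are missing from this page. A minimal illustrative sketch of what the surrounding text describes could look like the following (the `/var/log/` prefix and `input_type` syntax are assumptions based on the Filebeat 5.x era of this issue):

```yaml
# subconfig.yml before (illustrative)
- input_type: log
  paths:
    - /var/log/test.log

# subconfig.yml after (illustrative): a second prospector is added;
# the existing prospector for test.log is left untouched
- input_type: log
  paths:
    - /var/log/test.log
- input_type: log
  paths:
    - /var/log/newfile.log
```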
This works because only a new prospector for `newfile.log` has to be started, and the older prospector keeps running.

**Example: NOT Working**
subconfig.yml before
subconfig.yml after
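Again, the original config snippets are missing here. An illustrative sketch of the failing case described below (path prefix and the changed `fields` setting are assumptions):

```yaml
# subconfig.yml before (illustrative)
- input_type: log
  paths:
    - /var/log/test.log

# subconfig.yml after (illustrative): same file, but a setting changed,
# so the prospector for test.log must be stopped and a new one started
- input_type: log
  paths:
    - /var/log/test.log
  fields:
    env: staging
```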
The above does not work because the prospector with `test.log` has to be stopped and a new prospector with `test.log` has to be started.

**Technical Details**
On shutdown, Filebeat tries to shut down as fast as possible. This means it does not wait to finish sending all events and persisting all states to disk, as it is unknown how long that would take. This can become an issue with reloading.
When a new prospector configuration is loaded, Filebeat ensures that all states loaded by the new prospector are set to finished. This verifies that no file is harvested by two different prospectors at the same time, as that could lead to duplicated events and unexpected behaviour.
In Filebeat, `harvester/log.go` contains the code used to send events and state. This code is called one last time before stopping a harvester, with `state.Finished: true`, to set the state to finished so that a new harvester can pick the file up. The problem is that this final state may never be sent, because the select statement can take the `h.done` branch instead. On a normal shutdown this is not an issue, since whether a state is finished is not persisted to disk. For prospector reloading it matters, because the in-memory states carry the finished flag. So if `h.done` is selected instead of `h.prospectorChan`, a harvester state is never marked finished.

The reason `h.done` is required here is that when Filebeat is stopped while the output is blocking, it must still shut down directly. This implies that a prospector and a harvester need two different stop methods: one that waits for sending to complete, used for reloading, and one that shuts down immediately.

Some experiments happen in #3538