Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement prospector reloading #3362

Merged
merged 1 commit into from
Jan 23, 2017
Merged

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Jan 13, 2017

This PR allows to dynamically reload prospectors. It works the same way as module reloading in metricbeat.

Refactoring

  • LoadStates was separated from NewProspector. The reason is that after New only the ID is needed and setting up states requires more calculations. So this can be done in a second step when all the validations are done.
  • Only allow to start a prospector when all states are set to Finished. If not, LoadStates returns an error. This is to prevent a prospector starting before a harvester finished with a file. The prospector will be picked up again during the next reloading phase.
  • Extract ReloadConfig to libbeat

Limitations

This implementation currently has the some limitations. This are not new in filebeat but require more care as configurations change more often.

  • Two prospectors on one file: It is possible, that two prospectors pick up one file because they defined overlapping patterns. This can have the consequence that two harvesters on the same file are running which can lead to duplicates and unpredictable behaviour. The risk is minimized in that a prospector does not start as long as a state it takes care of is not finished. But it can still happen that a Finished state is picked up but it also managed by an other prospector. The user must ensure no prospector paths overlap. This problem can potentially be solved in the future with a global harvester registry.

Notes

  • In a later PR, more refactoring and unification of the reloading should happen.

@ruflin ruflin added Filebeat Filebeat in progress Pull request is currently in progress. review labels Jan 13, 2017
@ruflin ruflin force-pushed the prospector-reloading branch 7 times, most recently from 24b5023 to d156ea3 Compare January 16, 2017 13:25
@ruflin ruflin requested a review from urso January 16, 2017 13:31
@ruflin ruflin removed the in progress Pull request is currently in progress. label Jan 16, 2017
@ruflin ruflin force-pushed the prospector-reloading branch 2 times, most recently from 964a09e to 1ba7710 Compare January 17, 2017 08:14

[source,yaml]
------------------------------------------------------------------------------
filebeat.reload.modules:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this might be confusing. It's called reload.modules but reloads prospectors config, not module configs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually reload.prospectors not modules. Copy / paste error from metricbeat ...

go func(pr *Prospector) {
defer wg.Done()
logp.Debug("reload", "stopping prospector: %v", pr.ID)
// TODO: Stop prospectors in parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already the case since a goroutine is started for each prospector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the comment was a left over.

configReloads = expvar.NewInt("filebeat.config.reloads")
prospectorStarts = expvar.NewInt("filebeat.config.prospector.starts")
prospectorStops = expvar.NewInt("filebeat.config.prospector.stops")
prospectorRunning = expvar.NewInt("filebeat.config.prospector.running")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to only count dynamically configured prospectors, right? We should reflect that in the name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to filebeat.config.prospector.dyamic.*

case <-time.After(r.config.Period):

debugr("Scan for new config files")
configReloads.Add(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is incremented at each period regardless of whether any file has changed, the expvar name might be a little confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easy to just move it after !updated and then it's more useful, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it after !updated


path := r.config.Path
if !filepath.IsAbs(path) {
path = filepath.Join(cfgfile.GetPathConfig(), path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use paths.Resolve(paths.Config, path).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that GetPathConfig, which was recently introduced, duplicates the logic of paths.Resolve. Was there a reason for that? I'd prefer using paths.Resolve everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetPathConfig was around for quite some time I think, or at least the logic around it. I was not aware of paths.Resolve that is why I used GetPathConfig. I changed it now in the reloaders.

@ruflin
Copy link
Member Author

ruflin commented Jan 19, 2017

@tsg New version pushed with reviews implemented.

@ruflin ruflin force-pushed the prospector-reloading branch from 7122a59 to 0e0bcf0 Compare January 20, 2017 09:53
Copy link
Contributor

@tsg tsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we deprecate config_dir now that we have this feature?

return nil, errors.New("No prospectors defined. What files do you want me to watch?")
}

if *once && config.ProspectorReload.Enabled() {
return nil, errors.New("Once and prospector reloading cannot be used together.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing s/Once/-once/ and/or swapping it around like "prospector reloading and -once cannot be used together".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swapped the two.

if reloaderConfig.Enabled() {
logp.Warn("EXPERIMENTAL feature dynamic configuration reloading is enabled.")

go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this goroutine stop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good you spotted that. It's missing :-( Will add it.

experimental[]

Reload configuration allows to dynamically reload prospector configuration files. A glob can be defined which should be watched
for prospector configuration changes. New prospectors will started / stopped accordingly. This is especially useful in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"will be started"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


A path with a glob must be defined on which files should be checked for changes. A period is set on how often
the files are checked for changes. Do not set period below 1s as the modification time of files is often stored in seconds.
Setting it below 1s will have an unnecessary overhead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/will have/will cause/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@ruflin
Copy link
Member Author

ruflin commented Jan 20, 2017

@andrewkroh I pushed a new commit with the changes. About deprecating config_dir. I think we should definitively deprecate it in filebeat and reintroduce it in all beats, especially as config_dir requires different config files. The part I'm not sure about is if we should mix reload and config dir. From a technical point of view it is almost the same, but configuration wise I'm not sure if we should mix the two. Someone using config_dir should not require to enable reloading.

I will open a seperate PR to deprecate config_dir with a note that it will be replace with something "better" :-)

logp.Warn("EXPERIMENTAL feature dynamic configuration reloading is enabled.")

go func() {
c.reloader = prospector.NewProspectorReloader(reloaderConfig, c.out, r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anything bad will happen here, but doing the c.reloader assignment outside of the goroutine would be less prone to race conditions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, then code can become:

c.reloader = ...
go c.reloader.Run()

@ruflin
Copy link
Member Author

ruflin commented Jan 20, 2017

@andrewkroh Deprecation of config_dir: #3429

@ruflin ruflin force-pushed the prospector-reloading branch from f1f94c7 to 8846cca Compare January 20, 2017 15:28
logp.Warn("EXPERIMENTAL feature dynamic configuration reloading is enabled.")

go func() {
c.reloader = prospector.NewProspectorReloader(reloaderConfig, c.out, r)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, then code can become:

c.reloader = ...
go c.reloader.Run()


if c.reloader != nil {
c.wg.Add(1)
go c.reloader.Stop()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... where do you call c.wg.Done

asyncWaitStop := func(stop func()) {
    c.wg.Add(1)
    go func() {
      defer c.wg.Done()
      stop()
  }
}

...
for _, p := range c.prospectors {
    asyncWaitStop(p.Stop)
}
if c.reloader != nil {
    asyncWaitStop(c.reloader.Stop)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.wg.Done is called on line 50. Current implementation works as expected.

Copy link

@urso urso Jan 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh... I'm not really ok with calling it on line 50. When using waitGroups one must/should ensure Add is always run before Done, so the internal semaphore(counter) can not become negative at any time. What if Run returns early (even if it doesn't). With Done being called on line 50, let's move the add right before that line.

With Add/Done being somewhat symmetric, usage should clearly follow the simple pattern of: always increase right before starting the go-routine which is going to call Done. This leaves less room for errors and makes usage more clear for readers (we've already got quite a many wait groups and done channels in here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Changed.

This PR allows to dynamically reload prospectors. It works the same way as module reloading in metricbeat.

**Refactoring**

* LoadStates was separated from NewProspector. The reason is that after New only the ID is needed and setting up states requires more calculations. So this can be done in a second step when all the validations are done.
* Only allow to start a prospector when all states are set to Finished. If not, LoadStates returns an error. This is to prevent a prospector starting before a harvester finished with a file. The prospector will be picked up again during the next reloading phase.
* Extract ReloadConfig to libbeat

**Limitations**

This implementation currently has the some limitations. This are not new in filebeat but require more care as configurations change more often.

* Two prospectors on one file: It is possible, that two prospectors pick up one file because they defined overlapping patterns. This can have the consequence that two harvesters on the same file are running which can lead to duplicates and unpredictable behaviour. The risk is minimized in that a prospector does not start as long as a state it takes care of is not finished. But it can still happen that a Finished state is picked up but it also managed by an other prospector. The user must ensure no prospector paths overlap. This problem can potentially be solved in the future with a global harvester registry.

**Notes**

* In a later PR, more refactoring and unification of the reloading should happen.
@ruflin ruflin force-pushed the prospector-reloading branch from e8ae484 to 88a0ff1 Compare January 23, 2017 12:48
@urso urso merged commit 240bbd2 into elastic:master Jan 23, 2017
@ruflin ruflin deleted the prospector-reloading branch January 24, 2017 07:05
@superwhykz
Copy link

Thanks so much guys for pushing this PR! Quick question we heavily use "config_dir" option. Does it mean if we have that option enabled reload will not work?

@ruflin
Copy link
Member Author

ruflin commented Jan 25, 2017

@superwhykz No, currently config_dir and the reloading can be enabled together, you should just ensure that the directories they are fetching config files from are not overlapping. Let us know if you hit any issues with the reloading.

@superwhykz
Copy link

@ruflin awesome! this feature should be in night build by now right?

@ruflin
Copy link
Member Author

ruflin commented Jan 26, 2017

@superwhykz yes

ruflin added a commit to ruflin/beats that referenced this pull request Jan 31, 2017
The plan is replace this with a better implementation in the future similar to prospector reloading. See elastic#3362.
@jasonkylemorse
Copy link

In what version are you guys targeting this feature to be available? Any idea on a timeframe for that? Thanks!

@ruflin
Copy link
Member Author

ruflin commented Mar 10, 2017

@jasonkylemorse It is available in 5.3 as an experimental feature. 5.3 should be coming out in the next weeks. If you want to test it already, you can also use the nightly builds: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants