Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration to new state structure #1703

Merged
merged 3 commits into from
May 24, 2016
Merged

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented May 23, 2016

The previous state list was dependent on a file path. This had the potential to lead to overwrite and conflicts on file rotation. As the file path is also stored inside the state object this information was duplicated. Now a pure array is stored in the registry which makes the format more flexibel and brings it close to the format used by Logstash.

Changes:

  • Not storing an index with paths as this leaded to duplicates
  • Introduction of last_seen to differentiate between entries with same bath which show up multiple times. This introduces also the base to clean up the registry file. Currently it is only used in the prospector, no registry yet.
  • Registry can now contain multiple entries for a path
  • Refactor registrar to use the same state array as prospector does
  • Introduce migration check to migrate from old to new state. Make it backward compatible by checking for old structure on startup
  • Decouple registrar from prospector
  • Update tests to follow new model

@ruflin ruflin added in progress Pull request is currently in progress. Filebeat Filebeat labels May 23, 2016
@ruflin ruflin force-pushed the mb-registry-array branch from bac63c4 to 3c4602f Compare May 23, 2016 14:30
@ruflin ruflin added review and removed in progress Pull request is currently in progress. labels May 23, 2016
spoolerChan: spoolerChan,
harvesterChan: make(chan *input.FileEvent),
done: make(chan struct{}),
states: input.NewStates(),
states: &states,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the states parameter given to NewProspector is a copy of original one, but here we allocate the copied state onto heap. You want input.States to be shared or to be a copy in here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a copy as from here on, each prospector treats the state independent.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

states is defined like this:

 // States handles list of FileState
 type States struct {
    states []FileState
    mutex  sync.Mutex
 }

The copy of 'states' is mostly shallow. Some code checkers (newer go vet should) will complain about copy on mutex. The underlying byte-buffer is shared when slice is copied. The (*States).Update method can override a file its state, if file is member of slice. Due to mutex being copied, any 2 go-routines attempting to updates a file state for the same file might actually race.

@ruflin ruflin force-pushed the mb-registry-array branch from 3c4602f to 24ca6c0 Compare May 23, 2016 20:03
@@ -55,6 +55,7 @@ https://github.com/elastic/beats/compare/v5.0.0-alpha2...master[Check the HEAD d
*Topbeat*

*Filebeat*
- The registry format was changed to an array instead of dict. The migration to the new format will happen automatically at the first startup. {pull}1703[1703]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+++ for doing the migration automatically!

@ruflin ruflin force-pushed the mb-registry-array branch from 24ca6c0 to 8f32e3f Compare May 24, 2016 06:55
The previous state list was dependent on a file path. This had the potential to lead to overwrite and conflicts on file rotation. As the file path is also stored inside the state object this information was duplicated. Now a pure array is stored in the registry which makes the format more flexibel and brings it close to the format used by Logstash.

Changes:
* Not storing an index with paths as this leaded to duplicates
* Introduction of last_seen to differentiate between entries with same bath which show up multiple times. This introduces also the base to clean up the registry file. Currently it is only used in the prospector, no registry yet.
* Registry can now contain multiple entries for a path
* Refactor registrar to use the same state array as prospector does
* Introduce migration check to migrate from old to new state. Make it backward compatible by checking for old structure on startup
* Decouple registrar from prospector
* Update tests to follow new model
@ruflin ruflin force-pushed the mb-registry-array branch from 8f32e3f to c824301 Compare May 24, 2016 07:58
@urso
Copy link

urso commented May 24, 2016

LGTM

@urso urso merged commit 60d9e0a into elastic:master May 24, 2016
@ruflin ruflin deleted the mb-registry-array branch May 26, 2016 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants