
Lower the default number of primary shards #3431

Closed
andrewkroh opened this issue Jan 20, 2017 · 5 comments
Labels
discuss Issue needs further discussion.

Comments

@andrewkroh
Member

We should consider lowering the default number of shards created by the Beats' daily indices. Each index uses the Elasticsearch default of 5 primaries with 1 replica (5p1r).

I think it would improve our out-of-the-box experience for new users to reduce the level of sharding. In my experience, with only a few hosts reporting data, you end up with lots of tiny shards (indices of only a few hundred MB).

Each shard can easily handle many GB of data. So we could lower the sharding values in our index templates, and more experienced users can tune these numbers as they scale up.

We have 5 Beats now (FB, PB, MB, WLB, HB). Here are some sample numbers:

| Num of Beats | Sharding Strategy | Days | Total Shards |
|--------------|-------------------|------|--------------|
| 3            | 5p1r*             | 30   | 900          |
| 5            | 5p1r*             | 30   | 1500         |
| 3            | 3p1r              | 30   | 540          |
| 5            | 3p1r              | 30   | 900          |
| 3            | 2p1r              | 30   | 360          |
| 5            | 2p1r              | 30   | 600          |
| 3            | 1p1r              | 30   | 180          |
| 5            | 1p1r              | 30   | 300          |

\* default from Elasticsearch
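The totals above follow a simple formula: total shards = number of Beats × days × primaries × (1 + replicas). A quick sketch reproducing one row of the table (the counts are taken from the table; nothing here is Beats-specific):

```shell
# Total shard count for daily indices:
#   total = num_beats * days * primaries * (1 + replicas)
beats=5
days=30
primaries=2
replicas=1
echo $(( beats * days * primaries * (1 + replicas) ))
# prints 600, matching the 5-Beats 2p1r row above
```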

Keeping more than 1p will allow users to take advantage of a multi-node cluster. I would go with 2p1r, as it is the smallest configuration that can still distribute the ingest load across nodes.
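For reference, a sketch of how a 2p1r default could be set in an index template. The template name and index pattern here are illustrative, not the actual Beats templates; the settings keys are standard Elasticsearch index settings:

```shell
# Illustrative only: set 2 primaries / 1 replica via a (legacy) index template.
curl -XPUT 'http://localhost:9200/_template/filebeat' \
  -H 'Content-Type: application/json' \
  -d '{
    "template": "filebeat-*",
    "settings": {
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }'
```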

@andrewkroh andrewkroh added the discuss Issue needs further discussion. label Jan 20, 2017
@ruflin
Contributor

ruflin commented Feb 17, 2017

+1 on changing the default number of shards. I would also suggest 2 or 3 by default. But we should make it possible to change this default in the filebeat config. As soon as we have #3603, we can generate a new template whenever a change is made. The challenge we then have is what happens if, for example, 2 filebeat instances have a different number of shards configured?

@andrewkroh
Member Author

+1 on making it easily configurable by the user.

I think conflicting shard values would be treated the same as other conflicting template values. AFAIK this means the last one to write its template to ES wins? Do we have any other strategy?

@ruflin
Contributor

ruflin commented Feb 23, 2017

If we don't overwrite templates, the first one wins :-) I think that is fine, and it is an edge case we do not have to worry about at the moment.
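The "first one wins" behavior can come from Elasticsearch itself: the template API accepts a `create=true` query parameter that makes the PUT fail if a template with that name already exists. A sketch (template name and body are illustrative):

```shell
# Illustrative: with ?create=true, a second PUT of the same template name is
# rejected, so the first instance to write its template "wins".
curl -XPUT 'http://localhost:9200/_template/filebeat?create=true' \
  -H 'Content-Type: application/json' \
  -d '{"template": "filebeat-*", "settings": {"index.number_of_shards": 2}}'
```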

@ruflin
Contributor

ruflin commented Feb 28, 2017

This PR is good news for us, as the "soft limit" of 500 shards will soon be increased and searching across lots of shards should become more efficient: elastic/elasticsearch#23253 (comment). I still think the default should be a lower number than it is now.

@tsg
Contributor

tsg commented Sep 5, 2017

Closing in favour of #5095.

@tsg tsg closed this as completed Sep 5, 2017

3 participants