This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Rollover AD result index less frequently #168

Merged 2 commits on Jun 22, 2020

@kaituo (Member) commented Jun 20, 2020

Issue #, if available:

Description of changes:
Currently, we roll over the result index every 30 days or every 300,000 docs. Assuming each doc is about 1 KB and the result index has five shards, each shard holds only about 60 MB, which is too small and goes against ES best practice. This PR raises the rollover threshold to 9,000,000 docs, which increases the maximum shard size to roughly 1.8 GB.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
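
The shard-size estimate in the description can be checked with a short calculation. The sketch below is illustrative only and is not part of this PR: the class and method names are hypothetical, and the 1 KB document size (taken here as 1,000 bytes) and the five-shard count are the assumptions stated in the description, not measured values.

```java
// Hypothetical helper, not code from this PR: reproduces the arithmetic in the description.
final class ShardSizeEstimate {
    // Assumption from the PR description: each result doc is about 1 KB (decimal, 1,000 bytes).
    static final long ASSUMED_DOC_BYTES = 1_000L;
    // Assumption from the PR description: the result index has five shards.
    static final int ASSUMED_SHARDS = 5;

    /** Approximate bytes held by each shard when the index rolls over at maxDocs. */
    static long bytesPerShard(long maxDocs, long docBytes, int shards) {
        return maxDocs * docBytes / shards;
    }

    public static void main(String[] args) {
        long before = bytesPerShard(300_000L, ASSUMED_DOC_BYTES, ASSUMED_SHARDS);
        long after = bytesPerShard(9_000_000L, ASSUMED_DOC_BYTES, ASSUMED_SHARDS);
        System.out.println("old threshold, bytes per shard: " + before);
        System.out.println("new threshold, bytes per shard: " + after);
    }
}
```

Under these assumptions the old threshold gives 60,000,000 bytes (about 60 MB) per shard and the new one gives 1,800,000,000 bytes (about 1.8 GB), matching the figures above. Real shard sizes depend on the actual document size and on compression.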

@kaituo kaituo requested review from vamshin and ylwu-amzn June 20, 2020 00:18
Comment on lines 88 to 90
// Suppose generally per cluster has 200 detectors and all run with 1 minute interval.
// We will get 288,000 AD result docs. So set it as 300k to avoid multiple roll overs
// per day.
Contributor

Minor: the comments are outdated.

Member Author

Updated.

Contributor

@ylwu-amzn ylwu-amzn left a comment

As Lai said, please update the comments.

@kaituo kaituo merged commit 50be929 into opendistro-for-elasticsearch:master Jun 22, 2020
// per day.
- 300 * 1000L,
+ 9_000_000L,
Member

@vamshin vamshin Jun 22, 2020

Is this a problem for 200 detectors? According to the comment, 300k already covers the 288k docs and avoids multiple rollovers per day, right?

Member Author

It is not a problem for 200 detectors. The problem is small shards. Please see my PR description.

Member

@vamshin vamshin Jun 22, 2020

Right, I get that. The comment is misleading here. Why not put the explanation from the PR description in the code comment? The comment still talks about 200 detectors and rollovers. At least mention the PR?

One more thing you might want to confirm: does a larger index impact performance compared to the previous, smaller index?
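
For reference, the numbers in this thread can be related with a rough calculation. This is a sketch under the assumptions in the original code comment (200 detectors, each writing one result doc per 1-minute interval); the class and method names are hypothetical and do not come from the plugin.

```java
// Hypothetical helper, not code from this PR: relates detector count to rollover cadence.
final class RolloverCadence {
    static final long MINUTES_PER_DAY = 24L * 60L;

    /** Docs written per day, assuming each detector emits one result doc per interval. */
    static long docsPerDay(int detectors, int intervalMinutes) {
        return detectors * (MINUTES_PER_DAY / intervalMinutes);
    }

    /** Days of writes needed to reach the doc-count rollover threshold. */
    static double daysToReach(long maxDocs, long docsPerDay) {
        return (double) maxDocs / docsPerDay;
    }

    public static void main(String[] args) {
        long perDay = docsPerDay(200, 1); // assumption from the code comment under review
        System.out.println("docs per day: " + perDay);
        System.out.println("days to reach 300,000 docs: " + daysToReach(300_000L, perDay));
        System.out.println("days to reach 9,000,000 docs: " + daysToReach(9_000_000L, perDay));
    }
}
```

Under these assumptions the cluster writes 288,000 docs per day, so the old threshold is reached after roughly one day and the new one after about 31 days, which is close to the existing 30-day age condition.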

@yizheliu-amazon yizheliu-amazon added the enhancement New feature or request label Jun 25, 2020