Rollover AD result index less frequently #168

kaituo · 2020-06-20T00:18:41Z

Issue #, if available:

Description of changes:
Currently, we roll over the result index every 30 days or every 300,000 docs, whichever comes first. Assuming each doc is about 1 KB and the result index has five shards, each shard holds only about 60 MB (300 MB / 5), which is too small. Small shards go against ES best practice. This PR raises the rollover threshold to 9,000,000 docs, which increases the max shard size to roughly 1.8 GB (9 GB / 5).
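The shard-size arithmetic in the description can be checked with a minimal sketch. The class and method names below are hypothetical and not part of the plugin. The 1 KB per doc and five-shard figures are the description's own assumptions, and 1 KB is treated as 1,000 bytes for round numbers.

```java
public class ShardSizeEstimate {
    // Assumptions taken from the PR description, not measured values.
    static final long BYTES_PER_DOC = 1_000L; // "each doc has 1 KB"
    static final int PRIMARY_SHARDS = 5;      // "our result index has five shards"

    /** Estimated size of one shard, in MB, when the index holds maxDocs docs. */
    static double shardSizeMb(long maxDocs) {
        double totalBytes = (double) maxDocs * BYTES_PER_DOC;
        return totalBytes / PRIMARY_SHARDS / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println(shardSizeMb(300_000L));   // old threshold: 60.0 MB
        System.out.println(shardSizeMb(9_000_000L)); // new threshold: 1800.0 MB, i.e. 1.8 GB
    }
}
```

Real document sizes vary, so these numbers are only an upper-bound estimate of shard size at the moment the doc-count condition triggers a rollover.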

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

wnbts · 2020-06-22T16:52:42Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/settings/AnomalyDetectorSettings.java

            // Suppose generally per cluster has 200 detectors and all run with 1 minute interval.
            // We will get 288,000 AD result docs. So set it as 300k to avoid multiple roll overs
            // per day.


Minor: the comments are outdated.

ylwu-amzn

As Lai said, please update the comments.

vamshin · 2020-06-22T18:48:04Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/settings/AnomalyDetectorSettings.java

            // per day.
-            300 * 1000L,
+            9_000_000L,


Is this a problem for 200 detectors? According to the comment, 300k covers the 288k docs generated per day and so avoids multiple rollovers per day, right?

It is not a problem for 200 detectors. It is a small-shard problem. Please see my PR description.
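To make the exchange above concrete, here is a rough sketch of how the two thresholds play out under the workload named in the code comment (200 detectors, 1-minute interval). It assumes one result doc per detector per interval, as the comment does. The class and method names are hypothetical.

```java
public class RolloverEstimate {
    /** Result docs per day, assuming one doc per detector per interval. */
    static long docsPerDay(int detectors, int intervalMinutes) {
        // Integer division: exact only when the interval divides 1440 minutes.
        return (long) detectors * (24 * 60 / intervalMinutes);
    }

    /** Days until the doc-count rollover condition is reached. */
    static double daysToReach(long maxDocs, long docsPerDay) {
        return (double) maxDocs / docsPerDay;
    }

    public static void main(String[] args) {
        long perDay = docsPerDay(200, 1);
        System.out.println(perDay);                          // 288000
        System.out.println(daysToReach(300_000L, perDay));   // a little over one day
        System.out.println(daysToReach(9_000_000L, perDay)); // 31.25
    }
}
```

Under these assumptions the old threshold rolls the index over roughly daily, while the new one is reached after 31.25 days, so the 30-day age condition would trigger first on such a cluster.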

Right, I get that. The comment is misleading here. Why not put the explanation from the PR description in the comment? The comment talks about 200 detectors and rollovers. At least mention the PR?

One more thing you might want to confirm: does a larger index impact performance compared to the previous, smaller index?

kaituo requested review from vamshin and ylwu-amzn June 20, 2020 00:18

wnbts approved these changes Jun 22, 2020

ylwu-amzn approved these changes Jun 22, 2020

Update comments

ac3bc59

kaituo merged commit 50be929 into opendistro-for-elasticsearch:master Jun 22, 2020

vamshin reviewed Jun 22, 2020

yizheliu-amazon added the enhancement label Jun 25, 2020
