Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Publish shard state metrics #212

Merged
merged 10 commits into from
Oct 22, 2020
Merged

Publish shard state metrics #212

merged 10 commits into from
Oct 22, 2020

Conversation

amathur1893
Copy link
Contributor

@amathur1893 amathur1893 commented Sep 25, 2020

Fixes #, if available: #213

Description of changes:
This change will start publishing shard state - Initializing, Unassigned and Relocating for each shard. We have not published Active shards to save space.

Testing

  1. ./gradlew test
SUCCESS: Executed 455 tests in 41m 55s (1 skipped)

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.5/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 42m 22s
11 actionable tasks: 2 executed, 9 up-to-date
  1. Tested using docker.

Tmp File -

^shard_state_metrics
{"current_time":1602500493586}
{"IndexName":pmc}
{"ShardID":2,"ShardType":"p","NodeName":"elasticsearch2","Shard_State":"UNASSIGNED"}
{"ShardID":4,"ShardType":"p","NodeName":"elasticsearch2","Shard_State":"INITIALIZING"}
{"IndexName":pmc1}
{"ShardID":1,"ShardType":"p","NodeName":"elasticsearch1","Shard_State":"UNASSIGNED"}
{"ShardID":0,"ShardType":"p","NodeName":"elasticsearch2","Shard_State":"UNASSIGNED"}
$

Table created

sqlite> .tables
Shard_State

Schema of the table

sqlite> .schema Shard_State
CREATE TABLE Shard_State(IndexName varchar null, ShardID varchar null, ShardType varchar null, NodeName varchar null, Shard_State varchar null, sum double null, avg double null, min double null, max double null);

Content of the table

sqlite> select * from Shard_State;
pmc|4|p|elasticsearch1|UNASSIGNED|1.0|1.0|1.0|1.0
pmc|2|p|elasticsearch2|INITIALIZING|1.0|1.0|1.0|1.0
pmc1|1|p|elasticsearch2|UNASSIGNED|1.0|1.0|1.0|1.0
pmc1|0|p|elasticsearch1|UNASSIGNED|1.0|1.0|1.0|1.0

Rest API

arpitamt@c4b301d5bc1f ~> curl -XGET "localhost:9600/_opendistro/_performanceanalyzer/metrics?metrics=Shard_State&agg=avg&dim=Shard_State&nodes=all" | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   188  100   188    0     0     22      0  0:00:08  0:00:08 --:--:--    45
{
    "id3qQArxRPSA2EHbEfPlew": {
        "timestamp": 1602771645000,
        "data": {
            "fields": [
                {
                    "name": "Shard_State",
                    "type": "VARCHAR"
                },
                {
                    "name": "Shard_State",
                    "type": "DOUBLE"
                }
            ],
            "records": [
                [
                    "UNASSIGNED",
                    0.0
                ]
            ]
        }
    }
}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

}
value
.append(new ShardStateMetrics(
shard.getIndexName(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the format in which data is emitted is flat, which includes the node name and index name which happen to be user provided strings. Maybe we should add a node level filter as a predicate so that we get information about shards only on this node and not store the node Name (as it is implicit) and make it hierarchical so that no mater the number of shards, we mention index only one time. So something like this:
// 0 is initializing
// 1 is unassigned
// we are skipping reporting the active shards
{
"unassigned": {
"index1": [0, 8, ...],
"index10": [0]
},
"initializing": {
"index99": [0, 8, ...],
"index1": [7]
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to indexName at the parent level. So Sample metric would look like
Sample event :
^shard_state_metrics
{"current_time":1600677426860}
{IndexName:"pmc"}
{"ShardID":2,"ShardType":"primary","NodeName":"elasticsearch2","Shard_State":"Unassigned"}
{"ShardID":2,"ShardType":"primary","NodeName":"elasticsearch2","Shard_State:"Initializing"}
{IndexName:"pmc1"}
{"ShardID":2,"ShardType":"primary","NodeName":"elasticsearch2","Shard_State":"Unassigned"}

@yojs yojs requested a review from vigyasharma October 1, 2020 18:35
Copy link
Contributor

@yojs yojs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also expand on the PR description with tests that we have done for this change ?

This PR can add some delays on the collector threads and we we would like to know how ES takes it, when we call their APIs quite frequently.

@yojs yojs requested a review from rguo-aws October 9, 2020 17:52
yojs
yojs previously approved these changes Oct 16, 2020
}

@Override
void collectMetrics( long startTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also want to track how long we spend inside the collectMetrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yojs yojs requested review from khushbr and adityaj1107 and removed request for vigyasharma and rguo-aws October 19, 2020 17:25
if(inActiveShard) {
saveMetricValues(value.toString(), startTime);
}
PerformanceAnalyzerApp.ERRORS_AND_EXCEPTIONS_AGGREGATOR.updateStat(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not error, so should not be part of ERRORS_AND_EXCEPTIONS_AGGREGATOR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can refer to this PR.

yojs
yojs previously approved these changes Oct 19, 2020
adityaj1107
adityaj1107 previously approved these changes Oct 19, 2020
@adityaj1107
Copy link
Contributor

Please merge the RCA changes first since the PA build depends on the RCA build.

@yojs yojs merged commit b5ce3b0 into opendistro-for-elasticsearch:master Oct 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants