Publish shard state metrics #212
.../com/amazon/opendistro/elasticsearch/performanceanalyzer/collectors/ShardStateCollector.java
Outdated
}
value
        .append(new ShardStateMetrics(
                shard.getIndexName(),
Currently the data is emitted in a flat format that repeats the node name and the index name, both of which are user-provided strings. Maybe we should add a node-level filter as a predicate so that we only get information about shards on this node, drop the node name (it is implicit), and make the format hierarchical so that each index is mentioned only once, no matter how many shards it has. Something like this:
// 0 is initializing
// 1 is unassigned
// we are skipping reporting the active shards
{
"unassigned": {
"index1": [0, 8, ...],
"index10": [0]
},
"initializing": {
"index99": [0, 8, ...],
"index1": [7]
}
}
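A minimal sketch of the proposed grouping, assuming a hypothetical `ShardInfo` holder (not the Elasticsearch routing API) and a caller-supplied node predicate. Active (`STARTED`) shards are skipped and shard IDs are grouped as state, then index, so the node name is never stored. Since unassigned shards have no node, the usage below keeps shards with a null node name regardless of the filter; which node should report them is an open assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical flattened view of a shard; the real collector would read
// shard routing entries from the cluster state instead.
final class ShardInfo {
    final String indexName;
    final int shardId;
    final String nodeName; // null when the shard is unassigned
    final String state;    // "STARTED", "INITIALIZING", "RELOCATING" or "UNASSIGNED"

    ShardInfo(String indexName, int shardId, String nodeName, String state) {
        this.indexName = indexName;
        this.shardId = shardId;
        this.nodeName = nodeName;
        this.state = state;
    }
}

final class HierarchicalShardStates {
    // Builds state -> index -> [shard IDs]. Shards rejected by nodeFilter and
    // active (STARTED) shards are skipped, so each index name appears once per
    // state. TreeMaps keep the key order stable for output.
    static Map<String, Map<String, List<Integer>>> group(
            List<ShardInfo> shards, Predicate<ShardInfo> nodeFilter) {
        Map<String, Map<String, List<Integer>>> byState = new TreeMap<>();
        for (ShardInfo shard : shards) {
            if (!nodeFilter.test(shard) || "STARTED".equals(shard.state)) {
                continue;
            }
            byState.computeIfAbsent(shard.state.toLowerCase(Locale.ROOT), k -> new TreeMap<>())
                   .computeIfAbsent(shard.indexName, k -> new ArrayList<>())
                   .add(shard.shardId);
        }
        return byState;
    }
}
```

With shards `index1/0` and `index1/8` unassigned, `index1/7` initializing on `node-a`, `index99/0` initializing on `node-b`, and `index10/3` started on `node-a`, filtering for `node-a` (plus unassigned) yields `{initializing={index1=[7]}, unassigned={index1=[0, 8]}}`.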
Changed it so that the index name is at the parent level. A sample event would now look like this:
^shard_state_metrics
{"current_time":1600677426860}
{"IndexName":"pmc"}
{"ShardID":2,"ShardType":"primary","NodeName":"elasticsearch2","Shard_State":"Unassigned"}
{"ShardID":2,"ShardType":"primary","NodeName":"elasticsearch2","Shard_State":"Initializing"}
{"IndexName":"pmc1"}
{"ShardID":2,"ShardType":"primary","NodeName":"elasticsearch2","Shard_State":"Unassigned"}
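A minimal sketch of a writer for the event layout in the sample, assuming a hypothetical `ShardRow` holder and an insertion-ordered map from index name to rows. Key names are copied from the sample; only `\` and `"` are escaped, which is enough to keep user-provided node names from breaking the JSON string literals.

```java
import java.util.List;
import java.util.Map;

// Hypothetical row type mirroring the fields in the sample event.
final class ShardRow {
    final int shardId;
    final String shardType;  // "primary" or "replica"
    final String nodeName;
    final String shardState; // e.g. "Unassigned", "Initializing"

    ShardRow(int shardId, String shardType, String nodeName, String shardState) {
        this.shardId = shardId;
        this.shardType = shardType;
        this.nodeName = nodeName;
        this.shardState = shardState;
    }
}

final class ShardStateEventWriter {
    // Escapes backslashes and double quotes, then wraps the value in quotes.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Emits the header and timestamp, then one IndexName line per index followed
    // by one line per shard, so each index name is written only once.
    static String write(long currentTime, Map<String, List<ShardRow>> rowsByIndex) {
        StringBuilder out = new StringBuilder();
        out.append("^shard_state_metrics\n");
        out.append("{\"current_time\":").append(currentTime).append("}\n");
        for (Map.Entry<String, List<ShardRow>> entry : rowsByIndex.entrySet()) {
            out.append("{\"IndexName\":").append(quote(entry.getKey())).append("}\n");
            for (ShardRow row : entry.getValue()) {
                out.append("{\"ShardID\":").append(row.shardId)
                   .append(",\"ShardType\":").append(quote(row.shardType))
                   .append(",\"NodeName\":").append(quote(row.nodeName))
                   .append(",\"Shard_State\":").append(quote(row.shardState))
                   .append("}\n");
            }
        }
        return out.toString();
    }
}
```

Passing a `LinkedHashMap` keeps indices in the order they were added, which is how the sample lists `pmc` before `pmc1`.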
.../amazon/opendistro/elasticsearch/performanceanalyzer/collectors/ShardStateCollectorTest.java
Outdated
.../com/amazon/opendistro/elasticsearch/performanceanalyzer/collectors/ShardStateCollector.java
Could you also expand the PR description with the tests you have done for this change?
This PR can add some delay to the collector threads, and we would like to know how ES handles it when we call its APIs this frequently.
.../amazon/opendistro/elasticsearch/performanceanalyzer/collectors/ShardStateCollectorTest.java
Outdated
.../com/amazon/opendistro/elasticsearch/performanceanalyzer/collectors/ShardStateCollector.java
}

@Override
void collectMetrics(long startTime) {
We also want to track how long we spend inside collectMetrics.
This change has been made for all the collectors in https://github.com/opendistro-for-elasticsearch/performance-analyzer-rca/pull/436/files#diff-115bbe9019d03c22dc720eb9fbc6670f354808d16bebb1eb6933dd468ce470af
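For illustration only, a minimal sketch of timing `collectMetrics`, assuming a hypothetical `TimedCollector` base class and a hypothetical `CollectorLatencyStats` sink kept separate from any error counters. The actual change lives in the linked RCA PR and may be structured differently. The elapsed time is taken with `System.nanoTime()` and recorded in a `finally` block so that runs that throw are timed as well.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical latency sink, separate from error/exception counters,
// so that normal timing data is never reported as an error.
final class CollectorLatencyStats {
    private final Map<String, LongAdder> millisByCollector = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> runsByCollector = new ConcurrentHashMap<>();

    void record(String collector, long elapsedMillis) {
        millisByCollector.computeIfAbsent(collector, k -> new LongAdder()).add(elapsedMillis);
        runsByCollector.computeIfAbsent(collector, k -> new LongAdder()).increment();
    }

    long runs(String collector) {
        LongAdder runs = runsByCollector.get(collector);
        return runs == null ? 0 : runs.sum();
    }

    long totalMillis(String collector) {
        LongAdder total = millisByCollector.get(collector);
        return total == null ? 0 : total.sum();
    }
}

// Hypothetical base class that wraps a subclass's collectMetrics() with timing.
abstract class TimedCollector {
    private final String name;
    private final CollectorLatencyStats stats;

    TimedCollector(String name, CollectorLatencyStats stats) {
        this.name = name;
        this.stats = stats;
    }

    abstract void collectMetrics(long startTime);

    final void run(long startTime) {
        long begin = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        try {
            collectMetrics(startTime);
        } finally {
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
            stats.record(name, elapsedMillis);
        }
    }
}
```

Keeping the counts and totals per collector makes it possible to compare how much time the shard state collector adds relative to the others.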
Syncing from upstream
if (inActiveShard) {
    saveMetricValues(value.toString(), startTime);
}
PerformanceAnalyzerApp.ERRORS_AND_EXCEPTIONS_AGGREGATOR.updateStat(
This is not an error, so it should not be part of ERRORS_AND_EXCEPTIONS_AGGREGATOR.
You can refer to this PR.
Please merge the RCA changes first, since the PA build depends on the RCA build.
Fixes #213
Description of changes:
This change starts publishing the shard state (Initializing, Unassigned, or Relocating) for each shard. Active shards are not published, to save space.
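A minimal sketch of the reporting rule described above, assuming a hypothetical local `ShardState` enum in which `STARTED` stands in for the active state. Only the three non-active states pass the filter.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical local enum; STARTED represents an active shard.
enum ShardState { UNASSIGNED, INITIALIZING, RELOCATING, STARTED }

final class ShardStateFilter {
    // States that are published; active (STARTED) shards are skipped to save space.
    static final Set<ShardState> REPORTED =
            EnumSet.of(ShardState.INITIALIZING, ShardState.UNASSIGNED, ShardState.RELOCATING);

    static boolean shouldReport(ShardState state) {
        return REPORTED.contains(state);
    }

    // Keeps only the reportable states, preserving their original order.
    static List<ShardState> reportable(List<ShardState> states) {
        return states.stream()
                .filter(ShardStateFilter::shouldReport)
                .collect(Collectors.toList());
    }
}
```

An `EnumSet` keeps the membership check cheap, which matters if the filter runs for every shard on every collection cycle.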
Testing
Tmp file
Table created
Schema of the table
Content of the table
REST API
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.