-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timing stats to publication process #76771
Changes from 1 commit
9bdb2c9
211c13e
eab60c3
bd8ef0c
8e7477a
9b0c5f3
be888f6
bd244dd
ab2fde0
bee970a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2077,6 +2077,157 @@ Number of incompatible differences between published cluster states. | |
(integer) | ||
Number of compatible differences between published cluster states. | ||
======= | ||
|
||
`master_timing`:: | ||
(object) | ||
Contains low-level statistics about how long various activities took while the | ||
node was the elected master. Omitted if the node is not master-eligible. Every | ||
field whose name ends in `_time` within this object is also represented as a | ||
raw number of milliseconds in a field whose name ends in `_time_millis`. The | ||
human-readable fields with a `_time` suffix are only returned if requested with | ||
the `?human=true` query parameter. | ||
+ | ||
.Properties of `master_timing` | ||
[%collapsible] | ||
======= | ||
`unchanged`:: | ||
(object) | ||
Contains statistics about cluster state update attempts that did not change the | ||
cluster state. | ||
+ | ||
.Properties of `unchanged` | ||
[%collapsible] | ||
======== | ||
`count`:: | ||
(long) | ||
The number of cluster state update attempts that did not change the cluster | ||
state since the node started. | ||
|
||
`computation_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent computing no-op cluster state updates since | ||
the node started. | ||
|
||
`notification_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent notifying listeners of a no-op cluster | ||
state update since the node started. | ||
|
||
======== | ||
|
||
`success`:: | ||
(object) | ||
Contains statistics about cluster state update attempts that did not change the | ||
cluster state. | ||
+ | ||
.Properties of `success` | ||
[%collapsible] | ||
======== | ||
`count`:: | ||
(long) | ||
The number of cluster state update attempts that successfully changed the | ||
cluster state since the node started. | ||
|
||
`computation_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent computing cluster state updates that were | ||
ultimately successful since the node started. | ||
|
||
`publication_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent publishing cluster state updates which | ||
ultimately succeeded, which includes everything from the end of the computation | ||
of the new cluster state until the publication has finished and the master node | ||
is ready to start processing the next state update. | ||
|
||
`context_construction_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent constructing a _publication context_ since | ||
the node started for publications that ultimately succeeded. This statistic | ||
includes the time spent computing the difference between cluster states and | ||
preparing a serialized representation of the state. | ||
|
||
`commit_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent waiting for a successful cluster state | ||
update to _commit_, which happens when a majority of the master nodes have | ||
written the state to disk. | ||
|
||
`completion_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent waiting for a successful cluster state | ||
update to _complete_, which happens when all the nodes have applied the cluster | ||
state except for the elected master. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps clarify that these are from publication start, i.e., IIUC includes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure, see bee970a. |
||
|
||
`master_apply_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent successfully applying cluster state updates | ||
on the elected master since the node started. | ||
|
||
`notification_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent notifying listeners of a successful cluster | ||
state update since the node started. | ||
|
||
======== | ||
|
||
`failure`:: | ||
(object) | ||
Contains statistics about cluster state update attempts that did not | ||
successfully change the cluster state, typically because a new master node was | ||
elected before completion. | ||
+ | ||
.Properties of `failure` | ||
[%collapsible] | ||
======== | ||
`count`:: | ||
(long) | ||
The number of cluster state update attempts that failed to change the cluster | ||
state since the node started. | ||
|
||
`computation_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent computing cluster state updates that were | ||
ultimately unsuccessful since the node started. | ||
|
||
`publication_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent publishing cluster state updates which | ||
ultimately failed, which includes everything from the end of the computation of | ||
the new cluster state until the publication has finished and the master node is | ||
ready to start processing the next state update. | ||
|
||
`context_construction_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent constructing a _publication context_ since | ||
the node started for publications that ultimately failed. This statistic | ||
includes the time spent computing the difference between cluster states and | ||
preparing a serialized representation of the state. | ||
|
||
`commit_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent waiting for an unsuccessful cluster state | ||
update to _commit_, which happens when a majority of the master nodes have | ||
written the state to disk. | ||
|
||
`completion_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent waiting for an unsuccessful cluster state | ||
update to _complete_, which happens when all the nodes have applied the cluster | ||
state except for the elected master. | ||
|
||
`master_apply_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent unsuccessfully applying cluster state | ||
updates on the elected master since the node started. | ||
|
||
`notification_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent notifying listeners of a failed cluster | ||
state update since the node started. | ||
|
||
======== | ||
======= | ||
====== | ||
|
||
[[cluster-nodes-stats-api-response-body-ingest]] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,3 +40,99 @@ | |
- gte: { nodes.$master.discovery.published_cluster_states.incompatible_diffs: 0 } | ||
- gte: { nodes.$master.discovery.published_cluster_states.compatible_diffs: 0 } | ||
- is_false: nodes.$master.roles | ||
|
||
--- | ||
"Master timing stats": | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Really just asserting that the stats really do come out of the API and that we do the right thing with |
||
- skip: | ||
features: [arbitrary_key] | ||
version: "- 7.99.99" | ||
reason: "master timing stats added in 8.0.0" | ||
|
||
- do: | ||
nodes.info: | ||
node_id: _master | ||
- set: | ||
nodes._arbitrary_key_: master | ||
|
||
- do: | ||
nodes.stats: | ||
metric: [ discovery ] | ||
|
||
- gte: { nodes.$master.discovery.master_timing.unchanged.count: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.unchanged.computation_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.unchanged.notification_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.count: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.computation_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.publication_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.context_construction_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.commit_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.completion_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.master_apply_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.notification_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.count: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.computation_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.publication_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.context_construction_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.commit_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.completion_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.master_apply_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.notification_time_millis: 0 } | ||
|
||
- is_false: nodes.$master.discovery.master_timing.unchanged.computation_time | ||
- is_false: nodes.$master.discovery.master_timing.unchanged.notification_time | ||
- is_false: nodes.$master.discovery.master_timing.success.computation_time | ||
- is_false: nodes.$master.discovery.master_timing.success.publication_time | ||
- is_false: nodes.$master.discovery.master_timing.success.context_construction_time | ||
- is_false: nodes.$master.discovery.master_timing.success.commit_time | ||
- is_false: nodes.$master.discovery.master_timing.success.completion_time | ||
- is_false: nodes.$master.discovery.master_timing.success.master_apply_time | ||
- is_false: nodes.$master.discovery.master_timing.success.notification_time | ||
- is_false: nodes.$master.discovery.master_timing.failure.computation_time | ||
- is_false: nodes.$master.discovery.master_timing.failure.publication_time | ||
- is_false: nodes.$master.discovery.master_timing.failure.context_construction_time | ||
- is_false: nodes.$master.discovery.master_timing.failure.commit_time | ||
- is_false: nodes.$master.discovery.master_timing.failure.completion_time | ||
- is_false: nodes.$master.discovery.master_timing.failure.master_apply_time | ||
- is_false: nodes.$master.discovery.master_timing.failure.notification_time | ||
|
||
- do: | ||
nodes.stats: | ||
metric: [ discovery ] | ||
human: true | ||
|
||
- gte: { nodes.$master.discovery.master_timing.unchanged.count: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.unchanged.computation_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.unchanged.notification_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.count: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.computation_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.publication_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.context_construction_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.commit_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.completion_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.master_apply_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.success.notification_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.count: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.computation_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.publication_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.context_construction_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.commit_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.completion_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.master_apply_time_millis: 0 } | ||
- gte: { nodes.$master.discovery.master_timing.failure.notification_time_millis: 0 } | ||
|
||
- is_true: nodes.$master.discovery.master_timing.unchanged.computation_time | ||
- is_true: nodes.$master.discovery.master_timing.unchanged.notification_time | ||
- is_true: nodes.$master.discovery.master_timing.success.computation_time | ||
- is_true: nodes.$master.discovery.master_timing.success.publication_time | ||
- is_true: nodes.$master.discovery.master_timing.success.context_construction_time | ||
- is_true: nodes.$master.discovery.master_timing.success.commit_time | ||
- is_true: nodes.$master.discovery.master_timing.success.completion_time | ||
- is_true: nodes.$master.discovery.master_timing.success.master_apply_time | ||
- is_true: nodes.$master.discovery.master_timing.success.notification_time | ||
- is_true: nodes.$master.discovery.master_timing.failure.computation_time | ||
- is_true: nodes.$master.discovery.master_timing.failure.publication_time | ||
- is_true: nodes.$master.discovery.master_timing.failure.context_construction_time | ||
- is_true: nodes.$master.discovery.master_timing.failure.commit_time | ||
- is_true: nodes.$master.discovery.master_timing.failure.completion_time | ||
- is_true: nodes.$master.discovery.master_timing.failure.master_apply_time | ||
- is_true: nodes.$master.discovery.master_timing.failure.notification_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are update attempts that did change the cluster state successfully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah yes. Copy/paste might have been involved a bit here.