-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timing stats to publication process #76771
Add timing stats to publication process #76771
Conversation
This commit introduces into the node stats API various statistics to track the time that the elected master spends in various phases of the cluster state publication process. Relates elastic#76625
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to Coordinator
and MasterService
are pretty much the only content here, everything else is just plumbing and other boilerplate. Adding 19 statistics in one go generates a lot of noise.
@@ -40,3 +40,99 @@ | |||
- gte: { nodes.$master.discovery.published_cluster_states.incompatible_diffs: 0 } | |||
- gte: { nodes.$master.discovery.published_cluster_states.compatible_diffs: 0 } | |||
- is_false: nodes.$master.roles | |||
|
|||
--- | |||
"Master timing stats": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really just asserting that the stats really do come out of the API and that we do the right thing with ?human
.
return ifSet(masterApplyElapsedMillis); | ||
} | ||
|
||
private static long ifSet(long millis) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used by the maybeGet
functions for failed-publication stats since the failure could happen at any stage.
@@ -881,4 +900,91 @@ public void onNonemptyQueue() { | |||
} | |||
} | |||
|
|||
private static class TimingStatisticsTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All methods synchronized
(but very fast) just to avoid getting partially-updated stats.
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good, thanks for working on this @DaveCTurner! My one concern however is using the CachedTimeThread
for calculating the times rather than raw System.nanoTime()
. I think if we have a bunch of sub-200ms operations they'll appear to have completed instantaneously while the count is increasing, unless I am mistaken in the implementation. This inaccuracy seems like it will be very confusing for a user (and not to mention trying to graph/track stats like these).
|
||
`success`:: | ||
(object) | ||
Contains statistics about cluster state update attempts that did not change the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are update attempts that did change the cluster state successfully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah yes. Copy/paste might have been involved a bit here.
@@ -14,14 +14,30 @@ | |||
*/ | |||
public class ClusterStatePublicationEvent { | |||
|
|||
private static final long NOT_SET = -1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why not just use 0 for NOT_SET
instead of having a separate flag for it, especially since when retrieving the stats we adjust NOT_SET back to 0 anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really just to support the assertion that these things are set once and only once on each event. I added a comment in bd8ef0c.
final PublicationTransportHandler.PublicationContext publicationContext = | ||
publicationHandler.newPublicationContext(clusterStatePublicationEvent); | ||
clusterStatePublicationEvent.setPublicationContextConstructionElapsedMillis( | ||
transportService.getThreadPool().relativeTimeInMillis() - publicationContextConstructionStartMillis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a concern that if this is < 200ms that it won't be counted with the use of the CachedTimeThread
that only updates the clock every 200ms? Should this use "real" time instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that if these things are only taking 200ms or so then we probably don't care, or more precisely we only care if they're happening with high frequency and in the high frequency case we will pick up the occasional 200ms clock tick with a probability that matches the mean duration, so on balance it'll all work out.
Also it's a pain to write tests that assert things about time when using System::nanoTime
. I can fix up the assertions that say that at least T
ms elapsed, but I'd just have to drop the ones with an upper bound. Maybe it's not that bad, the only one that I can't easily add an artificial delay to is the publication context construction.
I'll try out some ideas next week. Do you think we should accumulate the timers in nanoseconds too? Long.MAX_VALUE
nanoseconds is ~300 years, I don't think there's much risk of overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok in 9b0c5f3 I added a method to expose the raw (uncached) relative time to ThreadPool
which we can still override to solve the testing pain, but which has better granularity. I stuck with milliseconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I would also prefer a more granular time measurement.
`commit_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent waiting for a successful cluster state | ||
update to _commit_, which happens when a majority of the master nodes have | ||
written the state to disk. | ||
|
||
`completion_time`:: | ||
(<<time-units,time value>>) | ||
The cumulative amount of time spent waiting for a successful cluster state | ||
update to _complete_, which happens when all the nodes have applied the cluster | ||
state except for the elected master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps clarify that these are from publication start, i.e., IIUC includes publication_time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, see bee970a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for using the raw time!
Thanks both! |
This commit introduces into the node stats API various statistics to track the time that the elected master spends in various phases of the cluster state publication process. Relates #76625
Adjusts the wire compatibility constraints and reverts commit 18e657a.
* master: (21 commits) [Test] More robust assertions for sorting and pagination (elastic#76654) [Test] Fix filename check on Windows (elastic#76807) Upgrade build scan plugin to 3.6.4 (elastic#76784) Remove keystore initial_md5sum (elastic#76835) Don't export docker images on assemble (elastic#76817) Fix testMasterStatsOnSuccessfulUpdate (elastic#76844) AwaitsFix for elastic#76840 Make Releasing Aggregation Buffers Safer (elastic#76741) Re-enable BWC tests after backport of elastic#76771 (elastic#76839) Dispatch large bulk requests to write thread (elastic#76736) Disable BWC tests for elastic#76771 Pull down beats artifacts when performing release tests Add timing stats to publication process (elastic#76771) Fix BanFailureLoggingTests some more (elastic#76668) Mention "warn threshold" in master service slowlog (elastic#76815) Fix DockerTests.test010Install Re-enable tests affected by elastic#75097 (elastic#76814) Fix testRecoveryIsCancelledAfterDeletingTheIndex (elastic#76644) Test fix -WildcardFieldMapperTests bad test data. (elastic#76819) Updating supported version after backporting the feature (elastic#76794) ... # Conflicts: # server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java
This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.
Relates #76625