Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timing stats to publication process #76771

Merged

Conversation

DaveCTurner
Copy link
Contributor

This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.

Relates #76625

This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.

Relates elastic#76625
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.16.0 labels Aug 20, 2021
Copy link
Contributor Author

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to Coordinator and MasterService are pretty much the only content here, everything else is just plumbing and other boilerplate. Adding 19 statistics in one go generates a lot of noise.

@@ -40,3 +40,99 @@
- gte: { nodes.$master.discovery.published_cluster_states.incompatible_diffs: 0 }
- gte: { nodes.$master.discovery.published_cluster_states.compatible_diffs: 0 }
- is_false: nodes.$master.roles

---
"Master timing stats":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just asserting that the stats really do come out of the API and that we do the right thing with ?human.

return ifSet(masterApplyElapsedMillis);
}

private static long ifSet(long millis) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used by the maybeGet functions for failed-publication stats since the failure could happen at any stage.

@@ -881,4 +900,91 @@ public void onNonemptyQueue() {
}
}

private static class TimingStatisticsTracker {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All methods synchronized (but very fast) just to avoid getting partially-updated stats.

@DaveCTurner DaveCTurner marked this pull request as ready for review August 20, 2021 17:32
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Aug 20, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner requested a review from dakrone August 20, 2021 17:32
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good, thanks for working on this @DaveCTurner! My one concern however is using the CachedTimeThread for calculating the times rather than raw System.nanoTime(). I think if we have a bunch of sub-200ms operations they'll appear to have completed instantaneously while the count is increasing, unless I am mistaken in the implementation. This inaccuracy seems like it will be very confusing for a user (and not to mention trying to graph/track stats like these).


`success`::
(object)
Contains statistics about cluster state update attempts that did not change the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are update attempts that did change the cluster state successfully?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah yes. Copy/paste might have been involved a bit here.

@@ -14,14 +14,30 @@
*/
public class ClusterStatePublicationEvent {

private static final long NOT_SET = -1L;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why not just use 0 for NOT_SET instead of having a separate flag for it, especially since when retrieving the stats we adjust NOT_SET back to 0 anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just to support the assertion that these things are set once and only once on each event. I added a comment in bd8ef0c.

final PublicationTransportHandler.PublicationContext publicationContext =
publicationHandler.newPublicationContext(clusterStatePublicationEvent);
clusterStatePublicationEvent.setPublicationContextConstructionElapsedMillis(
transportService.getThreadPool().relativeTimeInMillis() - publicationContextConstructionStartMillis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a concern that if this is < 200ms that it won't be counted with the use of the CachedTimeThread that only updates the clock every 200ms? Should this use "real" time instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that if these things are only taking 200ms or so then we probably don't care, or more precisely we only care if they're happening with high frequency and in the high frequency case we will pick up the occasional 200ms clock tick with a probability that matches the mean duration, so on balance it'll all work out.

Also it's a pain to write tests that assert things about time when using System::nanoTime. I can fix up the assertions that say that at least T ms elapsed, but I'd just have to drop the ones with an upper bound. Maybe it's not that bad, the only one that I can't easily add an artificial delay to is the publication context construction.

I'll try out some ideas next week. Do you think we should accumulate the timers in nanoseconds too? Long.MAX_VALUE nanoseconds is ~300 years, I don't think there's much risk of overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok in 9b0c5f3 I added a method to expose the raw (uncached) relative time to ThreadPool which we can still override to solve the testing pain, but which has better granularity. I stuck with milliseconds.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I would also prefer a more granular time measurement.

Comment on lines 2150 to 2160
`commit_time`::
(<<time-units,time value>>)
The cumulative amount of time spent waiting for a successful cluster state
update to _commit_, which happens when a majority of the master nodes have
written the state to disk.

`completion_time`::
(<<time-units,time value>>)
The cumulative amount of time spent waiting for a successful cluster state
update to _complete_, which happens when all the nodes have applied the cluster
state except for the elected master.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps clarify that these are from publication start, i.e., IIUC includes publication_time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, see bee970a.

@DaveCTurner DaveCTurner requested a review from dakrone August 23, 2021 09:33
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for using the raw time!

@DaveCTurner DaveCTurner merged commit 4a17847 into elastic:master Aug 23, 2021
@DaveCTurner DaveCTurner deleted the 2021-08-20-master-publication-stats branch August 23, 2021 16:38
@DaveCTurner
Copy link
Contributor Author

Thanks both!

DaveCTurner added a commit that referenced this pull request Aug 23, 2021
DaveCTurner added a commit that referenced this pull request Aug 23, 2021
This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.

Relates #76625
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Aug 23, 2021
Adjusts the wire compatibility constraints and reverts commit
18e657a.
elasticsearchmachine pushed a commit that referenced this pull request Aug 23, 2021
Adjusts the wire compatibility constraints and reverts commit
18e657a.
wjp719 added a commit to wjp719/elasticsearch that referenced this pull request Aug 24, 2021
* master: (21 commits)
  [Test] More robust assertions for sorting and pagination (elastic#76654)
  [Test] Fix filename check on Windows (elastic#76807)
  Upgrade build scan plugin to 3.6.4 (elastic#76784)
  Remove keystore initial_md5sum (elastic#76835)
  Don't export docker images on assemble (elastic#76817)
  Fix testMasterStatsOnSuccessfulUpdate (elastic#76844)
  AwaitsFix for elastic#76840
  Make Releasing Aggregation Buffers Safer (elastic#76741)
  Re-enable BWC tests after backport of elastic#76771 (elastic#76839)
  Dispatch large bulk requests to write thread  (elastic#76736)
  Disable BWC tests for elastic#76771
  Pull down beats artifacts when performing release tests
  Add timing stats to publication process (elastic#76771)
  Fix BanFailureLoggingTests some more (elastic#76668)
  Mention "warn threshold" in master service slowlog (elastic#76815)
  Fix DockerTests.test010Install
  Re-enable tests affected by elastic#75097 (elastic#76814)
  Fix testRecoveryIsCancelledAfterDeletingTheIndex (elastic#76644)
  Test fix -WildcardFieldMapperTests bad test data. (elastic#76819)
  Updating supported version after backporting the feature (elastic#76794)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.16.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants