Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CCR] Add total fetch time took stat #34577

Merged
merged 5 commits into from
Oct 23, 2018

Conversation

martijnvg
Copy link
Member

This new stat keeps track how much time was spent on fetches from the leader cluster perspective.

The current name of this new stat is total_fetch_took_time_millis and it doesn't describe well what this stat represents. I'm not sure what would be a good name. Maybe total_fetch_leader_time_millis or total_fetch_remote_time_millis to indicate it is the same as total_fetch_time_millis but from a different perspective? cc: @bleskes / @jasontedor

keeps track how much time was spent on fetches
from the leader cluster perspective.
@martijnvg martijnvg added >non-issue WIP :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Oct 18, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@bleskes
Copy link
Contributor

bleskes commented Oct 18, 2018

I think I'm good with the following:

  1. Changes response has a took time field
  2. Shard Follow Node Task sums the value of that feed under total_fetch_leader_time_millis
  3. We have a total_fetch_time_millis as we do today.

@martijnvg
Copy link
Member Author

martijnvg commented Oct 18, 2018 via email

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote more words than necessary to say we should start the clock sooner.

}

@Override
protected void asyncShardOperation(
final Request request,
final ShardId shardId,
final ActionListener<Response> listener) throws IOException {
request.relativeStartNanos = System.nanoTime();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the wrong place to start the clock. The goal with this metric is to capture the non-network time on the leader so we want to start the clock as soon as possible. Right now these requests are handled on the network thread before we go async, so there is not much in terms of queuing time, but it would be a bug if somehow this ever changed and handling of this request was not on the network thread before going async, this would miss queuing time in that other thread pool (we miss internal Netty queuing, but I don't think there's anything obvious that we can do about that). This is also something to think about in the context of the NIO work. I also don't like that we are mutating the request, it's doing to make it harder to have immutable request objects.

I think we should start the clock the moment the request is de-serialized.

@martijnvg
Copy link
Member Author

@bleskes @jasontedor I've updated this PR.

@martijnvg martijnvg removed the WIP label Oct 23, 2018
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@martijnvg martijnvg merged commit e6d87cc into elastic:master Oct 23, 2018
martijnvg added a commit that referenced this pull request Oct 23, 2018
Add total fetch time leader stat, that
keeps track how much time was spent on fetches
from the leader cluster perspective.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 23, 2018
* master: (24 commits)
  ingest: better support for conditionals with simulate?verbose (elastic#34155)
  [Rollup] Job deletion should be invoked on the allocated task (elastic#34574)
  [DOCS] .Security index is never auto created (elastic#34589)
  CCR: Requires soft-deletes on the follower (elastic#34725)
  re-enable bwc tests (elastic#34743)
  Empty GetAliases authorization fix (elastic#34444)
  INGEST: Document Processor Conditional (elastic#33388)
  [CCR] Add total fetch time leader stat (elastic#34577)
  SQL: Support pattern against compatible indices (elastic#34718)
  [CCR] Auto follow pattern APIs adjustments (elastic#34518)
  [Test] Remove dead code from ExceptionSerializationTests (elastic#34713)
  A small typo in migration-assistance doc (elastic#34704)
  ingest: processor stats (elastic#34724)
  SQL: Implement IN(value1, value2, ...) expression. (elastic#34581)
  Tests: Add checks to GeoDistanceQueryBuilderTests (elastic#34273)
  INGEST: Rename Pipeline Processor Param. (elastic#34733)
  Core: Move IndexNameExpressionResolver to java time (elastic#34507)
  [DOCS] Force Merge: clarify execution and storage requirements (elastic#33882)
  TESTING.asciidoc fix examples using forbidden annotation (elastic#34515)
  SQL: Implement `CONVERT`, an alternative to `CAST` (elastic#34660)
  ...
kcm pushed a commit that referenced this pull request Oct 30, 2018
Add total fetch time leader stat, that
keeps track how much time was spent on fetches
from the leader cluster perspective.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >non-issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants