-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CCR] Add time since last auto follow fetch to auto follow stats #36542
[CCR] Add time since last auto follow fetch to auto follow stats #36542
Conversation
For each remote cluster the auto follow coordinator, starts an auto follower that checks the remote cluster state and determines whether an index needs to be auto followed. The time since last auto follow is reported per remote cluster and gives insight whether the auto follow process is alive. Relates to elastic#33007 Originates from elastic#35895
Pinging @elastic/es-distributed |
@@ -73,24 +90,28 @@ public static AutoFollowStats fromXContent(final XContentParser parser) { | |||
private final long numberOfFailedRemoteClusterStateRequests; | |||
private final long numberOfSuccessfulFollowIndices; | |||
private final NavigableMap<String, ElasticsearchException> recentAutoFollowErrors; | |||
private final NavigableMap<String, Long> trackingRemoteClusters; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether all stats should be reported per remote cluster, so numberOfSuccessfulFollowIndices
, recentAutoFollowErrors
etc. would then be reported on a per remote cluster bases. This makes more sense to me now considering how the auto follow coordinator currently operates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @martijnvg. This looks good. I left some comments.
@@ -156,7 +156,7 @@ public Ccr(final Settings settings) { | |||
|
|||
return Arrays.asList( | |||
ccrLicenseChecker, | |||
new AutoFollowCoordinator(client, clusterService, ccrLicenseChecker) | |||
new AutoFollowCoordinator(client, clusterService, ccrLicenseChecker, System::nanoTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use ThreadPool::relativeTimeInMillis
.
|
||
private volatile long lastAutoFollowTime = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the time unit to the variable names (relativeTimeProvider and lastAutoFollowTime)?
long numberOfFailedRemoteClusterStateRequests, | ||
long numberOfSuccessfulFollowIndices, | ||
NavigableMap<String, ElasticsearchException> recentAutoFollowErrors, | ||
NavigableMap<String, Long> trackingRemoteClusters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this parameter name (trackingRemoteClusters
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, current name is not so good. I'm thinking about auto_followed_cluster
and also including the last_seen_metadata version together with time_since_last_auto_follow_millis.
fixed monitor mapping tests
…_auto_follow_fetch
@dnhatn Thanks for look and I've updated the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments around the time unit. Feel free to merge it after addressing these.
@@ -79,10 +84,13 @@ | |||
public AutoFollowCoordinator( | |||
Client client, | |||
ClusterService clusterService, | |||
CcrLicenseChecker ccrLicenseChecker) { | |||
CcrLicenseChecker ccrLicenseChecker, | |||
LongSupplier relativeNanoTimeProvider) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this became milliseconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes, I completely missed that. Good catch!
long lastSeenMetadataVersion = entry.getValue().metadataVersion; | ||
if (lastAutoFollowTimeInNanos != -1) { | ||
long timeSinceLastAutoFollowInMillis = | ||
TimeUnit.NANOSECONDS.toMillis(relativeNanoTimeProvider.getAsLong() - lastAutoFollowTimeInNanos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the time unit is ms, we should remove this conversion.
"cluster_name": { | ||
"type": "keyword" | ||
}, | ||
"time_since_last_auto_follow_started_millis": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe drop started
in the field name?
|
||
private volatile long lastAutoFollowTimeInNanos = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also became ms now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
run the gradle build tests 1 |
…_auto_follow_fetch
I wonder if we should call this |
👍 I will rename it to |
) For each remote cluster the auto follow coordinator, starts an auto follower that checks the remote cluster state and determines whether an index needs to be auto followed. The time since last auto follow is reported per remote cluster and gives insight whether the auto follow process is alive. Relates to #33007 Originates from #35895
* master: (30 commits) Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (elastic#35320)" Deprecate types in get_source and exist_source (elastic#36426) Fix duplicate phrase in shrink/split error message (elastic#36734) ingest: support default pipelines + bulk upserts (elastic#36618) TESTS:Debug Log. IndexStatsIT#testFilterCacheStats [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (elastic#35320) [TEST] fix float comparison in RandomObjects#getExpectedParsedValue Initialize startup `CcrRepositories` (elastic#36730) ingest: fix on_failure with Drop processor (elastic#36686) SNAPSHOTS: Adjust BwC Versions in Restore Logic (elastic#36718) [Painless] Add boxed type to boxed type casts for method/return (elastic#36571) Do not resolve addresses in remote connection info (elastic#36671) Add back one line removed by mistake regarding java version check and COMPAT jvm parameter existence Fixing line length for EnvironmentTests and RecoveryTests (elastic#36657) SQL: Fix translation of LIKE/RLIKE keywords (elastic#36672) [DOCS] Adds monitoring requirement for ingest node (elastic#36665) SNAPSHOTS: Disable BwC Tests Until elastic#36659 Landed (elastic#36709) Add doc's sequence number + primary term to GetResult and use it for updates (elastic#36680) [CCR] Add time since last auto follow fetch to auto follow stats (elastic#36542) Watcher accounts constructed lazily (elastic#36656) ...
Would it be possible to update the documentation? https://www.elastic.co/guide/en/elasticsearch/reference/6.7/ccr-get-stats.html |
For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.
Relates to #33007
Originates from #35895