Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a shard snapshot current status debug string #118198

Conversation

DiannaHohensee
Copy link
Contributor

@DiannaHohensee DiannaHohensee commented Dec 6, 2024

Add a shard snapshot current status debug string

SnapshotShardsService will update the new debug string field in
IndexShardSnapshotStatus as a shard snapshot operation proceeds so
that the current state can be logged. The
SnapshotShutdownProgressTracker will iterate through the
SnapshotShardsService's list of shard snapshots and log their current
status.

We want to know where in the code a shard snapshot operation
potentially gets stuck. This new field should be updated as
frequently as is reasonable to inform on the shard snapshot's progress.

Closes ES-10261

@DiannaHohensee DiannaHohensee self-assigned this Dec 6, 2024
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.0.0 labels Dec 6, 2024
@DiannaHohensee DiannaHohensee changed the title Add a shard snapshot current state debug string Add a shard snapshot current status debug string Dec 6, 2024
The SnapshotShutdownProgressTracker will iterate through the
SnapshotShardsService's list of shard snapshots and log their current
status. SnapshotShardsService will now update a new debug string
field in IndexShardSnapshotStatus as a shard snapshot operation
proceeds so that the current state can be logged.
@DiannaHohensee DiannaHohensee force-pushed the 2024/12/06/ES-10261-snapshot-debug-status branch from 3c06926 to 4ed1d23 Compare December 9, 2024 14:28
@DiannaHohensee DiannaHohensee added Team:Distributed Coordination Meta label for Distributed Coordination team :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed needs:triage Requires assignment of a team area label labels Dec 9, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 makes sense to me, just a couple of comments

@@ -98,6 +98,7 @@ public enum AbortStatus {
private long processedSize;
private String failure;
private final SubscribableListener<AbortStatus> abortListeners = new SubscribableListener<>();
private String debugStatusString;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably needs to be volatile so that toString() and friends all see the latest value. And in that case there's no need to make the update method synchronized.

Suggested change
private String debugStatusString;
private volatile String debugStatusString;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the toString(). Changed 👍 .

* Updates the string explanation for what the snapshot is actively doing right now.
*/
public synchronized void updateDebugStatusString(String statusString) {
this.debugStatusString = statusString;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we at least assert that statusString is not null (or empty)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added 👍

@@ -461,6 +482,8 @@ public String toString() {
+ processedSize
+ ", failure='"
+ failure
+ "', debugStatusString='"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about exposing this to users as debugStatusString, which I believe might happen since this is logged at INFO sometimes. Seems a little overly technical. How about status or statusDescription?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. updated to statusDescription 👍

Copy link
Contributor Author

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed an update

@@ -461,6 +482,8 @@ public String toString() {
+ processedSize
+ ", failure='"
+ failure
+ "', debugStatusString='"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. updated to statusDescription 👍

* Updates the string explanation for what the snapshot is actively doing right now.
*/
public synchronized void updateDebugStatusString(String statusString) {
this.debugStatusString = statusString;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added 👍

@@ -98,6 +98,7 @@ public enum AbortStatus {
private long processedSize;
private String failure;
private final SubscribableListener<AbortStatus> abortListeners = new SubscribableListener<>();
private String debugStatusString;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the toString(). Changed 👍 .

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (one nit)

Comment on lines +282 to +283
assert statusString != null;
assert statusString.isEmpty() == false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should really check the value passed into the constructor too in order to be certain that the statusDescription field is never null or empty.

@DiannaHohensee DiannaHohensee merged commit 47be542 into elastic:main Dec 10, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants