Include extra snapshot details in logs/APIs #75917

Merged
38 changes: 22 additions & 16 deletions server/src/main/java/org/elasticsearch/cluster/ClusterState.java
@@ -52,24 +52,23 @@
  /**
  * Represents the current state of the cluster.
  * <p>
- * The cluster state object is immutable with the exception of the {@link RoutingNodes} structure, which is
- * built on demand from the {@link RoutingTable}.
- * The cluster state can be updated only on the master node. All updates are performed by on a
- * single thread and controlled by the {@link ClusterService}. After every update the
- * {@link Discovery#publish} method publishes a new version of the cluster state to all other nodes in the
- * cluster. The actual publishing mechanism is delegated to the {@link Discovery#publish} method and depends on
- * the type of discovery.
+ * The cluster state object is immutable with the exception of the {@link RoutingNodes} structure, which is built on demand from the {@link
+ * RoutingTable}. The cluster state can be updated only on the master node. All updates are performed by on a single thread and controlled
+ * by the {@link ClusterService}. After every update the {@link Discovery#publish} method publishes a new version of the cluster state to
+ * all other nodes in the cluster.
  * <p>
- * The cluster state implements the {@link Diffable} interface in order to support publishing of cluster state
- * differences instead of the entire state on each change. The publishing mechanism should only send differences
- * to a node if this node was present in the previous version of the cluster state. If a node was
- * not present in the previous version of the cluster state, this node is unlikely to have the previous cluster
- * state version and should be sent a complete version. In order to make sure that the differences are applied to the
- * correct version of the cluster state, each cluster state version update generates {@link #stateUUID} that uniquely
- * identifies this version of the state. This uuid is verified by the {@link ClusterStateDiff#apply} method to
- * make sure that the correct diffs are applied. If uuids don’t match, the {@link ClusterStateDiff#apply} method
- * throws the {@link IncompatibleClusterStateVersionException}, which causes the publishing mechanism to send
+ * Implements the {@link Diffable} interface in order to support publishing of cluster state differences instead of the entire state on each
+ * change. The publishing mechanism only sends differences to a node if this node was present in the previous version of the cluster state.
+ * If a node was not present in the previous version of the cluster state, this node is unlikely to have the previous cluster state version
+ * and should be sent a complete version. In order to make sure that the differences are applied to the correct version of the cluster
+ * state, each cluster state version update generates {@link #stateUUID} that uniquely identifies this version of the state. This uuid is
+ * verified by the {@link ClusterStateDiff#apply} method to make sure that the correct diffs are applied. If uuids don’t match, the {@link
+ * ClusterStateDiff#apply} method throws the {@link IncompatibleClusterStateVersionException}, which causes the publishing mechanism to send
  * a full version of the cluster state to the node on which this exception was thrown.
+ * <p>
+ * Implements {@link ToXContentFragment} to be exposed in REST APIs (e.g. {@code GET _cluster/state} and {@code POST _cluster/reroute}) and
+ * to be indexed by monitoring, mostly just for diagnostics purposes. The XContent representation does not need to be 100% faithful since we
+ * never reconstruct a cluster state from its XContent representation, but the more faithful it is the more useful it is for diagnostics.
  */
  public class ClusterState implements ToXContentFragment, Diffable<ClusterState> {
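
The UUID check described in the Javadoc above can be sketched as follows. This is a minimal, self-contained illustration and not Elasticsearch code: `StateUuidDiffSketch`, `State`, `StateDiff`, and `receive` are hypothetical names, `IllegalStateException` stands in for `IncompatibleClusterStateVersionException`, the diff only carries upserts, and the round trip in which the publisher resends the full state is collapsed into a single method call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; not the Elasticsearch classes.
final class StateUuidDiffSketch {

    /** One immutable version of the state, identified by a UUID unique to that version. */
    static final class State {
        final String uuid;
        final Map<String, String> data;

        State(String uuid, Map<String, String> data) {
            this.uuid = uuid;
            this.data = Map.copyOf(data);
        }
    }

    /** A set of changes that is only valid on top of the state identified by fromUuid. */
    static final class StateDiff {
        final String fromUuid;
        final String toUuid;
        final Map<String, String> upserts;

        StateDiff(String fromUuid, String toUuid, Map<String, String> upserts) {
            this.fromUuid = fromUuid;
            this.toUuid = toUuid;
            this.upserts = Map.copyOf(upserts);
        }

        State apply(State previous) {
            if (!previous.uuid.equals(fromUuid)) {
                // Plays the role of IncompatibleClusterStateVersionException in this sketch.
                throw new IllegalStateException("diff expects " + fromUuid + " but local state is " + previous.uuid);
            }
            Map<String, String> merged = new HashMap<>(previous.data);
            merged.putAll(upserts);
            return new State(toUuid, merged);
        }
    }

    /** Try the diff first; if it does not match the local state, use the full state instead. */
    static State receive(State local, StateDiff diff, State fullState) {
        try {
            return diff.apply(local);
        } catch (IllegalStateException e) {
            return fullState;
        }
    }

    public static void main(String[] args) {
        State v1 = new State("uuid-1", Map.of("index", "green"));
        StateDiff diff = new StateDiff("uuid-1", "uuid-2", Map.of("snapshot", "running"));
        State full = new State("uuid-2", Map.of("index", "green", "snapshot", "running"));

        State viaDiff = receive(v1, diff, full);                           // UUIDs match: diff is applied
        State viaFull = receive(new State("uuid-0", Map.of()), diff, full); // mismatch: full state is used

        System.out.println(viaDiff.data.equals(viaFull.data)); // prints "true"
    }
}
```

In `main`, the node holding `uuid-1` accepts the diff, while the node holding `uuid-0` rejects it and falls back to the full state; both end up with the same data.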

@@ -85,6 +84,13 @@ default boolean isPrivate() {
  return false;
  }

+ /**
+ * Serialize this {@link Custom} for diagnostic purposes, exposed by the <pre>GET _cluster/state</pre> API etc. The XContent
+ * representation does not need to be 100% faithful since we never reconstruct a cluster state from its XContent representation, but
+ * the more faithful it is the more useful it is for diagnostics.
+ */
+ @Override
+ XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException;
  }

  private static final NamedDiffableValueSerializer<Custom> CUSTOM_VALUE_SERIALIZER = new NamedDiffableValueSerializer<>(Custom.class);
@@ -932,7 +932,22 @@ private void writeShardSnapshotStatus(XContentBuilder builder, ToXContent indexI
  builder.field("index", indexId);
  builder.field("shard", shardId);
  builder.field("state", status.state());
+ builder.field("generation", status.generation());
  builder.field("node", status.nodeId());
+
+ if (status.state() == ShardState.SUCCESS) {
+ final ShardSnapshotResult result = status.shardSnapshotResult();
+ builder.startObject("result");
+ builder.field("generation", result.getGeneration());
+ builder.humanReadableField("size_in_bytes", "size", result.getSize());
+ builder.field("segments", result.getSegmentCount());
+ builder.endObject();
+ }
+
+ if (status.reason() != null) {
+ builder.field("reason", status.reason());
+ }
+
  builder.endObject();
  }
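
As a rough sketch of the field layout this hunk produces, the conditional logic can be mirrored with plain maps. Everything here is hypothetical and simplified: `ShardStatusFieldsSketch` and `fields` are illustrative names, a `LinkedHashMap` stands in for `XContentBuilder`, the `index` field and the human-readable `size` field are omitted, and the nested `result.generation` simply reuses the shard generation rather than reading it from a separate result object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the output shape; it does not use XContentBuilder.
final class ShardStatusFieldsSketch {

    enum ShardState { INIT, SUCCESS, FAILED }

    /** Builds the per-shard fields in the same order as the hunk above. */
    static Map<String, Object> fields(int shard, ShardState state, String generation, String node,
                                      long sizeInBytes, int segments, String reason) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("shard", shard);
        out.put("state", state.name());
        out.put("generation", generation);
        out.put("node", node);

        // The nested "result" object is only written for successfully snapshotted shards.
        if (state == ShardState.SUCCESS) {
            Map<String, Object> result = new LinkedHashMap<>();
            result.put("generation", generation);
            result.put("size_in_bytes", sizeInBytes);
            result.put("segments", segments);
            out.put("result", result);
        }

        // The "reason" field is only written when a reason is present.
        if (reason != null) {
            out.put("reason", reason);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields(0, ShardState.SUCCESS, "shardgen", "nodeId", 1L, 1, null));
        // prints: {shard=0, state=SUCCESS, generation=shardgen, node=nodeId, result={generation=shardgen, size_in_bytes=1, segments=1}}
        System.out.println(fields(1, ShardState.FAILED, "fail-gen", "nodeId", 0L, 0, "failure-reason"));
        // prints: {shard=1, state=FAILED, generation=fail-gen, node=nodeId, reason=failure-reason}
    }
}
```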

@@ -17,6 +17,7 @@
  import org.elasticsearch.ResourceNotFoundException;
  import org.elasticsearch.Version;
  import org.elasticsearch.action.AliasesRequest;
+ import org.elasticsearch.cluster.ClusterState;
  import org.elasticsearch.cluster.Diff;
  import org.elasticsearch.cluster.Diffable;
  import org.elasticsearch.cluster.DiffableUtils;
@@ -26,6 +27,7 @@
  import org.elasticsearch.cluster.block.ClusterBlockLevel;
  import org.elasticsearch.cluster.coordination.CoordinationMetadata;
  import org.elasticsearch.common.xcontent.NamedObjectNotFoundException;
+ import org.elasticsearch.common.xcontent.NamedXContentRegistry;
  import org.elasticsearch.common.xcontent.ToXContent;
  import org.elasticsearch.common.xcontent.ToXContentFragment;
  import org.elasticsearch.common.xcontent.XContentBuilder;
@@ -75,6 +77,10 @@
  import static org.elasticsearch.common.settings.Settings.readSettingsFromStream;
  import static org.elasticsearch.common.settings.Settings.writeSettingsToStream;

+ /**
+ * {@link Metadata} is the part of the {@link ClusterState} which persists across restarts. This persistence is XContent-based, so a
+ * round-trip through XContent must be faithful in {@link XContentContext#GATEWAY} context.
+ */
  public class Metadata implements Iterable<IndexMetadata>, Diffable<Metadata>, ToXContentFragment {

  private static final Logger logger = LogManager.getLogger(Metadata.class);
@@ -117,6 +123,10 @@ public enum XContentContext {
  */
  public static EnumSet<XContentContext> ALL_CONTEXTS = EnumSet.allOf(XContentContext.class);

+ /**
+ * Custom metadata that persists (via XContent) across restarts. The deserialization method for each implementation must be registered
+ * with the {@link NamedXContentRegistry}.
+ */
  public interface Custom extends NamedDiffable<Custom>, ToXContentFragment {

  EnumSet<XContentContext> context();
@@ -2193,7 +2193,7 @@ public void clusterStateProcessed(String source, ClusterState oldState, ClusterS
  }
  }
  }
- }, "delete snapshot", listener::onFailure);
+ }, "delete snapshot [" + repository + "]" + Arrays.toString(snapshotNames), listener::onFailure);
  }

  private static List<SnapshotId> matchingSnapshotIds(
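
For illustration, the new task source string can be reproduced in isolation. The helper below is hypothetical (the PR builds the string inline), and it assumes `snapshotNames` is a `String[]`, which the call to `Arrays.toString` suggests but this hunk does not show.

```java
import java.util.Arrays;

// Hypothetical helper that mirrors the string concatenation in the hunk above.
final class DeleteSnapshotSourceSketch {

    static String source(String repository, String[] snapshotNames) {
        // Arrays.toString renders the array as "[a, b]", so no extra brackets are needed.
        return "delete snapshot [" + repository + "]" + Arrays.toString(snapshotNames);
    }

    public static void main(String[] args) {
        System.out.println(source("my-repo", new String[] {"snap-1", "snap-2"}));
        // prints: delete snapshot [my-repo][snap-1, snap-2]
    }
}
```

Compared with the fixed string `"delete snapshot"`, this form lets a reader of the logs or the pending-tasks output tell which repository and snapshots a given deletion task refers to.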
@@ -375,9 +375,13 @@ public void testXContent() throws IOException {
  new ShardId("index", "uuid", 0),
  SnapshotsInProgress.ShardSnapshotStatus.success(
  "nodeId",
- new ShardSnapshotResult("generation", new ByteSizeValue(1L), 1)
+ new ShardSnapshotResult("shardgen", new ByteSizeValue(1L), 1)
  )
  )
+ .fPut(
+ new ShardId("index", "uuid", 1),
+ new SnapshotsInProgress.ShardSnapshotStatus("nodeId", ShardState.FAILED, "failure-reason", "fail-gen")
+ )
  .build(),
  null,
  null,
@@ -398,9 +402,13 @@ public void testXContent() throws IOException {
  "{\"snapshots\":[{\"repository\":\"repo\",\"snapshot\":\"name\",\"uuid\":\"uuid\","
  + "\"include_global_state\":true,\"partial\":true,\"state\":\"SUCCESS\","
  + "\"indices\":[{\"name\":\"index\",\"id\":\"uuid\"}],\"start_time\":\"1970-01-01T00:20:34.567Z\","
- + "\"start_time_millis\":1234567,\"repository_state_id\":0,"
- + "\"shards\":[{\"index\":{\"index_name\":\"index\",\"index_uuid\":\"uuid\"},"
- + "\"shard\":0,\"state\":\"SUCCESS\",\"node\":\"nodeId\"}],\"feature_states\":[],\"data_streams\":[]}]}"
+ + "\"start_time_millis\":1234567,\"repository_state_id\":0,\"shards\":["
+ + "{\"index\":{\"index_name\":\"index\",\"index_uuid\":\"uuid\"},\"shard\":0,\"state\":\"SUCCESS\","
+ + "\"generation\":\"shardgen\",\"node\":\"nodeId\","
+ + "\"result\":{\"generation\":\"shardgen\",\"size\":\"1b\",\"size_in_bytes\":1,\"segments\":1}},"
+ + "{\"index\":{\"index_name\":\"index\",\"index_uuid\":\"uuid\"},\"shard\":1,\"state\":\"FAILED\","
+ + "\"generation\":\"fail-gen\",\"node\":\"nodeId\",\"reason\":\"failure-reason\"}"
+ + "],\"feature_states\":[],\"data_streams\":[]}]}"
  )
  );
  }