Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INFO logging of snapshot restore and completion #88257

Merged
merged 12 commits into from
Jul 12, 2022

Conversation

kingherc
Copy link
Contributor

@kingherc kingherc commented Jul 4, 2022

But DEBUG (silent) logging of snapshot restore/completion when
done in the context of CCR or searchable snapshots.

Fixes #86610

But DEBUG (silent) logging of snapshot restore/completion when
done in the context of CCR or searchable snapshots.

Fixes elastic#86610
@@ -1051,6 +1053,7 @@ public ClusterState execute(ClusterState currentState) {
boolean changed = false;
for (RestoreInProgress.Entry entry : currentState.custom(RestoreInProgress.TYPE, RestoreInProgress.EMPTY)) {
if (entry.state().completed()) {
logger.log(entry.silent() ? Level.DEBUG : Level.INFO, "restore of snapshot [{}] completed", entry.snapshot());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcofdez suggested to me this is a nice place to log the completion of a snapshot restore. This meant passing the silent property from the RestoreSnapshotRequest to RestoreInProgress.Entry so that we can read the silent parameter here.

@kingherc kingherc marked this pull request as ready for review July 4, 2022 15:45
@kingherc
Copy link
Contributor Author

kingherc commented Jul 4, 2022

Hi. Adding @DaveCTurner as reviewer (since he created the issue) and @original-brownbear as another reviewer (since you have experience with snapshot restores AFAIK). I manually tested this in this way:

  • Executed org.elasticsearch.snapshots.RestoreSnapshotIT#testParallelRestoreOperations (for regular snapshot restore use case) and saw that it emits the INFO log messages
  • Executed org.elasticsearch.xpack.ccr.AutoFollowIT#testAutoFollow (for CCR use case) and saw it does not emit the INFO log messages
  • Executed org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests#testCreateAndRestoreSearchableSnapshot (for searchable snapshot use case) and saw it does not emit the INFO log messages

Feel free to tell me:

  • if only one of you is needed as reviewer
  • whether we would like to add some sort of logging testing (if it exists for this part of the code, similar to logging tests that I saw in CoordinatorTests.java).

@kingherc kingherc self-assigned this Jul 4, 2022
@kingherc kingherc added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 4, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@kingherc kingherc added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Supportability Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better. labels Jul 4, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @kingherc, I've created a changelog YAML for you.

@kingherc
Copy link
Contributor Author

kingherc commented Jul 5, 2022

OK, I understand I should have added you as reviewers after the CI passes -- will keep that in mind.

After conversing a bit, I also understand that in order to fix the CI bwc failures, I should (a) not expose the silent parameter in XContent, and (b) add ifs around the serialization parts. I'll try that 🤞 .

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Jul 5, 2022

++ yes we shouldn't see this parameter in anything user-facing (particularly this means neither in docs nor XContent1) and the wire-format change needs version guards. When receiving a transport message from an older node I suggest just treating the missing silent parameter as true.

Footnotes

  1. NB XContent is not always user-facing, it's also used for writing the cluster metadata to disk (e.g. for cluster coordination and snapshots) but in this case neither of those cases apply.

@kingherc
Copy link
Contributor Author

kingherc commented Jul 5, 2022

Hi @original-brownbear and @DaveCTurner , CI passed. Feel free to review.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a couple of small suggestions.

I'd like to have a test which asserts that we do actually report the log message we're adding at the right level. I think you can add a test case in RestoreSnapshotIT for that.

@kingherc kingherc requested a review from DaveCTurner July 7, 2022 10:48
@kingherc
Copy link
Contributor Author

kingherc commented Jul 7, 2022

@DaveCTurner incorporated your feedback. Feel free to review. @original-brownbear feel free to also review if you'd like. Thanks!

@kingherc
Copy link
Contributor Author

Gentle reminder for review @original-brownbear , @DaveCTurner

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay here @kingherc, this all looks good although I've one question about the test you added.

Comment on lines 182 to 204
mockLogAppender.addExpectation(new MockLogAppender.LoggingExpectation() {
String startMessageSnapshot;
Object startMessageIndicesList;
String completionMessageSnapshot;

@Override
public void match(LogEvent event) {
final String message = event.getMessage().getFormattedMessage();
if (message.startsWith("started restore of snapshot")) {
startMessageSnapshot = event.getMessage().getParameters()[0].toString();
startMessageIndicesList = event.getMessage().getParameters()[1];
} else if (message.startsWith("completed restore of snapshot")) {
completionMessageSnapshot = event.getMessage().getParameters()[0].toString();
}
}

@Override
public void assertMatched() {
assertThat(startMessageSnapshot, startsWith(snapshotName));
assertEquals(Arrays.asList(indexName), startMessageIndicesList);
assertThat(completionMessageSnapshot, startsWith(snapshotName));
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be clearer to write this as two regular SeenEventExpectation expectations with the actual message we expect (maybe with some * wildcards) instead of doing this. But maybe I've missed some subtlety about why we're doing it this way.

Done this way I think the fields would need to be volatile right?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kingherc kingherc merged commit 1e07915 into elastic:master Jul 12, 2022
@kingherc kingherc deleted the feature/log-snapshot-restore branch July 12, 2022 13:52
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 13, 2022
* upstream/master: (38 commits)
  Simplify map copying (elastic#88432)
  Make DiffableUtils.diff implementation agnostic (elastic#88403)
  Ingest: Start separating Metadata from IngestSourceAndMetadata (elastic#88401)
  Move runtime fields base scripts out of scripting fields api package. (elastic#88488)
  Enable TRACE Logging for test and increase timeout (elastic#88477)
  Mute ReactiveStorageIT#testScaleDuringSplitOrClone (elastic#88480)
  Track the count of failed invocations since last successful policy snapshot (elastic#88398)
  Avoid noisy exceptions on data nodes when aborting snapshots (elastic#88476)
  Fix ReactiveStorageDeciderServiceTests testNodeSizeForDataBelowLowWatermark (elastic#88452)
  INFO logging of snapshot restore and completion (elastic#88257)
  unmute test (elastic#88454)
  Updatable API keys - noop check (elastic#88346)
  Corrected an incomplete sentence. (elastic#86542)
  Use consistent shard map type in IndexService (elastic#88465)
  Stop registering TestGeoShapeFieldMapperPlugin in ESIntegTestCase (elastic#88460)
  TSDB: RollupShardIndexer logging improvements (elastic#88416)
  Audit API key ID when create or grant API keys (elastic#88456)
  Bound random negative size test in SearchSourceBuilderTests#testNegativeSizeErrors (elastic#88457)
  Updatable API keys - logging audit trail event (elastic#88276)
  Polish reworked LoggedExec task (elastic#88424)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/RollupShardIndexer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Supportability Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Log snapshot restores at INFO level
4 participants