Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only log warning when actually failing shards #28558

Merged
merged 4 commits into from
Feb 8, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Feb 8, 2018

Currently the master node logs a warning message whenever it receives a failed shard request. However, this can be noisy because

  • Multiple failed shard requests can be issued for a single shard
  • Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing shard action actually happens. This is another prerequisite step in order to not ignore the shard not-available exceptions in the replication.

Relates #28534

@@ -188,6 +189,7 @@ public ClusterState applyFailedShards(final ClusterState clusterState, final Lis
if (failedShardEntry.markAsStale()) {
allocation.removeAllocationId(failedShard);
}
logger.warn(new ParameterizedMessage("failing shard [{}]", failedShardEntry), failedShardEntry.getFailure());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add logging information for the "mark as stale" part here as well as in removeStaleIdsWithoutRoutings?

@dnhatn
Copy link
Member Author

dnhatn commented Feb 8, 2018

@ywelsch, I've added a logging for stale shards. The failedShardEntry already includes the mark as stale information. Please have another look. Thank you!

@@ -238,6 +239,11 @@ public static ClusterState removeStaleIdsWithoutRoutings(ClusterState clusterSta
}
indexMetaDataBuilder.putInSyncAllocationIds(shardNumber, remainingInSyncAllocations);
}
// Only log the stale shards which have been actually removed.
if (oldInSyncAllocations.size() != remainingInSyncAllocations.size()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not necessary. idsToRemove already contains the info that's needed (it does not contain non-existing ids, you can even add an assertion for that if you like).
Also I would change the message to "{} marking unavailable shards as stale: {}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b0cd954

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Feb 8, 2018

Thanks @ywelsch for reviewing.

@dnhatn dnhatn changed the title Only log warning of failing shard when actually failing it Only log warning when actually failing shards Feb 8, 2018
@dnhatn dnhatn merged commit 5b8870f into elastic:master Feb 8, 2018
@dnhatn dnhatn deleted the log-failed-shard branch February 8, 2018 20:37
dnhatn added a commit that referenced this pull request Feb 8, 2018
Currently the master node logs a warning message whenever it receives a 
failed shard request. However, this can be noisy because

- Multiple failed shard requests can be issued for a single shard
- Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing 
shard action actually happens. This is another prerequisite step in 
order to not ignore the shard not-available exceptions in the
replication.

Relates #28534
@lcawl lcawl added :Search/Search Search-related issues that do not fall into other categories and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Search/Search Search-related issues that do not fall into other categories labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >enhancement v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants