Update checkpoints after post-replication actions, even on failure
A failed post-write refresh should not prevent the local checkpoint
from advancing if the translog operations have been fsynced correctly,
so we should update the checkpoints in all situations. If the fsync
itself failed, the local checkpoint won't advance anyway and the
engine will fail during the next indexing operation.
fcofdez committed Jun 19, 2024
1 parent b60d77e commit 08ced55
Showing 1 changed file with 9 additions and 3 deletions.
@@ -187,9 +187,15 @@ public void onResponse(Void aVoid) {
             @Override
             public void onFailure(Exception e) {
                 logger.trace("[{}] op [{}] post replication actions failed for [{}]", primary.routingEntry().shardId(), opType, request);
-                // TODO: fail shard? This will otherwise have the local / global checkpoint info lagging, or possibly have replicas
-                // go out of sync with the primary
-                finishAsFailed(e);
+                // We update the checkpoints since a refresh might fail but the operations could be safely persisted, in the case that the
+                // fsync failed the local checkpoint won't advance and the engine will be marked as failed when the next indexing operation
+                // is appended into the translog.
+                updateCheckPoints(
+                    primary.routingEntry(),
+                    primary::localCheckpoint,
+                    primary::globalCheckpoint,
+                    () -> finishAsFailed(e)
+                );
             }
         });
     }
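
The behaviour described in the commit message can be illustrated with a small, self-contained sketch. This is not the Elasticsearch implementation: the class `CheckpointSketch` and its fields and methods are hypothetical stand-ins, and the fsync is simulated with a boolean flag. It only shows the ordering the change introduces: on a post-replication failure, publish whatever local checkpoint was reached, then report the failure.

```java
// Hypothetical model of the control flow in this commit; none of these names
// are Elasticsearch APIs.
final class CheckpointSketch {
    // Highest sequence number whose translog write was fsynced (assumed model).
    long localCheckpoint = -1;
    // Last local checkpoint reported to the (imaginary) replication tracker.
    long publishedCheckpoint = -1;
    // Failure handed to the caller, if any.
    Exception failure;

    // Simulated indexing operation: the local checkpoint advances only when
    // the simulated fsync succeeds.
    void index(long seqNo, boolean fsyncOk) {
        if (fsyncOk) {
            localCheckpoint = Math.max(localCheckpoint, seqNo);
        }
    }

    // Publish the current checkpoint, then run the continuation.
    void updateCheckpoints(Runnable andThen) {
        publishedCheckpoint = localCheckpoint;
        andThen.run();
    }

    // Before the change the failure was reported directly. After it, the
    // checkpoints are updated first and the failure is reported afterwards.
    void onPostReplicationFailure(Exception e) {
        updateCheckpoints(() -> failure = e);
    }

    public static void main(String[] args) {
        CheckpointSketch primary = new CheckpointSketch();
        primary.index(5, true); // fsync succeeded, refresh fails afterwards
        primary.onPostReplicationFailure(new RuntimeException("refresh failed"));
        System.out.println("published=" + primary.publishedCheckpoint
            + " failed=" + (primary.failure != null));
    }
}
```

In this model, a failed fsync leaves `localCheckpoint` unchanged, so publishing it in the failure path cannot advance the checkpoint past operations that were not persisted.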