Skip to content

Commit

Permalink
[BACKPORT 2.15.0][#13042] docdb: fixed clearing pending config for ab…
Browse files Browse the repository at this point in the history
…orted CONFIG_CHANGE_OP

Summary:
1. Before KUDU-1330 port (9305a20) we had a sanity check for pending config to be set inside`ReplicaState::SetCommittedConfigUnlocked`:
```
Status ReplicaState::SetCommittedConfigUnlocked(const RaftConfigPB& committed_config) {
  TRACE_EVENT0("consensus", "ReplicaState::SetCommittedConfigUnlocked");
  DCHECK(IsLocked());
  DCHECK(committed_config.IsInitialized());
  RETURN_NOT_OK_PREPEND(VerifyRaftConfig(committed_config, COMMITTED_QUORUM),
                        "Invalid config to set as committed");

  // Compare committed with pending configuration, ensure they are the same.
  // Pending will not have an opid_index, so ignore that field.
  DCHECK(cmeta_->has_pending_config());
// ^^^^^^^
```

Mentioned commit removed this sanity check.

2. Before d26017d we had some missed Raft operations callback invocations during abort. This commit added sanity check that we call Raft operation callback exactly once and also added missed callback invocations.

3. Very old fix 2051e04 implemented pending config change on aborting CHANGE_CONFIG_OP:
```
void RaftConsensus::NonTrackedRoundReplicationFinished(ConsensusRound* round,
                                                       const StdStatusCallback& client_cb,
                                                       const Status& status) {
...
  if (!status.ok()) {
    // TODO: Do something with the status on failure?
    LOG_WITH_PREFIX(INFO) << op_str << " replication failed: " << status << "\n" << GetStackTrace();

    // Clear out the pending state (ENG-590).
    if (IsChangeConfigOperation(op_type)) {
      WARN_NOT_OK(state_->ClearPendingConfigUnlocked(), "Could not clear pending state");
// ^^^^^^^
    }
  } else if (IsChangeConfigOperation(op_type)) {
    // Notify the TabletPeer owner object.
    state_->context()->ChangeConfigReplicated(state_->GetCommittedConfigUnlocked());
  }
...
}
```

If we receive two subsequent CHANGE_CONFIG_OP at the follower (for example this happens during `RemoteBootstrapITest.TestLeaderCrashesBeforeChangeRoleKeyValueTableType`), `RaftConsensus::StartReplicaOperationUnlocked` succeeds for the first CHANGE_CONFIG_OP and sets pending config. For the second CHANGE_CONFIG_OP it fails with error:
```
[ts-1] I0623 19:10:15.999151 189487 replica_state.cc:646] T 6afaa10647e644dd91fba0a32ced35b5 P 1da13ee035b044858177326059575579 [term 2 FOLLOWER]: Illegal state (yb/consensus/replica_state.cc:323): RaftConfig change currently pending. Only one is allowed at a time.
```
Due to combination of changes 2 and 3 mentioned above, callback is invoked and clears pending state.
After that Raft starts to apply first CHANGE_CONFIG_OP and it without change 1 it would fail on DCHECK.
But in version 2.14 we had all 3 changes in order: 3, 1, 2, so DCHECK was removed and we didn't fail due to broken invariant  (we should have pending config by the time we apply CHANGE_CONFIG_OP).

In version 2.12 we only have change 3 and when I started to backport change 2 to 2.12, mentioned test started to fail, because DCHECK is still there.

*Solution*

1. In order to keep sanity check for maintaining invariant DCHECK should be restored, but updated to `DCHECK(cmeta_->has_pending_config() || config_to_commit.unsafe_config_change());` because we only allow to not have pending config in case of unsafe config change.

2. On aborting CONFIG_CHANGE_OP it is not correct to clear pending config if it was set by another Raft operation, we should only clear pending config if it was set by operation we are aborting.

Original commit: 84e980a / D17907

Test Plan: RemoteBootstrapITest.TestLeaderCrashesBeforeChangeRoleKeyValueTableType

Reviewers: amitanand

Reviewed By: amitanand

Subscribers: bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D17990
  • Loading branch information
ttyusupov committed Jun 29, 2022
1 parent 7d2be40 commit f30111b
Show file tree
Hide file tree
Showing 8 changed files with 78 additions and 35 deletions.
20 changes: 12 additions & 8 deletions src/yb/consensus/consensus_meta-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,8 @@ TEST_F(ConsensusMetadataTest, TestActiveRole) {
vector<string> uuids = { "a", "b", "c", "d" };
string peer_uuid = "e";
RaftConfigPB config1 = BuildConfig(uuids); // We aren't a member of this config...
config1.set_opid_index(1);
const auto op_id = OpId(1, 1);
config1.set_opid_index(op_id.index);

std::unique_ptr<ConsensusMetadata> cmeta;
ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, peer_uuid,
Expand All @@ -177,12 +178,12 @@ TEST_F(ConsensusMetadataTest, TestActiveRole) {
// Follower.
uuids.push_back(peer_uuid);
RaftConfigPB config2 = BuildConfig(uuids); // But we are a member of this one.
config2.set_opid_index(1);
config2.set_opid_index(op_id.index);
cmeta->set_committed_config(config2);
ASSERT_EQ(PeerRole::FOLLOWER, cmeta->active_role());

// Pending should mask committed.
cmeta->set_pending_config(config1);
cmeta->set_pending_config(config1, op_id);
ASSERT_EQ(PeerRole::NON_PARTICIPANT, cmeta->active_role());
cmeta->clear_pending_config();
ASSERT_EQ(PeerRole::FOLLOWER, cmeta->active_role());
Expand All @@ -192,9 +193,9 @@ TEST_F(ConsensusMetadataTest, TestActiveRole) {
ASSERT_EQ(PeerRole::LEADER, cmeta->active_role());

// Again, pending should mask committed.
cmeta->set_pending_config(config1);
cmeta->set_pending_config(config1, op_id);
ASSERT_EQ(PeerRole::NON_PARTICIPANT, cmeta->active_role());
cmeta->set_pending_config(config2); // pending == committed.
cmeta->set_pending_config(config2, op_id); // pending == committed.
ASSERT_EQ(PeerRole::LEADER, cmeta->active_role());
cmeta->set_committed_config(config1); // committed now excludes this node, but is masked...
ASSERT_EQ(PeerRole::LEADER, cmeta->active_role());
Expand Down Expand Up @@ -222,7 +223,8 @@ TEST_F(ConsensusMetadataTest, TestToConsensusStatePB) {
// Set the pending configuration to be one containing the current leader (who is not
// in the committed configuration). Ensure that the leader shows up when we ask for
// the active consensus state.
cmeta->set_pending_config(pending_config);
const auto op_id = OpId(1, 1);
cmeta->set_pending_config(pending_config, op_id);
cmeta->set_leader_uuid(peer_uuid);
ConsensusStatePB active_cstate = cmeta->ToConsensusStatePB(CONSENSUS_CONFIG_ACTIVE);
ASSERT_TRUE(active_cstate.has_leader_uuid());
Expand Down Expand Up @@ -277,7 +279,8 @@ TEST_F(ConsensusMetadataTest, TestMergeCommittedConsensusStatePB) {

uuids.push_back("e");
RaftConfigPB pending_config = BuildConfig(uuids);
cmeta->set_pending_config(pending_config);
auto op_id = OpId(1, 2);
cmeta->set_pending_config(pending_config, op_id);
cmeta->set_leader_uuid("e");
cmeta->set_voted_for("e");

Expand All @@ -297,7 +300,8 @@ TEST_F(ConsensusMetadataTest, TestMergeCommittedConsensusStatePB) {
// Higher term, so wipe out the prior state.
remote_state.set_current_term(2);
*remote_state.mutable_config() = BuildConfig({ "i", "j", "k" });
cmeta->set_pending_config(pending_config);
op_id = OpId(2, 3);
cmeta->set_pending_config(pending_config, op_id);
cmeta->MergeCommittedConsensusStatePB(remote_state);
ASSERT_NO_FATALS(AssertConsensusMergeExpected(*cmeta, remote_state, 2, ""));
}
Expand Down
16 changes: 15 additions & 1 deletion src/yb/consensus/consensus_meta.cc
Original file line number Diff line number Diff line change
Expand Up @@ -197,15 +197,29 @@ const RaftConfigPB& ConsensusMetadata::pending_config() const {
void ConsensusMetadata::clear_pending_config() {
has_pending_config_ = false;
pending_config_.Clear();
pending_config_op_id_ = OpId();
UpdateActiveRole();
}

void ConsensusMetadata::set_pending_config(const RaftConfigPB& config) {
void ConsensusMetadata::set_pending_config(const RaftConfigPB& config, const OpId& config_op_id) {
has_pending_config_ = true;
pending_config_ = config;
pending_config_op_id_ = config_op_id;
UpdateActiveRole();
}

Status ConsensusMetadata::set_pending_config_op_id(const OpId& config_op_id) {
SCHECK(has_pending_config_, IllegalState, "Expected pending config to be set");
if (pending_config_op_id_.is_valid_not_empty() && pending_config_op_id_ != config_op_id) {
return STATUS_FORMAT(
InvalidArgument,
"Pending config OpId is already set to $0, but requested to overwrite with $1",
pending_config_op_id_, config_op_id);
}
pending_config_op_id_ = config_op_id;
return Status::OK();
}

const RaftConfigPB& ConsensusMetadata::active_config() const {
if (has_pending_config_) {
return pending_config();
Expand Down
7 changes: 6 additions & 1 deletion src/yb/consensus/consensus_meta.h
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@

#include "yb/gutil/macros.h"

#include "yb/util/opid.h"
#include "yb/util/status_fwd.h"

namespace yb {
Expand Down Expand Up @@ -126,7 +127,10 @@ class ConsensusMetadata {

// Set & clear the pending configuration.
void clear_pending_config();
void set_pending_config(const RaftConfigPB& config);
void set_pending_config(const RaftConfigPB& config, const OpId& config_op_id);
Status set_pending_config_op_id(const OpId& config_op_id);

OpId pending_config_op_id() { return pending_config_op_id_; }

// If a pending configuration is set, return it.
// Otherwise, return the committed configuration.
Expand Down Expand Up @@ -206,6 +210,7 @@ class ConsensusMetadata {
// configuration change pending.
// RaftConfig used by the peers when there is a pending config change operation.
RaftConfigPB pending_config_;
OpId pending_config_op_id_;

// Cached role of the peer_uuid_ within the active configuration.
PeerRole active_role_;
Expand Down
18 changes: 12 additions & 6 deletions src/yb/consensus/raft_consensus.cc
Original file line number Diff line number Diff line change
Expand Up @@ -3051,10 +3051,11 @@ Status RaftConsensus::ReplicateConfigChangeUnlocked(const ReplicateMsgPtr& repli
const RaftConfigPB& new_config,
ChangeConfigType type,
StdStatusCallback client_cb) {
LOG(INFO) << "Setting replicate pending config " << new_config.ShortDebugString()
<< ", type = " << ChangeConfigType_Name(type);
LOG_WITH_PREFIX(INFO) << "Setting replicate pending config " << new_config.ShortDebugString()
<< ", type = " << ChangeConfigType_Name(type);

RETURN_NOT_OK(state_->SetPendingConfigUnlocked(new_config));
// We will set pending config op id below once we have it.
RETURN_NOT_OK(state_->SetPendingConfigUnlocked(new_config, OpId()));

if (type == CHANGE_ROLE &&
PREDICT_FALSE(FLAGS_TEST_inject_delay_leader_change_role_append_secs)) {
Expand All @@ -3070,13 +3071,18 @@ Status RaftConsensus::ReplicateConfigChangeUnlocked(const ReplicateMsgPtr& repli
round->SetCallback(MakeNonTrackedRoundCallback(round.get(), std::move(client_cb)));
auto status = AppendNewRoundToQueueUnlocked(round);
if (!status.ok()) {
// We could just cancel pending config, because there is could be only one pending config.
// We could just cancel pending config, because there could be only one pending config that
// we've just set above and it corresponds to replicate_ref.
auto clear_status = state_->ClearPendingConfigUnlocked();
if (!clear_status.ok()) {
LOG(WARNING) << "Could not clear pending config: " << clear_status;
}
return status;
}
return status;

RETURN_NOT_OK(state_->SetPendingConfigOpIdUnlocked(round->id()));

return Status::OK();
}

void RaftConsensus::RefreshConsensusQueueAndPeersUnlocked() {
Expand Down Expand Up @@ -3371,7 +3377,7 @@ void RaftConsensus::NonTrackedRoundReplicationFinished(ConsensusRound* round,
LOG_WITH_PREFIX(INFO) << op_str << " replication failed: " << status << "\n" << GetStackTrace();

// Clear out the pending state (ENG-590).
if (IsChangeConfigOperation(op_type)) {
if (IsChangeConfigOperation(op_type) && state_->GetPendingConfigOpIdUnlocked() == round->id()) {
WARN_NOT_OK(state_->ClearPendingConfigUnlocked(), "Could not clear pending state");
}
} else if (IsChangeConfigOperation(op_type)) {
Expand Down
6 changes: 3 additions & 3 deletions src/yb/consensus/replica_state-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ TEST_F(RaftConsensusStateTest, TestPendingPersistent) {
ASSERT_OK(state_->LockForConfigChange(&lock));

config_.clear_opid_index();
ASSERT_OK(state_->SetPendingConfigUnlocked(config_));
ASSERT_OK(state_->SetPendingConfigUnlocked(config_, OpId()));
ASSERT_TRUE(state_->IsConfigChangePendingUnlocked());
ASSERT_FALSE(state_->GetPendingConfigUnlocked().has_opid_index());
ASSERT_TRUE(state_->GetCommittedConfigUnlocked().has_opid_index());
Expand All @@ -121,13 +121,13 @@ TEST_F(RaftConsensusStateTest, TestPersistentWrites) {
ASSERT_EQ(kInvalidOpIdIndex, state_->GetCommittedConfigUnlocked().opid_index());

config_.clear_opid_index();
ASSERT_OK(state_->SetPendingConfigUnlocked(config_));
ASSERT_OK(state_->SetPendingConfigUnlocked(config_, OpId()));
config_.set_opid_index(1);
ASSERT_OK(state_->SetCommittedConfigUnlocked(config_));
ASSERT_EQ(1, state_->GetCommittedConfigUnlocked().opid_index());

config_.clear_opid_index();
ASSERT_OK(state_->SetPendingConfigUnlocked(config_));
ASSERT_OK(state_->SetPendingConfigUnlocked(config_, OpId()));
config_.set_opid_index(2);
ASSERT_OK(state_->SetCommittedConfigUnlocked(config_));
ASSERT_EQ(2, state_->GetCommittedConfigUnlocked().opid_index());
Expand Down
37 changes: 22 additions & 15 deletions src/yb/consensus/replica_state.cc
Original file line number Diff line number Diff line change
Expand Up @@ -325,7 +325,8 @@ Status ReplicaState::CheckNoConfigChangePendingUnlocked() const {
return Status::OK();
}

Status ReplicaState::SetPendingConfigUnlocked(const RaftConfigPB& new_config) {
Status ReplicaState::SetPendingConfigUnlocked(
const RaftConfigPB& new_config, const OpId& config_op_id) {
DCHECK(IsLocked());
RETURN_NOT_OK_PREPEND(VerifyRaftConfig(new_config, UNCOMMITTED_QUORUM),
"Invalid config to set as pending");
Expand All @@ -340,12 +341,17 @@ Status ReplicaState::SetPendingConfigUnlocked(const RaftConfigPB& new_config) {
<< cmeta_->pending_config().ShortDebugString() << "; "
<< "New pending config: " << new_config.ShortDebugString();
}
cmeta_->set_pending_config(new_config);
cmeta_->set_pending_config(new_config, config_op_id);
CoarseTimePoint now;
RefreshLeaderStateCacheUnlocked(&now);
return Status::OK();
}

Status ReplicaState::SetPendingConfigOpIdUnlocked(const OpId& config_op_id) {
DCHECK(IsLocked());
return cmeta_->set_pending_config_op_id(config_op_id);
}

Status ReplicaState::ClearPendingConfigUnlocked() {
DCHECK(IsLocked());
if (!cmeta_->has_pending_config()) {
Expand All @@ -372,25 +378,26 @@ Status ReplicaState::SetCommittedConfigUnlocked(const RaftConfigPB& config_to_co
RETURN_NOT_OK_PREPEND(
VerifyRaftConfig(config_to_commit, COMMITTED_QUORUM), "Invalid config to set as committed");

// Only allow to have no pending config here when we want to do unsafe config change, see
// https://github.com/yugabyte/yugabyte-db/commit/9305a202b59eb805f80492c1b7412b0fbfd1ce76.
DCHECK(cmeta_->has_pending_config() || config_to_commit.unsafe_config_change());
// Compare committed with pending configuration, ensure they are the same.
// In the event of an unsafe config change triggered by an administrator,
// it is possible that the config being committed may not match the pending config
// because unsafe config change allows multiple pending configs to exist.
// Therefore we only need to validate that 'config_to_commit' matches the pending config
// if the pending config does not have its 'unsafe_config_change' flag set.
if (IsConfigChangePendingUnlocked()) {
if (!config_to_commit.unsafe_config_change()) {
const RaftConfigPB& pending_config = GetPendingConfigUnlocked();
if (!pending_config.unsafe_config_change()) {
// Pending will not have an opid_index, so ignore that field.
RaftConfigPB config_no_opid = config_to_commit;
config_no_opid.clear_opid_index();
// Quorums must be exactly equal, even w.r.t. peer ordering.
CHECK_EQ(GetPendingConfigUnlocked().SerializeAsString(), config_no_opid.SerializeAsString())
<< Substitute(
"New committed config must equal pending config, but does not. "
"Pending config: $0, committed config: $1",
pending_config.ShortDebugString(), config_to_commit.ShortDebugString());
}
// Pending will not have an opid_index, so ignore that field.
RaftConfigPB config_no_opid = config_to_commit;
config_no_opid.clear_opid_index();
// Quorums must be exactly equal, even w.r.t. peer ordering.
CHECK_EQ(GetPendingConfigUnlocked().SerializeAsString(), config_no_opid.SerializeAsString())
<< Substitute(
"New committed config must equal pending config, but does not. "
"Pending config: $0, committed config: $1",
pending_config.ShortDebugString(), config_to_commit.ShortDebugString());
}
cmeta_->set_committed_config(config_to_commit);
cmeta_->clear_pending_config();
Expand Down Expand Up @@ -666,7 +673,7 @@ Status ReplicaState::AddPendingOperation(const ConsensusRoundPtr& round, Operati
// messages were delayed.
const RaftConfigPB& committed_config = GetCommittedConfigUnlocked();
if (round->replicate_msg()->id().index() > committed_config.opid_index()) {
CHECK_OK(SetPendingConfigUnlocked(new_config));
CHECK_OK(SetPendingConfigUnlocked(new_config, round->id()));
} else {
LOG_WITH_PREFIX(INFO)
<< "Ignoring setting pending config change with OpId "
Expand Down
5 changes: 4 additions & 1 deletion src/yb/consensus/replica_state.h
Original file line number Diff line number Diff line change
Expand Up @@ -210,11 +210,14 @@ class ReplicaState {

// Sets the given configuration as pending commit. Does not persist into the peers
// metadata. In order to be persisted, SetCommittedConfigUnlocked() must be called.
Status SetPendingConfigUnlocked(const RaftConfigPB& new_config);
Status SetPendingConfigUnlocked(const RaftConfigPB& new_config, const OpId& config_op_id);
Status SetPendingConfigOpIdUnlocked(const OpId& config_op_id);

// Clears the pending config.
Status ClearPendingConfigUnlocked();

OpId GetPendingConfigOpIdUnlocked() { return cmeta_->pending_config_op_id(); }

// Return the pending configuration, or crash if one is not set.
const RaftConfigPB& GetPendingConfigUnlocked() const;

Expand Down
4 changes: 4 additions & 0 deletions src/yb/util/opid.h
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,10 @@ struct OpId {
return term == 0 && index == 0;
}

bool is_valid_not_empty() const {
return term > 0 && index > 0;
}

bool operator!() const {
return empty();
}
Expand Down

0 comments on commit f30111b

Please sign in to comment.