fix: Fault-tolerant storage engine errors for write operations #1399

acelyc111 · 2023-03-16T12:26:57Z

Make Pegasus tolerate storage engine (i.e. RocksDB) errors
for write operations. The unrecoverable RocksDB instance
directory will be moved to a .err path, and the replica will be
closed and removed from the stub. The meta server will then detect
the missing replica and recover it automatically.
This patch only deals with unrecoverable storage engine errors
for write operations.
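
As a rough illustration of the "move to a .err path" step, here is a minimal sketch using `std::filesystem`. The helper name `move_to_err_path` and the exact naming scheme (appending `.err` to the directory name) are assumptions made for this example, not the code in this patch.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Pegasus): rename `dir` to `dir` + ".err"
// so that the damaged data is kept for inspection but is no longer loaded.
// Assumes `dir` has no trailing path separator.
// Returns true on success and stores the new path in `*err_path_out`.
bool move_to_err_path(const fs::path &dir, fs::path *err_path_out)
{
    assert(err_path_out != nullptr);

    fs::path err_path = dir;
    err_path += ".err"; // appends to the last path component, no separator added

    std::error_code ec;
    fs::rename(dir, err_path, ec); // non-throwing overload
    if (ec) {
        return false;
    }
    *err_path_out = err_path;
    return true;
}
```

After a successful call the original directory no longer exists, so a later scan of the data directories would not pick it up as a live replica.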

src/common/fs_manager.cpp

empiredan · 2023-03-21T04:02:06Z

src/replica/test/mock_utils.h

@@ -46,7 +46,7 @@ class mock_replication_app_base : public replication_app_base
    explicit mock_replication_app_base(replica *replica) : replication_app_base(replica) {}

    error_code start(int, char **) override { return ERR_NOT_IMPLEMENTED; }
-    error_code stop(bool) override { return ERR_NOT_IMPLEMENTED; }
+    error_code stop(bool) override { return ERR_OK; }


Why change this to ERR_OK?

empiredan · 2023-03-21T04:51:57Z

src/replica/prepare_list.cpp


        int count = 0;
        mutation_ptr mu = get_mutation_by_decree(last_committed_decree() + 1);

        while (mu != nullptr && mu->is_ready_for_commit() && mu->data.header.ballot >= last_bt) {
            _last_committed_decree++;
            last_bt = mu->data.header.ballot;
-            _committer(mu);
+            ERR_LOG_PREFIX_AND_RETURN_NOT_OK(_committer(mu), "commit error in COMMIT_ALL_READY");


Once it fails, should everything be rolled back?
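
To make the rollback question concrete, here is a simplified sketch of the loop shape in the diff above. All types and names (`status`, `toy_prepare_list`) are stand-ins invented for this example. It only shows that, because the committed decree is incremented before the committer runs, an early return on error leaves the decree already advanced past the failed mutation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Stand-in for an error code; not the real Pegasus type.
enum class status { ok, storage_error };

// Toy model of a prepare list: decree -> "ready for commit" flag.
struct toy_prepare_list
{
    std::map<int64_t, bool> ready;
    int64_t last_committed = 0;
    std::function<status(int64_t)> committer;

    // Commit consecutive ready decrees, returning on the first error.
    status commit_all_ready()
    {
        assert(committer != nullptr);
        auto it = ready.find(last_committed + 1);
        while (it != ready.end() && it->second) {
            ++last_committed; // advanced before the committer runs, as in the diff
            status s = committer(it->first);
            if (s != status::ok) {
                return s; // last_committed already points at the failed decree
            }
            it = ready.find(last_committed + 1);
        }
        return status::ok;
    }
};
```

In this toy model, if the committer fails on decree 2, `last_committed` is 2 on return even though decree 2 was never applied. Whether the real code needs to undo that depends on what happens to the replica afterwards, for example if it is closed and discarded as the description proposes.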

src/replica/replica_stub.cpp

empiredan · 2023-03-21T11:04:29Z

src/replica/replica_2pc.cpp

@@ -542,7 +545,10 @@ void replica::on_prepare(dsn::message_ex *request)
    }

    error_code err = _prepare_list->prepare(mu, status(), pop_all_committed_mutations);
-    CHECK_EQ_MSG(err, ERR_OK, "prepare mutation failed");
+    if (err != ERR_OK) {


Since prepare() has the call chain prepare_list::prepare() => prepare_list::commit() => replica::execute_mutation() => _app->apply_mutation() => pegasus_server_impl::on_batched_write_requests() => pegasus_server_write::on_batched_write_requests() => pegasus_server_write::on_batched_writes(), in which the RocksDB write interface is called, is it necessary to call handle_local_failure() on error?

empiredan · 2023-03-21T11:23:34Z

Currently the primary replica server responds to the client after RocksDB has been written. However, the RocksDB write interface may return kCorruption or kIOError, which is returned to the client, and the client will think the request has failed. In fact, the logs of the primary and all secondaries have been written successfully, so the request should be considered successful. The client will then choose to write again, which leads to inconsistency for non-idempotent writes such as incr, check_and_set and check_and_mutate.

To solve this problem, I think we can make the RocksDB write asynchronous. Once an asynchronous RocksDB write fails, for example with kCorruption or kIOError, just remove the replica, move the RocksDB directory to .err, and switch the primary to one of the secondary replicas. Consistency is guaranteed if and only if the logs are consistent, so we can just write to RocksDB asynchronously.
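
The retry problem described above can be sketched with a toy model. Everything here (`toy_replica_group`, `client_incr_with_retry`) is hypothetical and heavily simplified: the replicated log is a vector of deltas, and the storage engine failure is simulated with a flag. It is meant only to show why reporting a storage engine error after the log is committed is unsafe for a non-idempotent operation such as incr.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Toy replica group: the log is the source of truth, the storage engine write may fail.
struct toy_replica_group
{
    std::vector<int64_t> log;       // committed incr deltas
    bool storage_fails_once = true; // simulate one storage engine write error

    // Returns false when the storage engine write fails, even though the
    // log entry has already been committed.
    bool incr(int64_t delta)
    {
        log.push_back(delta);
        if (storage_fails_once) {
            storage_fails_once = false;
            return false;
        }
        return true;
    }

    // Value a healthy replica would reach by replaying the log.
    int64_t replayed_value() const
    {
        return std::accumulate(log.begin(), log.end(), int64_t{0});
    }
};

// A client that retries on failure. Returns the number of attempts used.
int client_incr_with_retry(toy_replica_group &group, int64_t delta, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (group.incr(delta)) {
            return attempt;
        }
    }
    return max_attempts;
}
```

With one simulated failure, a client that intends a single `incr(1)` ends up with a replayed value of 2, because the first attempt was logged before the error was reported.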

github-actions bot added the cpp label Mar 16, 2023

acelyc111 changed the title ~~fix: Trash the unrecoverable rocksDB instance to .err path~~ fix: Fault-tolerant storage engine errors for write operations Mar 17, 2023

acelyc111 force-pushed the return_code branch 2 times, most recently from f90ffd7 to 7784c7e Compare March 17, 2023 12:52

acelyc111 marked this pull request as ready for review March 18, 2023 05:31

empiredan reviewed Mar 21, 2023

acelyc111 force-pushed the return_code branch 3 times, most recently from 464603e to fcfd1a9 Compare March 27, 2023 08:30

fix: Fault-tolerant storage engine errors for write operations

50c38f9

acelyc111 force-pushed the return_code branch from fcfd1a9 to 50c38f9 Compare March 27, 2023 08:31

acelyc111 added 4 commits March 27, 2023 17:37

commit void

c4b000e

committer void

97c488c

apply_mutation ret

6867b2b

ut

734f241

github-actions bot added the scripts label Mar 28, 2023

acelyc111 closed this Apr 11, 2023
