Create temporary ledger chunks while recovery is in progress #3563

Merged on Feb 22, 2022 (44 commits)
Changes from 34 commits
c8ba3dc
Added flag to KV
Feb 11, 2022
bcbf65a
Force new ledger chunk
Feb 11, 2022
e2ff39c
End-to-end test check
Feb 11, 2022
8e3913e
Fix
Feb 11, 2022
1734729
Remove TODO
Feb 11, 2022
7f766b0
Merge branch 'main' of github.com:microsoft/CCF into ledger_chunk_rec…
Feb 11, 2022
bd28113
Verify that a new chunk is created for recovery
Feb 11, 2022
8709fbb
.
Feb 11, 2022
93d6a84
Cleanup
Feb 11, 2022
9075519
Recover files are created on recovery
Feb 11, 2022
ecc8066
WIP - remove recovery extension to ledger files
Feb 14, 2022
fcb2852
Ledger open works
Feb 14, 2022
6a3afa9
Single recovery works after failed attempt
Feb 14, 2022
84fc427
Skip recovery files in ledger Python
Feb 14, 2022
8822251
Recovery after aborted recovery works on one node
Feb 15, 2022
00c947d
Also create temporary chunks on backups
Feb 15, 2022
4546b6e
Recovery works
Feb 15, 2022
cf2abc3
Merge branch 'main' of github.com:microsoft/CCF into recovery_tempora…
Feb 15, 2022
3169653
Merge branch 'main' of github.com:microsoft/CCF into recovery_tempora…
Feb 16, 2022
6b06ae0
e2e test almost there
Feb 16, 2022
dfd67f6
.
Feb 16, 2022
a382e71
Add unit test for ledger recovery
Feb 16, 2022
b3a7f69
Fix
Feb 16, 2022
01838ae
Unify recovery paths
Feb 16, 2022
3dc860c
Use fs::path
Feb 17, 2022
dddfecf
Fix build
Feb 17, 2022
b33427a
Finished unit tests for ledger.cpp
Feb 17, 2022
b00c6b7
Changelog
Feb 17, 2022
9b2f728
Docs
Feb 17, 2022
cf45d11
Add to suite
Feb 17, 2022
27e58a6
Do not create recovery chunks on open
Feb 17, 2022
eab905c
Stable order
Feb 17, 2022
62498e5
Fix
Feb 17, 2022
9320a8d
Merge branch 'main' into recovery_temporary_ledger_chunks
jumaffre Feb 17, 2022
3457060
Minor rename
Feb 17, 2022
6d5a256
Merge branch 'recovery_temporary_ledger_chunks' of github.com:jumaffr…
Feb 17, 2022
dfb9d97
Merge branch 'main' into recovery_temporary_ledger_chunks
jumaffre Feb 18, 2022
b9ff885
Recover from service no snapshot
Feb 18, 2022
08cc1b7
Merge branch 'main' into recovery_temporary_ledger_chunks
jumaffre Feb 18, 2022
2f52f1e
Merge branch 'main' into recovery_temporary_ledger_chunks
jumaffre Feb 18, 2022
9b89cee
Merge branch 'main' into recovery_temporary_ledger_chunks
jumaffre Feb 18, 2022
036852e
Merge branch 'main' into recovery_temporary_ledger_chunks
jumaffre Feb 21, 2022
df1763c
Merge branch 'main' into recovery_temporary_ledger_chunks
achamayou Feb 21, 2022
8ef30a2
fmt
Feb 22, 2022
2 changes: 1 addition & 1 deletion .daily_canary
@@ -1 +1 @@
-chirp tv
+chirp
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

+## Unreleased
+
+## Changed
+
+- Failed recovery procedures no longer block subsequent recoveries: `.recovery` ledger files are now created while the recovery is in progress and ignored or deleted by nodes on startup (#3563).
+
 ## [2.0.0-rc1]

 ### Added
5 changes: 4 additions & 1 deletion doc/operations/ledger_snapshot.rst
@@ -40,7 +40,10 @@ The listing below is an example of what a ledger directory may look like:
     -rw-rw-r-- 1 user user 1.1M Jan 31 14:00 ledger_92502-97520.committed
     -rw-rw-r-- 1 user user 553K Jan 31 14:00 ledger_97521 # File still in progress

-.. note:: On startup, a CCF node started with existing ledger files may suffix some of the file names with ``.corrupted`` if the ledger file cannot be parsed, depending on the sequence number the node will join from.
+.. note::
+
+    - On startup, a CCF node started with existing ledger files may suffix some of the file names with ``.corrupted`` if the ledger file cannot be parsed, depending on the sequence number the node will join from.
+    - While the :doc:`/operations/recovery` procedure is in progress, new ledger files are suffixed with ``.recovery``. These files are automatically renamed (i.e. recovery suffix removed) once the recovery procedure is complete. ``.recovery`` files are automatically discarded on node startup so that a failed recovery attempt does not prevent further recoveries.

 Snapshots
 ---------
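The `.recovery` lifecycle described in the documentation note can be illustrated with a short Python sketch. This is not CCF's implementation (the node-side handling is in the C++ host code); the helper names `finalise_recovery_files` and `discard_recovery_files` are hypothetical, and deleting leftover files outright is an assumption about what "discarded" means.

```python
import os
import tempfile

RECOVERY_FILE_SUFFIX = ".recovery"


def finalise_recovery_files(ledger_dir: str) -> list:
    """Hypothetical helper: strip the .recovery suffix once recovery completes."""
    renamed = []
    for name in sorted(os.listdir(ledger_dir)):
        if name.endswith(RECOVERY_FILE_SUFFIX):
            final_name = name[: -len(RECOVERY_FILE_SUFFIX)]
            os.rename(
                os.path.join(ledger_dir, name),
                os.path.join(ledger_dir, final_name),
            )
            renamed.append(final_name)
    return renamed


def discard_recovery_files(ledger_dir: str) -> list:
    """Hypothetical helper: drop leftovers of a failed recovery on startup.

    Assumption: "discarded" is modelled here as deleting the file.
    """
    removed = []
    for name in sorted(os.listdir(ledger_dir)):
        if name.endswith(RECOVERY_FILE_SUFFIX):
            os.remove(os.path.join(ledger_dir, name))
            removed.append(name)
    return removed
```

In this sketch, a directory left behind by an aborted recovery is cleaned with `discard_recovery_files` before the next attempt, while a successful recovery ends with `finalise_recovery_files`, after which the chunks are indistinguishable from regular ledger files.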
11 changes: 11 additions & 0 deletions include/ccf/ds/nonstd.h
@@ -160,4 +160,15 @@ namespace nonstd

     return s;
   }
+
+  static inline std::string remove_suffix(
+    const std::string& s, const std::string& suffix)
+  {
+    if (ends_with(s, suffix))
+    {
+      return s.substr(0, s.size() - suffix.size());
+    }
+
+    return s;
+  }
 }
14 changes: 13 additions & 1 deletion python/ccf/ledger.py
@@ -37,6 +37,7 @@
 SERVICE_INFO_TABLE_NAME = "public:ccf.gov.service.info"

 COMMITTED_FILE_SUFFIX = ".committed"
+RECOVERY_FILE_SUFFIX = ".recovery"

 # Key used by CCF to record single-key tables
 WELL_KNOWN_SINGLETON_TABLE_KEY = bytes(bytearray(8))
@@ -110,6 +111,7 @@ def range_from_filename(filename: str) -> Tuple[int, Optional[int]]:
     elements = (
         os.path.basename(filename)
         .replace(COMMITTED_FILE_SUFFIX, "")
+        .replace(RECOVERY_FILE_SUFFIX, "")
         .replace("ledger_", "")
         .split("-")
     )
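To see what the extra `.replace(RECOVERY_FILE_SUFFIX, "")` buys, here is a self-contained sketch of the filename parsing. Only the `elements` expression comes from the hunk above; the function name `range_from_filename_sketch`, the handling of one or two elements, and the error message are assumptions added so the example runs on its own.

```python
import os
from typing import Optional, Tuple

COMMITTED_FILE_SUFFIX = ".committed"
RECOVERY_FILE_SUFFIX = ".recovery"


def range_from_filename_sketch(filename: str) -> Tuple[int, Optional[int]]:
    # Same chain of replacements as in the hunk: drop both suffixes and the
    # "ledger_" prefix, then split the remaining "start-end" or "start".
    elements = (
        os.path.basename(filename)
        .replace(COMMITTED_FILE_SUFFIX, "")
        .replace(RECOVERY_FILE_SUFFIX, "")
        .replace("ledger_", "")
        .split("-")
    )
    # Assumption: the rest of the function is not shown in the diff.
    if len(elements) == 2:
        return (int(elements[0]), int(elements[1]))
    if len(elements) == 1:
        return (int(elements[0]), None)
    raise ValueError(f"Could not read seqno range from ledger file {filename}")
```

With this sketch, `ledger_1-10.committed.recovery` parses to `(1, 10)` and `ledger_97521.recovery` to `(97521, None)`, the same ranges as their non-recovery counterparts. Without the added replacement, the trailing `.recovery` would reach `int()` and raise.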
@@ -822,6 +824,7 @@ def __init__(
         self,
         directories: List[str],
         committed_only: bool = True,
+        read_recovery_files: bool = False,
         insecure_skip_verification: bool = False,
     ):

@@ -830,8 +833,17 @@ def __init__(
         ledger_files: List[str] = []
         for directory in directories:
             for path in os.listdir(directory):
-                if committed_only and not path.endswith(COMMITTED_FILE_SUFFIX):
+                sanitised_path = path
+                if path.endswith(RECOVERY_FILE_SUFFIX):
+                    sanitised_path = path[: -len(RECOVERY_FILE_SUFFIX)]
+                    if not read_recovery_files:
+                        continue
+
+                if committed_only and not sanitised_path.endswith(
+                    COMMITTED_FILE_SUFFIX
+                ):
                     continue
+
                 chunk = os.path.join(directory, path)
                 # The same ledger file may appear multiple times in different directories
                 # so ignore duplicates
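The interaction between `committed_only` and the new `read_recovery_files` flag is easier to follow when the filter is pulled out of the constructor. The sketch below mirrors the conditions in the hunk above; the standalone function `select_ledger_files` is hypothetical and operates on plain file names rather than listing directories.

```python
from typing import List

COMMITTED_FILE_SUFFIX = ".committed"
RECOVERY_FILE_SUFFIX = ".recovery"


def select_ledger_files(
    paths: List[str],
    committed_only: bool = True,
    read_recovery_files: bool = False,
) -> List[str]:
    selected = []
    for path in paths:
        sanitised_path = path
        if path.endswith(RECOVERY_FILE_SUFFIX):
            # Look through the .recovery suffix so the committed check
            # still applies to files such as "x.committed.recovery".
            sanitised_path = path[: -len(RECOVERY_FILE_SUFFIX)]
            if not read_recovery_files:
                continue

        if committed_only and not sanitised_path.endswith(COMMITTED_FILE_SUFFIX):
            continue

        selected.append(path)
    return selected
```

With the defaults, recovery files are skipped entirely, so a reader of a ledger directory left over from an aborted recovery sees only the committed, non-recovery chunks. Passing `read_recovery_files=True` admits recovery chunks, but `committed_only` still filters out those that were never committed.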
7 changes: 5 additions & 2 deletions src/consensus/aft/raft.h
@@ -345,7 +345,10 @@ namespace aft
     }

     void init_as_backup(
-      Index index, Term term, const std::vector<Index>& term_history) override
+      Index index,
+      Term term,
+      const std::vector<Index>& term_history,
+      Index recovery_start_index = 0) override
     {
       // This should only be called when the node resumes from a snapshot and
       // before it has received any append entries.
@@ -356,7 +359,7 @@ namespace aft

       state->view_history.initialise(term_history);

-      ledger->init(index);
+      ledger->init(index, recovery_start_index);
       snapshotter->set_last_snapshot_idx(index);

       become_aware_of_new_term(term);
2 changes: 1 addition & 1 deletion src/consensus/aft/test/logging_stub.h
@@ -23,7 +23,7 @@ namespace aft

     LedgerStubProxy(const ccf::NodeId& id) : _id(id) {}

-    virtual void init(Index idx) {}
+    virtual void init(Index, Index) {}

     virtual void put_entry(
       const std::vector<uint8_t>& original,
9 changes: 6 additions & 3 deletions src/consensus/ledger_enclave.h
@@ -121,7 +121,8 @@ namespace consensus
      */
     void truncate(Index idx)
     {
-      RINGBUFFER_WRITE_MESSAGE(consensus::ledger_truncate, to_host, idx);
+      RINGBUFFER_WRITE_MESSAGE(
+        consensus::ledger_truncate, to_host, idx, false /* no recovery */);
     }

     /**
@@ -138,10 +139,12 @@ namespace consensus
      * Initialise ledger at a given index (e.g. after a snapshot)
      *
      * @param idx Index to start ledger from
+     * @param recovery_start_idx Index at which the recovery starts
      */
-    void init(Index idx)
+    void init(Index idx = 0, Index recovery_start_idx = 0)
     {
-      RINGBUFFER_WRITE_MESSAGE(consensus::ledger_init, to_host, idx);
+      RINGBUFFER_WRITE_MESSAGE(
+        consensus::ledger_init, to_host, idx, recovery_start_idx);
     }
   };
 }
9 changes: 7 additions & 2 deletions src/consensus/ledger_enclave_types.h
@@ -29,6 +29,7 @@ namespace consensus
     DEFINE_RINGBUFFER_MSG_TYPE(ledger_truncate),
     DEFINE_RINGBUFFER_MSG_TYPE(ledger_commit),
     DEFINE_RINGBUFFER_MSG_TYPE(ledger_init),
+    DEFINE_RINGBUFFER_MSG_TYPE(ledger_open),

     /// Create and commit a snapshot. Enclave -> Host
     DEFINE_RINGBUFFER_MSG_TYPE(snapshot),
@@ -53,15 +54,19 @@ DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(
   consensus::Index,
   consensus::LedgerRequestPurpose);

-DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(consensus::ledger_init, consensus::Index);
+DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(
+  consensus::ledger_init,
+  consensus::Index /* start idx */,
+  consensus::Index /* recovery start idx */);
 DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(
   consensus::ledger_append,
   bool /* committable */,
   bool /* force chunk */,
   std::vector<uint8_t>);
 DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(
-  consensus::ledger_truncate, consensus::Index);
+  consensus::ledger_truncate, consensus::Index, bool /* recovery mode */);
 DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(consensus::ledger_commit, consensus::Index);
+DECLARE_RINGBUFFER_MESSAGE_NO_PAYLOAD(consensus::ledger_open);
 DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(
   consensus::snapshot,
   consensus::Index /* snapshot idx */,
16 changes: 16 additions & 0 deletions src/ds/files.h
@@ -12,8 +12,13 @@
 #include <string>
 #include <vector>

+#define FMT_HEADER_ONLY
+#include <fmt/format.h>
+
 namespace files
 {
+  namespace fs = std::filesystem;
+
   /**
    * @brief Checks if a path exists
    *
@@ -122,4 +127,15 @@ namespace files
   {
     return dump(std::vector<uint8_t>(data.begin(), data.end()), file);
   }
+
+  void rename(const fs::path& src, const fs::path& dst)
+  {
+    std::error_code ec;
+    fs::rename(src, dst, ec);
+    if (ec)
+    {
+      throw std::logic_error(fmt::format(
+        "Could not rename file {} to {}: {}", src, dst, ec.message()));
+    }
+  }
 }