Make storage async #120

dan-da · 2024-03-20T02:46:49Z

This is a first cut at making the storage layer async in neptune-core.

I am making it a draft PR because I have a few remaining todos to polish things up:

add sync ~~and storage~~ [1] benches from twenty-first, modified as needed (easy)
investigate removal of impl Mmr for ArchivalMmr. done, see writeup.
investigate remaining runtime error(s). (do they also occur on master?)
rename crate::locks::sync to crate::locks::std. (easy)
rebase and squash commits, and/or improve commit messages.

stretch goal:

impl MmrAsync trait, ie approach 2 in this writeup. (done: see Make storage and mmr trait async, attempt 2. #121)

[1] storage benches put aside for now because they would require async support in divan which appears to be in progress, or else a re-write without divan.

Present status:

all tests and doctests pass.
fixed a runtime error: "In order to be transferred, a Block must have a non-None proof field." This seems to have been introduced elsewhere, not due to async mods.
One node (of 3) hits this runtime error:

thread 'main' panicked at /home/danda/dev/neptune/neptune-core/src/models/state/wallet/wallet_state.rs:559:9:
assertion `left == right` failed: Mutator set in wallet-handler must agree with that from applied block

there is a warning:

"2024-03-20T02:35:03.002579345Z  WARN ThreadId(03) neptune_core::mine_loop: Received block is timestamped in the future; mining on future-timestamped block."

This is an unmodified copy of twenty_first::storage at twenty-first revision 890c451e4e513018d8500bedcd5bf76dd0bafdd9 (master). It is not yet incorporated into the build.

fixes error: "In order to be transferred, a Block must have a non-None proof field."

The ProofType enum enables specifying/transferring an unimplemented Proof. This is only temporary.

The BlockType enum enables specifying a Genesis vs Standard block. A Standard block has a ProofType. The Genesis block has no ProofType.

Under crate::locks we previously had:

- sync
- tokio

Each contains an impl of AtomicRw and AtomicMutex, with basically the same API. Yet:

- sync refers to synchronous locks, ie sync vs async.
- tokio refers to the tokio::sync lock implementation.

So the names are referring to different things. Instead we change it to:

- std
- tokio

Now each refers to a lock implementation, ie std::sync and tokio::sync.

note: we could instead have changed `tokio` to `async`, but then there might be multiple async lock impls to choose from. So it seems cleanest to use the name of each impl.

Moved this benchmark over from twenty_first. Presently unable to build the twenty_first db_* bench tests because the storage layer is now async and the divan bench crate doesn't yet support async. However, it may soon, see: nvzqz/divan#39

dan-da · 2024-03-20T18:17:47Z

Regarding ArchivalMmr and the Mmr trait

Decoupling ArchivalMmr from the (non-async) Mmr trait seems complex. That trait is used in a number of places, such as:

  MutatorSetKernel<MMR: Mmr<Hash>>
  MsMembershipProof::batch_update_from_addition()
  RemovalRecord::batch_update_from_addition()
  mutator_set::insert_mock_item()
  mutator_set::remove_mock_item()

MutatorSetKernel is the most entangled. It is defined as:

    pub struct MutatorSetKernel<MMR: Mmr<Hash>> {
        pub aocl: MMR,
        pub swbf_inactive: MMR,
        pub swbf_active: ActiveWindow,
    }

It is instantiated in other locations as both ArchivalMmr (async) and MmrAccumulator (sync, from twenty_first). It has several complex methods, and is non-trivial.

Presently, ArchivalMmr implements the Mmr trait by spawning a new OS thread for each trait-method call. That thread creates a new tokio runtime, which runs the ArchivalMmr async method that actually implements the functionality (and interacts with the storage layer). Example:

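    // Sync Mmr trait method that bridges to the async implementation.
    fn append(&mut self, new_leaf: Digest) -> MmrMembershipProof<H> {
        // A scoped thread may borrow `self`; the caller blocks until it is joined.
        std::thread::scope(|s| {
            s.spawn(|| {
                // A fresh runtime is used because block_on() cannot be called
                // from a thread that is already driving async tasks.
                let runtime = tokio::runtime::Runtime::new().unwrap();
                runtime.block_on(self.append_async(new_leaf))
            })
            .join()
            .unwrap()
        })
    }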

This works, but seems quite inefficient. Unfortunately, the only way I know of to call an async method from a sync method that is already executing inside an async runtime is to spawn another thread with a new runtime.

Worse than the inefficiency, this makes the calling async fn block until the thread is finished. So for ArchivalMmr, this is no better (actually worse) than before this PR with regard to blocking on storage calls. Since ArchivalMmr is used heavily by MutatorSetKernel, this seems a big problem.

However, it may be adequate for merging this PR, and then I can optimize/improve in a follow-up.

Optimizing: Getting rid of the thread::spawn()

Brainstorming approaches.

1. An improvement, though not a solution, would be to create a long-lived thread for each ArchivalMmr instance with a 2-way message channel. So instead of creating a thread per method call, we would pass tasks and results back and forth. This eliminates the cost of creating an OS thread plus tokio runtime for each method call. Unfortunately, it still means that ArchivalMmr methods block on storage, and our goal is to avoid that.
2. An (obvious?) approach is to make a new MmrAsync trait and a new MmrAccumulatorAsync that also uses it. That should work and might be the best solution, but it forces code to become async that isn't presently, and doesn't really need to be. It becomes async only because it implements or uses the Mmr trait. This also gets messy because implementors of various traits from tasm-lib are involved.

3. Do not use any Mmr trait and keep MmrAccumulator sync. This would require that MutatorSetKernel can somehow accommodate both sync and async. It could mean adding a MutatorSetKernelAsync that duplicates a lot of logic. Or it could mean that we define it something like:

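    // Each variant wraps the corresponding MMR (generic parameters omitted).
    pub enum MmrType {
        ArchivalMmr(ArchivalMmr),
        MmrAccumulator(MmrAccumulator),
    }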

    pub struct MutatorSetKernel {
        pub aocl: MmrType,
        pub swbf_inactive: MmrType,
        pub swbf_active: ActiveWindow,
    }

The difficulty here is that all MutatorSetKernel methods must use a ton of match statements like:

    // Match on a reference so the MMR is borrowed rather than moved out of self.
    match &self.aocl {
        MmrType::ArchivalMmr(m) => m.count_leaves().await,
        MmrType::MmrAccumulator(m) => m.count_leaves(),
    }

This is unwieldy (though it could probably be made less ugly with a macro).
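As a rough illustration of the macro idea, the two-armed match could be generated once and reused by every MutatorSetKernel method. This is only a sketch under assumptions: ArchivalMock and AccumulatorMock are hypothetical stand-ins for ArchivalMmr and MmrAccumulator, mmr_dispatch! and total_leaves() are made-up names, the swbf_active field is left out, and block_on is a tiny std-only executor included just so the example runs without tokio.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for the storage-backed ArchivalMmr (async API).
struct ArchivalMock {
    leaves: Vec<u64>,
}
impl ArchivalMock {
    async fn count_leaves(&self) -> u64 {
        self.leaves.len() as u64
    }
}

/// Hypothetical stand-in for the in-memory MmrAccumulator (sync API).
struct AccumulatorMock {
    leaf_count: u64,
}
impl AccumulatorMock {
    fn count_leaves(&self) -> u64 {
        self.leaf_count
    }
}

enum MmrType {
    ArchivalMmr(ArchivalMock),
    MmrAccumulator(AccumulatorMock),
}

/// Expands to the two-armed match: awaits the async variant and
/// calls the sync variant directly. Must be used inside an async fn.
macro_rules! mmr_dispatch {
    ($mmr:expr, $method:ident($($arg:expr),*)) => {
        match $mmr {
            MmrType::ArchivalMmr(m) => m.$method($($arg),*).await,
            MmrType::MmrAccumulator(m) => m.$method($($arg),*),
        }
    };
}

/// Simplified kernel: swbf_active is omitted in this sketch.
struct MutatorSetKernel {
    aocl: MmrType,
    swbf_inactive: MmrType,
}

impl MutatorSetKernel {
    /// Made-up method that touches both MMRs through the macro.
    async fn total_leaves(&self) -> u64 {
        let aocl = mmr_dispatch!(&self.aocl, count_leaves());
        let swbf = mmr_dispatch!(&self.swbf_inactive, count_leaves());
        aocl + swbf
    }
}

/// Wakes the blocked thread when the future signals progress.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor (std only), standing in for a real runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

The macro hides the match, but every method that uses it still has to be async, so the cost described for approach 3 remains.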

Both approaches 2 and 3 require MutatorSetKernel to become async, which means that anything using it must also be async. The primary advantage of approach 3 is that MmrAccumulator can remain sync, at the cost of making MutatorSetKernel more complex.

A final approach I considered would impl ArchivalMmr with a Vec instead of a StorageVec. So it would not be database backed, but would instead load all data from DB at creation, store pending writes, and persist back on request. This should allow ArchivalMmr trait methods to remain sync. Async methods would exist for loading and persisting data.

The drawback is that all data must be loaded up-front and remain in mem. I'm unsure of how much data we are talking about, but my guess is that it may be huge eventually, and not possible to load/store all of it in RAM.
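The Vec-backed idea might look roughly like this. It is a sketch under assumptions, not the actual ArchivalMmr: MockDb, InMemoryMmr, and all of their methods are hypothetical names, leaves are plain u64 values rather than Digests, and block_on is a tiny std-only executor included only so the example runs without tokio.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical async storage backend; a Vec stands in for the database.
struct MockDb {
    rows: Vec<u64>,
}
impl MockDb {
    async fn read_all(&self) -> Vec<u64> {
        self.rows.clone()
    }
    async fn write_all(&mut self, new_rows: &[u64]) {
        self.rows.extend_from_slice(new_rows);
    }
}

/// Hypothetical in-memory MMR: everything is loaded up front, writes stay
/// pending in `leaves`, and only load() and persist() are async.
struct InMemoryMmr {
    leaves: Vec<u64>,
    /// Number of leading leaves that are already stored in the database.
    persisted_len: usize,
}

impl InMemoryMmr {
    /// Async: reads all data from storage at creation.
    async fn load(db: &MockDb) -> Self {
        let leaves = db.read_all().await;
        let persisted_len = leaves.len();
        Self { leaves, persisted_len }
    }

    /// Sync: only touches memory, so trait methods could stay sync.
    fn append(&mut self, leaf: u64) {
        self.leaves.push(leaf);
    }

    /// Sync: only touches memory.
    fn count_leaves(&self) -> u64 {
        self.leaves.len() as u64
    }

    /// Async: writes the pending leaves back to storage on request.
    async fn persist(&mut self, db: &mut MockDb) {
        db.write_all(&self.leaves[self.persisted_len..]).await;
        self.persisted_len = self.leaves.len();
    }
}

/// Wakes the blocked thread when the future signals progress.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor (std only), standing in for a real runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

The sketch makes the memory drawback visible: `leaves` holds the entire data set for the lifetime of the struct.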

Specifically, ArchivalMmr is used for MutatorSetKernel::aocl and MutatorSetKernel::swbf_inactive fields.

All this considered, approach 2 seems to me the most straight-forward and the one I intend to pursue.
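For reference, approach 2 could take a shape like the following. This is a sketch under assumptions, not the trait from the follow-up branch: the MmrAsync methods shown here, ArchivalMock, AccumulatorAsyncMock, and add() are hypothetical, leaves are plain u64 values, swbf_active is omitted, async fn in traits assumes Rust 1.75 or later, and block_on is a tiny std-only executor included only so the example runs without tokio.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical async counterpart of the Mmr trait (requires Rust 1.75+).
trait MmrAsync {
    async fn append(&mut self, new_leaf: u64);
    async fn count_leaves(&self) -> u64;
}

/// Stand-in for the storage-backed ArchivalMmr.
struct ArchivalMock {
    leaves: Vec<u64>,
}
impl MmrAsync for ArchivalMock {
    async fn append(&mut self, new_leaf: u64) {
        self.leaves.push(new_leaf);
    }
    async fn count_leaves(&self) -> u64 {
        self.leaves.len() as u64
    }
}

/// Stand-in for MmrAccumulatorAsync: no storage, but async to fit the trait.
struct AccumulatorAsyncMock {
    leaf_count: u64,
}
impl MmrAsync for AccumulatorAsyncMock {
    async fn append(&mut self, _new_leaf: u64) {
        self.leaf_count += 1;
    }
    async fn count_leaves(&self) -> u64 {
        self.leaf_count
    }
}

/// The kernel stays generic over one trait, as before, but its methods
/// become async because they await the MMR calls.
struct MutatorSetKernel<M: MmrAsync> {
    aocl: M,
    swbf_inactive: M,
}

impl<M: MmrAsync> MutatorSetKernel<M> {
    /// Made-up method: appends to the AOCL and returns the combined leaf count.
    async fn add(&mut self, item: u64) -> u64 {
        self.aocl.append(item).await;
        self.aocl.count_leaves().await + self.swbf_inactive.count_leaves().await
    }
}

/// Wakes the blocked thread when the future signals progress.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor (std only), standing in for a real runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

Compared with the enum variant of approach 3, no match statements are needed, at the price of the accumulator gaining an async API it does not strictly need.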

dan-da · 2024-03-22T02:11:42Z

Ok, I've made branch make_storage_and_mmr_trait_async with approach 2 (async Mmr trait). It builds cleanly, though the tests do not build yet.

ArchivalMmr no longer needs to spawn a thread plus runtime for each trait method invocation. So that's a win.

I do have to spawn a thread in one other place though: RemovalRecordsIntegrity::rust_shadow(). This is in impl CompiledProgram for RemovalRecordsIntegrity. The trait is defined in tasm-lib and has default method impls, so that fn cannot easily be made async.

In order to get the program to build, I had to comment out several impls of tasm-lib traits. These seem mainly related to testing, and I think they really belong within test module anyway. I'm not certain they can be made to build again unless the tasm traits are made async as well.

my next task will be to get the tests building again.

dan-da · 2024-03-23T06:27:01Z

Tests are building cleanly and passing in branch make_storage_and_mmr_trait_async, except for some proptests I had to comment out for now because the proptest crate/macro is not async compatible. I see somebody created a proptest_async crate recently (yesterday?) but it only works with async_std, not tokio yet. The author states it would be easy to add tokio support.

The thread spawn in RemovalRecordsIntegrity::rust_shadow() remains. I don't know how often that fn will be called, or what impact it might have on performance, if any. For now I don't have a better solution, but those who understand tasm-lib well may be able to refactor it out.

Anyway, I am happier with the code in make_storage_and_mmr_trait_async, so after a little more cleanup/review I will open a new PR for that branch that will obsolete this PR.

aszepieniec · 2024-03-24T09:58:07Z

This error

thread 'main' panicked at /home/danda/dev/neptune/neptune-core/src/models/state/wallet/wallet_state.rs:559:9:
assertion `left == right` failed: Mutator set in wallet-handler must agree with that from applied block

indicates that somewhere the mutator set removal records or membership proofs are being updated incorrectly. The following propositions are probably true (quantified over unknown details):

The error was introduced as a side-effect of simplifying the Block struct, specifically the Body part. Previously it held two mutator set accumulators, one before and one after applying the additions and removals given by the transaction. This is redundant, but also makes keeping track of things easier.
The error is caused by incorrectly updating removal records or membership proofs with the update contained in a block. The new block structure makes this update step rather more complex.
It is possible to reduce this error to a failing unit test, by which I mean a test function that does not require running multiple interacting nodes or even threads.

dan-da · 2024-04-02T19:54:15Z

closing in favor of #124.

dan-da added 16 commits March 17, 2024 10:16

chore: moving storage from twenty-first

8e4f3a6

This is an unmodified copy of twenty_first::storage at twenty-first revision 890c451e4e513018d8500bedcd5bf76dd0bafdd9 (master). It is not yet incorporated into the build.

wip: make storage layer async. passes all tests except doctests

37247a3

wip: remove util_types/sync

87a630b

wip. storage cleanups, and impl DbtVec stream tests

081048f

wip. fix storage_vec module tests, cleanup

a368996

wip. cargo fmt

0e61893

wip. misc cleanups. no more warnings

6b16bb7

wip. move mock mmr test code into test module

1c92969

wip. remove dep on twenty_first::mock

5579e3a

wip. lint

88a6814

wip. update to latest lib-tasm

d19dd82

doc: fix doctests. all tests now passing

70581d5

doc: update DbSchema docs, remove unused files

5429901

perf: add sync_atomic bench test

35af634

Moved this benchmark over from twenty_first. Presently unable to build the twenty_first db_* bench tests because the storage layer is now async and the divan bench crate doesn't yet support async. However, it may soon, see: nvzqz/divan#39

dan-da mentioned this pull request Mar 23, 2024

Make storage and mmr trait async, attempt 2. #121

Closed

This was referenced Mar 26, 2024

harmonize timestamp types #117

Closed

make storage async, attempt 3. #124

Merged

dan-da closed this Apr 2, 2024
