Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce state snapshot history #698

Merged
merged 8 commits into from
Apr 12, 2022
Merged

Conversation

murerfel
Copy link
Contributor

@murerfel murerfel commented Apr 4, 2022

In order to support rollbacks on state, I introduced a new snapshot repository.

State Snapshot Repository

  • It's a layer between the state handler and the actual file operations.
  • The snapshot repository will create a new file for each state write/update.
  • It has a fixed-size circular buffer, so it can keep the last n state snapshots, where n can be configured (set to 40 currently, which is the number of sidechain blocks per parentchain block).

Each state snapshot file is prefixed by a timestamp (nanosecond ticks), so in theory we could recover the snapshot history even after a crash.

Class hierarchy overview

Shows how the state snapshot repository fits into the state handler. It's a layer between the state handler and the file I/O. The file I/O has also been abstracted to allow unit testing of these components. The StateSnapshotRepositoryLoader can load the repository from an existing file hierarchy. This is necessary for when we provision the state in a separate call first, and then start the worker, and we want those state files to be picked up.

state_snapshot_repository

Changes to state handling

  • Shard state must be initialized explicitly, using the StateHandler.
    • Before, it could be implicitly initialized upon first loading. Also, we had 3 different ways to initialize a shard.
    • An e-call was added to initialize a shard (instead of a duplicate init_shard function on the untrusted side)
  • The state handler is not a GlobalFileStateHandler with a global lock backend anymore. It is initialized as a component in the enclave and accessed just as the other components. That also means it's only instantiated once.
  • The state key does not have to be read from file for each state operation anymore. It is passed upon initialization to the state file I/O component.

Closes #688

@murerfel murerfel self-assigned this Apr 4, 2022
core-primitives/enclave-api/ffi/src/lib.rs Show resolved Hide resolved
core-primitives/settings/src/lib.rs Outdated Show resolved Hide resolved
core-primitives/stf-executor/src/executor.rs Show resolved Hide resolved
core-primitives/stf-state-handler/src/handle_state.rs Outdated Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
service/src/main.rs Show resolved Hide resolved
@murerfel murerfel marked this pull request as ready for review April 5, 2022 12:30
@murerfel murerfel force-pushed the feature/fm-state-snapshots-688 branch from c4c74d9 to eb7619a Compare April 7, 2022 08:02
Copy link
Contributor

@haerdib haerdib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've made multiple comments about the same thing. Feel free to ignore most of them, I think to read a few of them makes it clear what I'm discontent with:
The abstraction level of the different components. All, from StateSnapshot"Repository" to StateFileIo know about Shards & repositories & files. I think we should draw a boundary somewhere: StateSnapshot does not need to know about files or repositories, while the StateFileIo does not necessarily need to know about Shards.

What do you think?

core-primitives/settings/src/lib.rs Outdated Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@clangenb clangenb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another big fish here. Looks like very nice and clean abstractions.

I only have a question: Have you considered how reverting to a state would work if we don't keep a snapshot for every single sidechain block? I think the current flow does not allow that because you can only find a snapshot by it's hash. Is this correct?

.github/workflows/build_and_test.yml Show resolved Hide resolved
core-primitives/settings/src/lib.rs Outdated Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
@murerfel
Copy link
Contributor Author

I only have a question: Have you considered how reverting to a state would work if we don't keep a snapshot for every single sidechain block? I think the current flow does not allow that because you can only find a snapshot by it's hash. Is this correct?

That's how it's done now, correct. Based on this implementation, I think it should be reasonably simple to change to a mechanism where we don't keep all the state snapshots as files, but only every n-th state. And when we revert, we revert to the snapshot x that is exactly x or the closest older snapshot.

Copy link
Contributor

@clangenb clangenb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this clean PR, answering the questions and incorporating my feedback!

I am totally fine with the enhancement issues! I approve. 👍

@haerdib haerdib mentioned this pull request Apr 11, 2022
Copy link
Contributor

@haerdib haerdib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good now. Minor remarks only.

core-primitives/sgx/io/src/lib.rs Show resolved Hide resolved
core-primitives/stf-state-handler/src/file_io.rs Outdated Show resolved Hide resolved
type StateType = StfState;
type HashType = H256;

fn load(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn load(
fn load_and_initialize(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or mabye load_or_initialize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not initialize anything, it only loads. Returns an error if you try to load something that does not exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this part do then?

0 => {
debug!("state at {:?} is empty. will initialize it.", state_path);
Stf::init_state().state
},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I think this part is a relic from the times when this function did initialize. I'll remove it 👍

Felix Müller added 8 commits April 12, 2022 11:29
Implemented in a circular buffer to keep the amount of snapshots
kept constanst
Closes #688
file names and directory names are derived from shard id and state id
in the state file I/O
acts like a state key cache that can be updated during runtime
and is used by the state file i/o
@murerfel murerfel force-pushed the feature/fm-state-snapshots-688 branch from bcc9a2f to bd39d4a Compare April 12, 2022 09:56
Copy link
Contributor

@haerdib haerdib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, thanks!

@murerfel murerfel merged commit 2523ccd into master Apr 12, 2022
@murerfel murerfel deleted the feature/fm-state-snapshots-688 branch April 12, 2022 11:11
haerdib added a commit to ajuna-network/worker that referenced this pull request May 9, 2022
* Add a local-setup config to the tutorial (integritee book) (integritee-network#666)

* Update call status to `InSidechainBlock` only after block import (integritee-network#676)

* rename on_block_created to on_block_imported

* update tx status only after block import

* fix clippy

* fix cargo test

* remove waiting time from demos

* readd removed comments

* udpate comments

* extract call removal logic from on_block_imported

* Bump substrate to commit f5f286db0da9... (integritee-network#669)

* Bump substrate

Bump substrate to commit 8df8d908c4d77a8dd19751784b6aca62159ddda8

Remove dependencies of scale-info 1.0.0 and parity-scale-codec 2.3.1

Upgrade substrate to commit f5f286d... substrate fix for : sp-core's full_crypto feature flag breaks no_std builds

* frame-metadata from crates.io

* Update CI to use updated node

Co-authored-by: echevrier <[email protected]>

* Dockerize (integritee-network#668)

* Dockerize the binaries integritee-network#579

* Add tags to run as well integritee-network#579

* Fix running binary in docker integritee-network#579

* Add more files to docker integritee-network#579

* Add conditional runs

* Add +x earlier

* Rename docker-service to integritee-demo-validateer. Fixes integritee-network#579

* Lift clap to version 3.16 and move stf cli to cli crate. (integritee-network#679)

* Lift clap to version 3.16 and move stf cli to cli crate.

* Adapt demo script parameters

* Add cli examples to README.md

* update comment descriptions

Co-authored-by: Gaudenz Kessler <[email protected]>

* Refactor global components and initialization process (integritee-network#677)

* Consistently (re-)use the component container
* RPC server is initialized before registering on the parentchain
* Fix issue with sidechain block import when latest parentchain block is already too new

Closes integritee-network#545 integritee-network#600 integritee-network#684 integritee-network#683

* create alive service to deterine that the service is up, running and registered (integritee-network#697)

* create alive service

* replace alive with initialized

Co-authored-by: Gaudenz Kessler <[email protected]>

* Add port config for the untrusted http server for `is_initialized` (integritee-network#700)

* rename wrong `signing_key` function name to `state_key` (integritee-network#704)

* Move top pool and top pool author crates to core primitives (integritee-network#705)

* Introduce state snapshot history (integritee-network#698)

Closes integritee-network#688

* Add header to sidechain block (integritee-network#699)

Closes integritee-network#686 

Co-authored-by: Gaudenz Kessler <[email protected]>

* introduce layer for indirection for sidechainblock (integritee-network#716)

Closes integritee-network#710 

Co-authored-by: Gaudenz Kessler <[email protected]>
Co-authored-by: Felix Müller <[email protected]>

* Persist web-socket connections (integritee-network#718)

Complete overhaul of the trusted web-socket server:
* using MIO to serve concurrent connections
* Server keeps connections open until a client requests a close
* Changed our clients to match this pattern

* Upgrade to polkadot v0.9.19 (integritee-network#720)

* Bump substrate to polkadot-v0.9.19
Bump RUNTIME_SPEC_VERSION to 9

Set substrate-api-client to polkadot-v0.9.19
Set integritee-node to polkadot-v0.9.19

Set integritee-node to master

* Cargo update + reabse

* Update github actions

Co-authored-by: echevrier <[email protected]>

* fix some cargo.tomls

* fix cargo.lock

* remove game engine

* update teerex module

* fix itp-registry-storage

* cargo update

* resinsert game engine

* some further clean up

* update teaclave

* add ajuna commands

* cargo update

* carog update

* update ajuna cli

* fix some thins

* Signed sidechain block

* fix shard_id getter

* fix load_initialized

* fix sgx externalities import

* fix compilation issues

* remove patches and cargo update

* update demo docu

* add bin folder to docu

* remove hard coded ports. Not necesasry

* add ./

* update pallet verions

* make tests compile again

* fix import in sgx mode

* merge from upstream number two.. remove itp storage verifier

* cargo fmt

* fix tests

* update doc once again

* fix script and queue game

* cargo update + some code clean up

Co-authored-by: Felix Müller <[email protected]>
Co-authored-by: echevrier <[email protected]>
Co-authored-by: echevrier <[email protected]>
Co-authored-by: mosonyi <[email protected]>
Co-authored-by: gaudenzkessler <[email protected]>
Co-authored-by: Gaudenz Kessler <[email protected]>
Co-authored-by: Felix Müller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Persist state snapshots for each sidechain block
3 participants