feat: stateless validation jobs in test mode #10248
Conversation
overall everything makes sense, added a couple of suggestions
chain/chain/src/chain.rs (Outdated)

            })
-           .collect()
+           .collect::<Result<Vec<Vec<UpdateShardJob>>, Error>>()?
Using a functional-style iterator at this point only makes it more confusing; I suggest rewriting this with a plain for loop.
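A minimal sketch of the suggested shape, with placeholder types and a hypothetical `make_update_shard_job` helper standing in for the closure built inside the real iterator chain:

```rust
// Sketch only: stand-in types, not the real nearcore definitions.
struct UpdateShardJob;
#[derive(Debug)]
struct Error;

fn make_update_shard_job(_block: usize, _shard_id: u64) -> Result<UpdateShardJob, Error> {
    Ok(UpdateShardJob)
}

fn collect_jobs(blocks: &[usize], num_shards: u64) -> Result<Vec<Vec<UpdateShardJob>>, Error> {
    // Plain for loops instead of a nested iterator chain ending in
    // `.collect::<Result<Vec<Vec<UpdateShardJob>>, Error>>()?`.
    let mut all_jobs = Vec::new();
    for &block in blocks {
        let mut block_jobs = Vec::new();
        for shard_id in 0..num_shards {
            block_jobs.push(make_update_shard_job(block, shard_id)?);
        }
        all_jobs.push(block_jobs);
    }
    Ok(all_jobs)
}
```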
chain/chain/src/update_shard.rs (Outdated)

@@ -232,8 +252,8 @@ fn apply_old_chunk(
     let storage_config = RuntimeStorageConfig {
         state_root: *prev_chunk_extra.state_root(),
-        use_flat_storage: true,
+        source: crate::types::StorageDataSource::Db,
+        use_flat_storage: shard_context.storage_data_source == StorageDataSource::Db,
I suggest keeping `use_flat_storage: true` here and moving this logic into the `apply_transactions` implementation:

StorageDataSource::DbTrieOnly => {
    let mut trie = self.get_trie_for_shard(
        shard_id,
        prev_block_hash,
        storage_config.state_root,
        false,
    );
    trie.dont_charge_gas_for_trie_node_access();
    trie
}

just to avoid duplication and having to change it back once we have the state witness.
I think I understood why some tests failed in Nayduck: I didn't comment the unsupported cases well enough. A new attempt is here: https://nayduck.near.org/#/run/3298
I've turned off automatic merging altogether until I understand this problem.
The new logic also breaks the hidden assumption that the order of jobs must correspond to the range of shard ids, which is not desired anyway, so I remove this assumption by returning the shard id with each job explicitly.
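A minimal sketch of what that could look like, with placeholder types; the point is only that each job carries its shard id, so nothing depends on a job's position in the vector:

```rust
// Sketch only: placeholder types, not the real nearcore definitions.
type ShardId = u64;
struct UpdateShardJob;

// Jobs are returned as (shard_id, job) pairs, so missing shards or a
// different ordering no longer break anything.
fn collect_jobs(shard_ids: &[ShardId]) -> Vec<(ShardId, UpdateShardJob)> {
    shard_ids.iter().map(|&shard_id| (shard_id, UpdateShardJob)).collect()
}

fn run_jobs(jobs: Vec<(ShardId, UpdateShardJob)>) {
    for (shard_id, _job) in jobs {
        // Results can be keyed by the shard id that travels with the job.
        println!("running shard update job for shard {shard_id}");
    }
}
```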
I am not planning on reviewing this PR, so I'm removing my review request. Let me know if my input is actually needed.
LGTM!
/// Result for a shard update for a single block.
#[derive(Debug)]
pub enum ShardBlockUpdateResult {
nit: maybe `ShardChunkUpdateResult` would be more precise
I chose `ShardBlockUpdateResult` explicitly because a chunk is not necessarily present, especially if there is a state split. So I'll bet on that name for now.
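For illustration, a hedged sketch of that shape; the variant names and payloads are assumptions, not the actual definitions in update_shard.rs:

```rust
// Illustrative sketch only: real variants and payloads may differ.
#[derive(Debug)]
struct ApplyChunkResultStub; // placeholder for the real apply result

#[derive(Debug)]
pub enum ShardBlockUpdateResult {
    /// A chunk update (a new chunk, or carrying state forward for a missing one).
    Chunk(ApplyChunkResultStub),
    /// Only a state split / resharding update for this block; no chunk is
    /// involved, which is why the enum is named after the block.
    StateSplit,
}
```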
chain/chain/src/update_shard.rs (Outdated)

@@ -106,15 +131,25 @@ pub(crate) fn process_shard_update(
     epoch_manager: &dyn EpochManagerAdapter,
     shard_update_reason: ShardUpdateReason,
     shard_context: ShardContext,
-    state_patch: SandboxStatePatch,
-) -> Result<ApplyChunkResult, Error> {
+    storage_context: StorageContext,
nit: I would move `storage_context` to `NewChunkData` and `OldChunkData` since it is only needed there
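A sketch of where that would put it, with the field lists reduced to the relevant part (the struct shapes are assumptions and the other fields are omitted):

```rust
// Sketch only: struct shapes are illustrative, not the real definitions.
struct StorageContext; // placeholder

// Instead of a separate `storage_context` parameter on process_shard_update,
// the context would live inside the two chunk-applying variants:
struct NewChunkData {
    // ...chunk, receipts, block context, etc.
    storage_context: StorageContext,
}

struct OldChunkData {
    // ...previous chunk extra, block context, etc.
    storage_context: StorageContext,
}
```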
core/store/src/flat/manager.rs (Outdated)

@@ -115,7 +115,7 @@ impl FlatStorageManager {
                     ?new_flat_head,
                     ?err,
                     ?shard_uid,
-                    block_hash = ?block.header().hash(),
+                    // block_hash = ?block.header().hash(),
?
Oops
chain/chain/src/chain.rs (Outdated)

            let state_patch = state_patch.take();

            // Insert job into list of all jobs to run, if job is valid.
            let mut insert_job = |job: Result<Option<UpdateShardJob>, Error>| {
nit: this looks weird 🙃 can we just accumulate `job_results: Vec<Result<Option<UpdateShardJob>, Error>>` and then extract `jobs` and `invalid_chunks` after this loop?
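A sketch of the accumulate-then-split shape being suggested, with placeholder types and the invalid-chunk payload reduced to a shard id for illustration:

```rust
// Sketch only: placeholder types, not the real nearcore definitions.
struct UpdateShardJob;
struct Error {
    shard_id: u64, // assume the error identifies the offending chunk/shard
}

fn split_job_results(
    job_results: Vec<Result<Option<UpdateShardJob>, Error>>,
) -> (Vec<UpdateShardJob>, Vec<u64>) {
    let mut jobs = Vec::new();
    let mut invalid_chunks = Vec::new();
    // One pass after the loop that produced `job_results`, instead of
    // threading an `insert_job` closure through the loop body.
    for result in job_results {
        match result {
            Ok(Some(job)) => jobs.push(job),
            Ok(None) => {} // nothing to apply for this shard
            Err(err) => invalid_chunks.push(err.shard_id),
        }
    }
    (jobs, invalid_chunks)
}
```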
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##            master    #10248    +/-   ##
==========================================
  Coverage    71.83%    71.83%
==========================================
  Files          712       712
  Lines       143171    143442    +271
  Branches    143171    143442    +271
==========================================
+ Hits        102843    103043    +200
- Misses       35627     35669     +42
- Partials      4701      4730     +29

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This is the next step for #9982.
Here I introduce jobs which will perform stateless validation of a newly received chunk by executing its transactions and receipts.
Later they should be executed against the state witness, but for now I just set a foundation by running these jobs against state data in the DB. All passing tests verify that the old and new jobs generate the same result.
The final switch will happen when stateful jobs are replaced with stateless ones.
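To make "the same result" concrete, here is a hedged sketch of the kind of comparison involved; the type and field names are placeholders, not the actual test code:

```rust
// Sketch only: a stand-in for the chunk extra produced by applying a chunk.
#[derive(Debug, PartialEq)]
struct ChunkExtraStub {
    state_root: [u8; 32],
    gas_used: u64,
}

// The stateless validation job must reproduce exactly what the stateful
// job produced for the same chunk.
fn check_consistency(stateful: &ChunkExtraStub, stateless: &ChunkExtraStub) {
    assert_eq!(
        stateful, stateless,
        "For stateless validation, chunk extras do not match"
    );
}
```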
Details

This doesn't introduce any load on the stable version. On the nightly version there will be `num_shards` extra jobs which check that stateless validation results are consistent with stateful execution. But as we use nightly only for testing, it shouldn't mean much overhead.

I add more fields to the `ShardContext` structure to simplify the code. Some of them are needed to break early if there is a resharding, and that logic is the same for both kinds of jobs.

`StorageDataSource::DbTrieOnly` is introduced to read data only from the trie in stateless jobs (see the sketch below). This is annoying but still needed if there are a lot of missing chunks and the flat storage head has moved above the block at which the previous chunk was created. When the state witness is implemented, `Recorded` will be used instead.
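A hedged sketch of the data-source distinction described above; the variant set follows the description (the diff suggests the real enum lives in `crate::types`), but payloads and exact docs are assumptions:

```rust
// Illustrative sketch only: the real enum may carry payloads (e.g. a
// recorded partial storage) and live alongside other runtime types.
pub enum StorageDataSource {
    /// Read state from the database, using flat storage (the stateful path).
    Db,
    /// Read only from the trie, bypassing flat storage; used by stateless
    /// jobs when missing chunks let the flat storage head move past the
    /// block of the previous chunk.
    DbTrieOnly,
    /// Planned follow-up: replay against a recorded state witness instead
    /// of local storage.
    Recorded,
}
```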
Testing

The test `tests::client::resharding::test_latest_protocol_missing_chunks_high_missing_prob` fails with the assertion `left == right failed: For stateless validation, chunk extras for block CMV88CBcnKoxa7eTnkG64psLoJzpW9JeAhFrZBVv6zDc and shard s3.v2 do not match...`, which was specifically introduced for that. Actually, this helped me realize that `validate_chunk_with_chunk_extra` is still needed, but I will introduce it later.

https://nayduck.near.org/#/run/3293 - +10 nightly tests failing, will take a look
https://nayduck.near.org/#/run/3300