feat: State sync from local filesystem #8913
Conversation
…om the local filesystem
Can I review it this week?
Ready for review.
```rust
                *lock = Some(Err(err.to_string()));
            }
        }
        let mut lock = download_response.lock().unwrap();
```
what is this lock used for?
In the case of fetching data from S3, the data is written by one of many different threads managed by actix, so this access needs to be synchronized.
this may be stupid, but wouldn't the different threads be writing to different locations? Why would they need to be synchronized?
Yet another thread checks whether the writes are complete, so this is a read-write race. See `fn check_external_storage_part_response()`; that function also clears the response after saving it to RocksDB.
The way timeouts are handled can also be problematic. If a task was spawned to fetch an object but hasn't finished within 60 seconds, a new task is spawned and the old task isn't cancelled. That is a write-write race.
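To make the shape of that pattern concrete, here is a minimal, self-contained sketch; the type alias, function names, and payload are simplified stand-ins for illustration, not the actual nearcore code:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Simplified stand-in: the fetched state part (or an error string) lives behind a
// Mutex so that any of the downloader threads can write it and a separate checker
// thread can read it and clear it.
type DownloadResponse = Arc<Mutex<Option<Result<Vec<u8>, String>>>>;

fn downloader(response: DownloadResponse, data: Result<Vec<u8>, String>) {
    // One of many worker threads stores its result into the shared slot.
    let mut lock = response.lock().unwrap();
    *lock = Some(data);
}

fn check_response(response: &DownloadResponse) -> Option<Result<Vec<u8>, String>> {
    // The checker takes the response, clearing the slot, similar in spirit to
    // what check_external_storage_part_response() does before saving to RocksDB.
    response.lock().unwrap().take()
}

fn main() {
    let response: DownloadResponse = Arc::new(Mutex::new(None));
    let writer = {
        let response = Arc::clone(&response);
        thread::spawn(move || downloader(response, Ok(vec![1, 2, 3])))
    };
    writer.join().unwrap();
    assert!(check_response(&response).is_some());
}
```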
hmm I think I probably missed this point in the last review, but that feels a little strange to me. Is it not possible to coordinate the tasks so that we don't start a new one when there's already one running for that part? Each task could write its state part to a Vec allocated for it, and when everybody is done we signal the main thread that the results are ready, so nobody needs a lock anywhere. Unless that's not possible here for some reason I'm missing.
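Read as a standalone sketch, that proposal might look roughly like the following; the channel, part contents, and thread usage are illustrative assumptions rather than the PR's actual code:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let num_parts: usize = 4;
    let (tx, rx) = mpsc::channel();

    // Spawn one task per part; each sends its (part_id, bytes) exactly once.
    for part_id in 0..num_parts {
        let tx = tx.clone();
        thread::spawn(move || {
            let part = vec![part_id as u8; 8]; // stand-in for the fetched state part
            tx.send((part_id, part)).unwrap();
        });
    }
    drop(tx); // close the channel so the collector stops after the last send

    // Collect results into per-part slots; no Mutex is needed because each slot
    // is written exactly once, and only by the collecting thread.
    let mut parts: Vec<Option<Vec<u8>>> = vec![None; num_parts];
    for (part_id, part) in rx {
        parts[part_id] = Some(part);
    }
    assert!(parts.iter().all(|p| p.is_some()));
}
```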
I agree that a mutex on every Vec is sub-optimal, but changing this type is a bigger change. When the Vec gets a response, that needs to be communicated to somebody somehow. Maybe ShardSyncDownload needs a new AtomicI64 that counts the number of requests still alive.
The current mechanism happened to exist already, to send out part requests to the peers, and those requests are async.
For external storage, a rayon loop could be a simpler approach, or we could use SyncJobsActor to fetch state parts.
Let me get back to it in a follow-up PR.
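A rough illustration of the counter idea; the `live_requests` name and its placement are hypothetical, not an existing ShardSyncDownload field:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let num_requests = 8;
    // Hypothetical counter of requests still alive, as suggested above.
    let live_requests = Arc::new(AtomicI64::new(num_requests));

    let handles: Vec<_> = (0..num_requests)
        .map(|_| {
            let live_requests = Arc::clone(&live_requests);
            thread::spawn(move || {
                // ... fetch and store one state part ...
                live_requests.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    // Once the counter reaches zero, whoever polls the download state knows
    // that all responses have landed, without holding a lock.
    assert_eq!(live_requests.load(Ordering::SeqCst), 0);
}
```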
config_validate.rs logic looks good to me. Left some other comments.
Looks good to me, with 2 minor questions.
```
@@ -59,10 +68,11 @@ pub fn spawn_state_sync_dump(
        runtime.num_shards(&epoch_id)
    }?;

    let chain_id = client_config.chain_id.clone();
    let keep_running = Arc::new(AtomicBool::new(true));
    // Start a thread for each shard.
    let handles = (0..num_shards as usize)
```
not a problem right now, but in a future world where we have 1000 shards or something, one `actix_rt::Arbiter::new()` per shard is probably not what we want
Acknowledged
nearcore/src/state_sync.rs (Outdated)
```rust
}

impl Drop for StateSyncDumpHandle {
    fn drop(&mut self) {
        self.keep_running.store(false, std::sync::atomic::Ordering::Relaxed);
```
what is the intended behavior here?
If I go:
```rust
fn make_node() -> Addr<ClientActor> {
    let NearNode { client, .. } = nearcore::start_with_config(..);
    // everything else in NearNode (including state_sync_dump_handle) is dropped here
    client
}

fn main() {
    let client = make_node();
    do_stuff(client);
}
```
Then the node keeps running as usual but dumping state has stopped, which seems a bit unexpected
Suggestions welcome.
What you're describing seems acceptable. If the user needs these threads to keep working, the user needs to hold on to the `state_sync_dump_handle`.
what would be wrong with just not changing `keep_running` on drop? Is this behavior something we explicitly want for some reason? Could make it work like the join handle returned from `std::thread::spawn()`, where dropping it does nothing and just means that you don't have a handle to it anymore.
The problem I'm solving is that the loop isn't interruptible while generating parts in `for part_id in parts_dumped..num_parts {`.
Tested that locally:
- Ignore `keep_running`.
- Replace the for-loop with an infinite loop obtaining `part_id=0`.
- Run a localnet node that dumps state to `/tmp/`.
- Hit Ctrl+C: it stops the actix actors and disables logging, but the loop keeps running, writing new files over and over.

If dumping state takes only a short time, the dumping threads respond well to interrupts. `keep_running` stops that long-running loop and makes the threads interruptible.

> Could make it work like the join handle returned from std::thread::spawn() where dropping it does nothing, and just means that you dont have a handle to it anymore

That would be problematic, because the for-loop would keep running.
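For illustration, a cut-down sketch of the interruptible loop being described; the function name and the part-writing step are placeholders, assuming the per-shard loop simply re-checks the flag between parts:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Sketch only: the real loop dumps one state part to external storage per iteration.
fn dump_parts(keep_running: &AtomicBool, parts_dumped: u64, num_parts: u64) {
    for part_id in parts_dumped..num_parts {
        if !keep_running.load(Ordering::Relaxed) {
            // The handle flipped the flag (e.g. on Ctrl+C); stop promptly
            // instead of grinding through the remaining parts.
            return;
        }
        // ... obtain and write part `part_id` ...
        let _ = part_id;
    }
}

fn main() {
    let keep_running = Arc::new(AtomicBool::new(true));
    dump_parts(&keep_running, 0, 10);
}
```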
Yeah but I think the behavior you describe occurs because `StateSyncDumpHandle::stop()` doesn't set `keep_running`, not because we need it to be set in `drop()`. So if you add that, it should work as you expect it to. And honestly it feels to me like `StateSyncDumpHandle::stop()` should probably set that variable, right?
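Sketched with a stand-in struct (the real StateSyncDumpHandle carries more state than this), the suggestion amounts to something like:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in for the real handle; only the flag relevant to this thread is shown.
struct StateSyncDumpHandle {
    keep_running: Arc<AtomicBool>,
}

impl StateSyncDumpHandle {
    fn stop(&self) {
        // Explicit stop sets the flag, so callers don't have to rely on Drop.
        self.keep_running.store(false, Ordering::Relaxed);
    }
}

impl Drop for StateSyncDumpHandle {
    fn drop(&mut self) {
        // Dropping the handle also stops the dump threads, matching the PR's behavior.
        self.stop();
    }
}

fn main() {
    let handle = StateSyncDumpHandle { keep_running: Arc::new(AtomicBool::new(true)) };
    handle.stop();
    assert!(!handle.keep_running.load(Ordering::Relaxed));
}
```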
man... github has a bug in reviews.... this happened last time too. Quite a shocking bug tbh unless I am just doing something stupid. So posting here the comments I posted that don't show up in an incognito window for me:
- chain/client/src/sync/state.rs line 192: why
- chain/client/src/sync/state.rs line 993: I think in the last PR that added
- chain/client/src/sync/state.rs line 150: throughout this function, info! feels like it's probably a bit too verbose. There are many thousands of parts in mainnet right?
- chain/client/src/sync/state.rs line 1509: why not just call this function there then?
- docs/misc/state_sync_from_s3.md line 11: so we are saying it's not temporary anymore?
- docs/misc/state_sync_from_s3.md line 36: should we still tell people that you should set these if you want to dump the parts though?
- integration-tests/src/tests/nearcore/sync_state_nodes.rs line 405: when running this test, I get a lot of logspam with (…); also I got this failure, logs say: (…). This error happens like every time I run it, so at the moment I think this will probably be failing on nayduck if merged as is. Let me know if you can't reproduce it on your end and I can debug it on my computer.
- integration-tests/src/tests/nearcore/sync_state_nodes.rs line 430: why Arc? same with dir2 below
- nearcore/src/config_validate.rs line 100: nit:
- core/chain-configs/src/client_config.rs line 87: nit: u64 is kinda big for this right?
Yes, last time was awkward, this time I see all the comments you quoted.
```rust
/// Start the second node which gets state parts from that temp directory.
#[test]
#[cfg_attr(not(feature = "expensive_tests"), ignore)]
fn sync_state_dump() {
```
yeah, `debug!` sounds good. Although I don't think it's because of `std::thread::sleep(iteration_delay)`, since `nanosleep` should sleep for at least as long as you ask it to. Actually I think the frequency of it was because of the different threads all calling `check_new_epoch()` at the same time. No objections to using `actix_rt::time::sleep()` though. Don't think it matters much in this case which one you use, but might as well.
And this test still fails for me every time with the same error I pasted. Does it pass for you? Maybe it's platform-dependent for some reason.
Fixed and checked that it works on nayduck.
Approving to unblock.
Also adds a unit test for state dump and an integration test for state dump and state sync from that dump.