Opening database takes a very long time #3065
Comments
My $0.02:
|
There is a similar report in Ceph: https://tracker.ceph.com/issues/21092 It seems it is related to the cache size of RocksDB. |
@ailisp Was it you who tweaked RocksDB before? Maybe you have an idea where we miss the cache? |
Great finding! This doesn't sound like something that can be resolved by a smaller write-ahead-log (.log file) size or data file (.sst) size. Rather, the node exited before the sst file was properly updated and flushed.
RocksDB might have a 30-minute timeout on loading an sst file, after which it gives up to avoid an infinite loop like the case above. We might want to give up earlier to speed things up. |
@ailisp Can you take it over or help @mikhailOK with this? P.S. How critical is having a corrupted .sst file? (I just want to learn, so I can better understand how to provide useful debug information) |
I think in this case it's already corrupt, so it shouldn't retry for 30 minutes. In general I'm not sure. From how RocksDB works, it is supposed to keep the write-ahead-log file until it's fully flushed to sst. So if the last sst file isn't fully written, the log file should still exist, and it should be okay to delete the last sst file and reconstruct it from the write-ahead-log file.
Will help @mikhailOK |
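The recovery idea in the comment above (keep the log until the flush completes, then rebuild unflushed data from it) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDb`, `LogEntry`, and all of their methods are hypothetical names invented for illustration, and the code does not reflect how RocksDB actually lays out or recovers its files.

```rust
use std::collections::BTreeMap;

/// One write recorded in the toy write-ahead log (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
struct LogEntry {
    seq: u64,
    key: String,
    value: String,
}

/// Toy model of durable state: flushed data plus a log of writes.
/// Not the RocksDB on-disk format; purely illustrative.
struct ToyDb {
    /// Data that has been fully flushed (stands in for intact .sst files).
    flushed: BTreeMap<String, String>,
    /// Highest sequence number covered by `flushed`.
    flushed_seq: u64,
    /// Write-ahead log (stands in for the .log file).
    wal: Vec<LogEntry>,
}

impl ToyDb {
    fn new() -> Self {
        ToyDb { flushed: BTreeMap::new(), flushed_seq: 0, wal: Vec::new() }
    }

    /// Every write is appended to the log first.
    fn put(&mut self, key: &str, value: &str) {
        let seq = self.wal.last().map_or(self.flushed_seq, |e| e.seq) + 1;
        self.wal.push(LogEntry { seq, key: key.to_string(), value: value.to_string() });
    }

    /// A completed flush persists the logged writes and only then
    /// drops them from the log.
    fn flush(&mut self) {
        for e in self.wal.drain(..) {
            self.flushed_seq = e.seq;
            self.flushed.insert(e.key, e.value);
        }
    }

    /// Recovery after a crash: start from the flushed data and replay
    /// every log entry newer than `flushed_seq`.
    fn recover(&self) -> BTreeMap<String, String> {
        let mut state = self.flushed.clone();
        for e in self.wal.iter().filter(|e| e.seq > self.flushed_seq) {
            state.insert(e.key.clone(), e.value.clone());
        }
        state
    }
}
```

In this model, writes made after the last completed flush survive a crash because they are still in `wal`. If the log were dropped before the flush finished, `recover` would have nothing to replay, which matches the failure mode speculated about in this thread.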
I tried removing it, but then it complained that there is no such .sst file 😄 |
That's bad. Probably RocksDB thought the last log had been applied to the sst and deleted the log :( Or, I think more likely (given that the last sst file is much bigger than the old ones), RocksDB is appending to the latest sst and compacting it into several smaller ones once it grows to some limit. If this is the case, we have to use RocksDB's functions to fix the sst by undoing the latest log. |
In fact, it was not the last sst file, and usually it is the first or close to the first (based on the numbering in the name). |
Use rocksdb merge operator for ColState. No longer need to atomically read + write on update. Fixes #3065
Test plan
---------
sanity test
manually check performance

Use rocksdb merge operator for ColState. No longer need to atomically read + write on update. Fixes #3065
Test plan
---------
sanity test
manually check performance
manually check store version upgrade
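To make the difference between a read + write update and a merge-style update concrete, here is a minimal in-memory sketch. It is an assumption-laden illustration: `ToyStore` and its methods are hypothetical, the value is modelled as a plain `i64` counter with addition as the merge function, and none of this is the RocksDB API or the actual ColState change.

```rust
use std::collections::HashMap;

/// Toy key-value store that models the idea of a merge operator.
/// All names are hypothetical; this is not the RocksDB API.
struct ToyStore {
    /// Fully folded base values.
    base: HashMap<String, i64>,
    /// Merge operands queued per key, not yet folded into `base`.
    pending: HashMap<String, Vec<i64>>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { base: HashMap::new(), pending: HashMap::new() }
    }

    /// Read + write update: the current value must be read before
    /// the new value can be written.
    fn update_read_write(&mut self, key: &str, delta: i64) {
        let current = self.get(key).unwrap_or(0);
        self.pending.remove(key);
        self.base.insert(key.to_string(), current + delta);
    }

    /// Merge-style update: only the operand is recorded, no read needed.
    fn merge(&mut self, key: &str, delta: i64) {
        self.pending.entry(key.to_string()).or_default().push(delta);
    }

    /// A read folds the base value with all pending operands.
    fn get(&self, key: &str) -> Option<i64> {
        let base = self.base.get(key).copied();
        let ops = self.pending.get(key);
        if base.is_none() && ops.map_or(true, |v| v.is_empty()) {
            return None;
        }
        Some(base.unwrap_or(0) + ops.map_or(0, |v| v.iter().sum::<i64>()))
    }

    /// Models compaction: fold pending operands into the base values.
    fn compact(&mut self) {
        for (key, ops) in self.pending.drain() {
            let sum: i64 = ops.iter().sum();
            *self.base.entry(key).or_insert(0) += sum;
        }
    }
}
```

With `merge`, the cost of combining operands moves from the write path to reads and compaction, so an update no longer needs to read and write atomically.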
After the hard-fork on testnet yesterday we observe that when restarting a node, it can potentially take more than half an hour to just open the database. To reproduce, initialize and start a testnet node by
where genesis.json can be found at https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/genesis.json.
After the node starts, you should be able to see that it is "Waiting for peers". Now shut down the node and restart by
neard --home=<home> run
and observe that it gets stuck.