Execution can't recover after crash #1440
Comments
Alternatively, we can just never delete the block_number->block_hash entries from the db. It is clearly not the most optimized solution, but it is definitely the easiest one. It's only ~64 bytes per block, so it's not the end of the world (a total of ~1.2 GB for the entire chain at the moment).
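To make the size estimate above easy to re-check, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the block count used in the usage note is an assumed round figure, not a number taken from this thread.

```rust
/// Rough size of a block_number -> block_hash table that is never pruned.
/// `bytes_per_entry` is the assumed per-row cost (~64 bytes in the comment above).
fn block_hash_index_bytes(block_count: u64, bytes_per_entry: u64) -> u64 {
    block_count * bytes_per_entry
}
```

With an assumed 20 million blocks at 64 bytes each, `block_hash_index_bytes(20_000_000, 64)` gives 1,280,000,000 bytes, which is in line with the ~1.2 GB figure quoted above.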
I think the right solution is to change from Instead of doing one-off solutions like those listed above, which won't solve the root problem
#1451 (comment) Additional comments I made on this problem, and why switching to an ACID database solves them
Why can't we use RocksDB? Instead of using I think in our case, we can even use
Erigon has a write-up here: https://github.com/erigontech/erigon/wiki/Choice-of-storage-engine They tried about 5 different database solutions and then ended up with They say it isn't ACID,
This looks like a good start, as it seems to have higher reliability than our current solution, but because various projects have pointed out issues, I am inclined to think it is a bad choice in the long term.
While I was running trin execution, era1 deserialization failed (the failure itself is irrelevant to this issue).
When I tried to resume the run, it failed very soon afterwards with the error:
Error: database error: not found database error block_hash
After looking a bit more into it, I found the problem.
BlockExecutor::manage_block_hash_serve_window modifies the db directly after every processed block. If the execution crashes (as happened to me) and we try to resume it, the stored block hashes will not be the correct ones: we will have the 256 blocks preceding the moment of the crash, not the 256 blocks preceding the saved checkpoint.
Possible solutions:
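One possible direction, as a minimal sketch under assumptions rather than trin's actual code: stage the block hash changes in memory and apply them together with the checkpoint, so the stored window always matches the checkpoint. `Store`, `PendingWindow`, and their methods are hypothetical names, and the in-memory `BTreeMap` only stands in for the real db. In a real database, `commit` would have to be a single atomic write batch or transaction.

```rust
use std::collections::BTreeMap;

/// Number of recent block hashes that must stay available.
const SERVE_WINDOW: u64 = 256;

/// In-memory stand-in for the on-disk store (hypothetical, not trin's types).
#[derive(Default, Clone, Debug, PartialEq)]
struct Store {
    block_hashes: BTreeMap<u64, [u8; 32]>,
    checkpoint: Option<u64>,
}

/// Block hash changes collected since the last checkpoint.
#[derive(Default)]
struct PendingWindow {
    inserts: BTreeMap<u64, [u8; 32]>,
    last_block: Option<u64>,
}

impl PendingWindow {
    /// Records a processed block without touching the store.
    fn record_block(&mut self, number: u64, hash: [u8; 32]) {
        self.inserts.insert(number, hash);
        self.last_block = Some(number);
    }

    /// Applies the staged hashes, prunes entries older than the serve window,
    /// and advances the checkpoint, all in one step.
    fn commit(self, store: &mut Store) {
        let last = match self.last_block {
            Some(number) => number,
            None => return, // nothing was processed since the last checkpoint
        };
        store.block_hashes.extend(self.inserts);
        let oldest_kept = last.saturating_sub(SERVE_WINDOW - 1);
        store.block_hashes.retain(|number, _| *number >= oldest_kept);
        store.checkpoint = Some(last);
    }
}
```

If the process crashes before `commit`, the `PendingWindow` is simply lost and the store still holds the window that belongs to the previous checkpoint, so a resumed run starts from consistent data.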