Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

execution: fix OOM crash due to db::Buffer::accounts realloc (#2064) #2081

Merged
merged 7 commits into from
Jun 14, 2024

Conversation

battlmonstr
Copy link
Contributor

@battlmonstr battlmonstr commented Jun 6, 2024

The flat_hash_map has a policy (in rehash_and_grow_if_necessary()) to grow 2x when its size reaches 25/32 (78%) of capacity.

Previously the batch execution logic was adding things to db::Buffer, and then testing if its size becomes bigger than batch_size. This lead to significant overshooting, because at the growth threshold it will double the memory size and make it batch_size * 2 (even more, because there's also unused 22% of capacity).

In standalone silkworm execution stage the batch_size estimation was based on a gas-based heuristic formula, which lead to significant underestimation and an OOM crash when trying to allocate 4 Gb of RAM at once.

This changes the logic from checking batch_size aposteriori to apriori. If there's a possibility to grow the buffer significantly and overshoot the limit, we shouldn't take the risk, and commit the batch.

Additional changes:

  • execution stage: flush history after each block as in CAPI
  • capi: persist the executed work before quitting on invalid block
  • remove unused db::Buffer::current_batch_history_size()
  • ExecutionProcessor: split flush_state from execute_block
  • block_buffer.terminate_and_release_all using gsl::finally

@battlmonstr battlmonstr requested a review from canepat June 6, 2024 12:49
@battlmonstr battlmonstr force-pushed the pr/crashdbg branch 6 times, most recently from 72e14b7 to 0078f27 Compare June 12, 2024 14:40
[[nodiscard]] ValidationResult execute_block(std::vector<Receipt>& receipts) noexcept;

//! \brief Flush IntraBlockState into cumulative State.
void flush_state();
Copy link
Member

@canepat canepat Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change this to something like:

[[nodiscard]] FlushingResult flush_state() noexcept;

in order to:

  1. make ExecutionProcessor exception-free again
  2. do not rely on db::Buffer::MemoryLimitError exception to handle an expected condition i.e. batch full

@canepat canepat added the maintenance Some maintenance work (fix, refactor, rename, test...) label Jun 14, 2024
@canepat
Copy link
Member

canepat commented Jun 14, 2024

Closes #2064

@canepat canepat merged commit ba106d1 into master Jun 14, 2024
4 checks passed
@canepat canepat deleted the pr/crashdbg branch June 14, 2024 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Some maintenance work (fix, refactor, rename, test...)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants