disk accounts index performance improvements #30711

jeffwashington · 2023-03-14T17:46:56Z

Problem

When the disk index is enabled, validators have a hard time catching up initially.
Disk I/O, possibly writes specifically, increases a lot.
Machines with older SSDs or only a single SSD are experiencing issues.

Proposed Solution

Here are some mitigation strategies to reduce disk i/o, roughly ordered by how easily they can be implemented:

1. Perform writes for an entire bucket at once to reduce the number of page writes during index updates. (commit)
2. Only flush buckets when their contents exceed some limit (10G, etc.). (branch)
3. At initial startup, do NOT write duplicate entries to disk. (at startup, keep duplicates in in-memory index since they will be cleaned shortly #30736)
   a. This prevents us from creating a data file for all buckets containing duplicate entries.
   b. It makes the initial clean much faster (whether in the background or foreground).
4. Store single-element account info in the index file instead of the data file. (disk bucket stores single entry in index file #30750)
   a. This makes the common case read/write only a single file instead of two.
   b. It does make the index file larger.
5. Stop storing append_vec_id for each slot info. (make store_id an Option in mem acct idx #30512)
   a. This saves a u32, and possibly a u64-aligned size, for each entry.
   b. This is in progress in master.
6. Eliminate 8 bytes PER pubkey entry in the index file. (wip: rework uid to use bits better #30563)
   a. We need only a single bit of this to determine occupied/empty, but alignment requires it to be 8-byte aligned. This is a high per-account cost.
7. Only allow storing index entries with a single slot list entry and a refcount = 1. (introduce bucket map with single data entry #30549)
   a. This eliminates data files completely, reduces the cost per entry, and eliminates variable-size entries.
8. When inserting into a data bucket, search the whole bucket for an empty spot. We would rather pack the data than resize unnecessarily.
   a. This would hurt worst-case insertion time, but since insertion happens in the background, that is OK and preferable to artificially larger files.
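To make the "store single-element account info in the index file" idea concrete, here is a minimal sketch of how an index entry could record where its slot list lives. It is not the actual bucket map code: `SlotInfo`, `SlotListLocation`, `place_slot_list`, `files_touched`, and the `alloc` callback are all hypothetical names chosen for illustration.

```rust
/// Hypothetical stand-in for per-slot account info (field names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SlotInfo {
    pub slot: u64,
    pub offset: u64,
}

/// Where an index entry keeps its slot list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SlotListLocation {
    /// Common case: exactly one element, stored inline in the index file.
    Inline(SlotInfo),
    /// Otherwise the elements live in a separate data file.
    DataFile { data_file_ix: u32, offset: u64, len: u32 },
}

/// Number of files that must be touched to read one entry's slot list.
pub fn files_touched(loc: &SlotListLocation) -> usize {
    match loc {
        SlotListLocation::Inline(_) => 1,
        SlotListLocation::DataFile { .. } => 2,
    }
}

/// Pick a location for a slot list. `alloc` is a hypothetical callback that
/// reserves room for `n` elements in a data file and returns (file index, offset).
/// It is only called when the slot list has more than one element.
pub fn place_slot_list(
    slot_list: &[SlotInfo],
    mut alloc: impl FnMut(usize) -> (u32, u64),
) -> Option<SlotListLocation> {
    match slot_list {
        [] => None,
        [single] => Some(SlotListLocation::Inline(*single)),
        many => {
            let (data_file_ix, offset) = alloc(many.len());
            Some(SlotListLocation::DataFile {
                data_file_ix,
                offset,
                len: many.len() as u32,
            })
        }
    }
}
```

The trade-off matches the sub-points above: the inline variant widens every index entry (a larger index file), but the single-element case never allocates in or reads from a data file.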

Already done in master:

Eliminate size from each slot info. This saves a u32 for each entry.
Hold entries in the in-memory index when ref_count != 1.
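The rule of holding entries in memory when ref_count != 1, and the stricter proposal of only storing entries with a single slot list entry, can be sketched as one placement check. This is an illustrative sketch only: `MemEntry`, `Placement`, `placement`, and the `single_entry_only` flag are hypothetical and do not mirror the real in-memory index types.

```rust
/// Hypothetical summary of an in-memory index entry (names are assumptions).
pub struct MemEntry {
    pub slot_list_len: usize,
    pub ref_count: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    KeepInMemory,
    FlushToDisk,
}

/// Decide whether an entry may be written to the disk index.
/// - Entries with ref_count != 1 always stay in memory.
/// - With `single_entry_only` set (the stricter proposal), entries whose
///   slot list does not have exactly one element also stay in memory.
pub fn placement(entry: &MemEntry, single_entry_only: bool) -> Placement {
    let multi_slot = entry.slot_list_len != 1;
    if entry.ref_count != 1 || (single_entry_only && multi_slot) {
        Placement::KeepInMemory
    } else {
        Placement::FlushToDisk
    }
}
```

With `single_entry_only` set, every entry that reaches disk has the same fixed shape, which is what would let the data files go away entirely.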

KirillLykov · 2023-07-25T12:34:12Z

I wonder whether all the subtasks for this issue have been done and it can be closed?

jeffwashington self-assigned this Mar 14, 2023

This was referenced Apr 6, 2023

disk index: batch insert #31094

Merged

disk index: keep same random during resize #31095

Merged

disk index: bucket_index_ix doesn't % by capacity #31096

Merged

github-actions bot added the stale label Jul 25, 2024

github-actions bot closed this as not planned Aug 2, 2024
