
disk accounts index performance improvements #30711

Closed
jeffwashington opened this issue Mar 14, 2023 · 1 comment
Labels
stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@jeffwashington
Contributor

jeffwashington commented Mar 14, 2023

Problem

When the disk index is enabled, validators have a hard time catching up initially.
Disk I/O (possibly writes specifically) increases significantly.
Machines with older SSDs, or with only a single SSD, are experiencing issues.

Proposed Solution

Here are some mitigation strategies to reduce disk i/o, roughly ordered by how easily they can be implemented:

  1. Perform writes for an entire bucket at once to reduce the number of page writes when committing index updates
  2. Only flush buckets when their contents exceed some limit (e.g. 10G) (branch)
  3. Do NOT write duplicate entries to disk during initial startup (at startup, keep duplicates in in-memory index since they will be cleaned shortly #30736)
    a. This avoids creating a data file for every bucket that contains duplicate entries.
    b. This makes the initial clean much faster (whether it runs in the background or the foreground).
  4. Store single-element account info in the index file instead of the data file (disk bucket stores single entry in index file #30750)
    a. The common case then only needs to read/write a single file instead of two.
    b. It does make the index file larger.
  5. Stop storing append_vec_id for each slot info (make store_id an Option in mem acct idx #30512)
    a. This saves a u32 per entry, and possibly a full u64 once alignment padding is counted.
    b. This is in progress in master.
  6. Eliminate 8 bytes PER pubkey entry in the index file (wip: rework uid to use bits better #30563)
    a. Only a single bit of this is needed to determine occupied/empty, but alignment forces the field to take up 8 bytes. This is a high per-account cost.
  7. Only allow storing index entries with a single slot list entry and a refcount = 1 (introduce bucket map with single data entry #30549)
    a. This eliminates data files completely, reduces the cost per entry, and eliminates variable-size entries.
  8. When inserting into a data bucket, search the whole bucket for an empty spot. We would rather pack the data than resize unnecessarily.
    a. This would hurt worst-case insertion time, but since insertion happens in the background, that is acceptable and preferable to artificially larger files.
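Item 2 amounts to deferring a bucket's flush until enough dirty data has accumulated. A minimal Rust sketch of that policy, assuming a hypothetical `FlushPolicy` type that only counts dirty bytes; the names are illustrative and not taken from the accounts index code, and 10 GiB stands in for the "10G" example limit:

```rust
/// Hypothetical per-bucket flush policy: writes accumulate in memory and the
/// bucket is only flushed once its dirty contents reach `limit_bytes`.
struct FlushPolicy {
    limit_bytes: u64,
    dirty_bytes: u64,
    flushes: u32,
}

impl FlushPolicy {
    fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, dirty_bytes: 0, flushes: 0 }
    }

    /// Records `bytes` of newly dirtied data. Returns true if this write
    /// pushed the bucket to the limit and triggered a flush.
    fn record_write(&mut self, bytes: u64) -> bool {
        self.dirty_bytes += bytes;
        if self.dirty_bytes >= self.limit_bytes {
            // A real implementation would write the bucket to disk here.
            self.dirty_bytes = 0;
            self.flushes += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    const GIB: u64 = 1 << 30;
    let mut policy = FlushPolicy::new(10 * GIB); // example limit from item 2
    // 25 writes of 1 GiB each: flushes happen after the 10th and 20th write.
    let flushed_after: Vec<usize> = (1..=25)
        .filter(|_| policy.record_write(GIB))
        .collect();
    println!("flushed after writes {:?}, total flushes {}", flushed_after, policy.flushes);
}
```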
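Item 4 can be pictured as an index-file value that holds one slot-list element inline and only points into a data file when there is more than one element. A minimal sketch, assuming hypothetical `SlotInfo` and `IndexValue` types; the actual disk bucket layout in #30750 may differ:

```rust
/// Hypothetical account info for a single slot-list element.
#[derive(Clone, Copy, Debug)]
struct SlotInfo {
    slot: u64,
    offset: u64, // hypothetical location of the account in storage
}

/// Hypothetical value stored next to a pubkey in the index file.
#[derive(Debug)]
enum IndexValue {
    /// Common case: the single element lives directly in the index file.
    Single(SlotInfo),
    /// Less common case: the slot list lives in a separate data file.
    InDataFile { data_file_offset: u64, len: u32 },
}

impl IndexValue {
    /// Chooses the representation for a non-empty slot list.
    fn from_slot_list(list: &[SlotInfo], next_data_offset: u64) -> Self {
        debug_assert!(!list.is_empty(), "slot list must not be empty");
        match list {
            [only] => IndexValue::Single(*only),
            _ => IndexValue::InDataFile {
                data_file_offset: next_data_offset,
                len: list.len() as u32,
            },
        }
    }

    /// Number of files that must be read to load this entry.
    fn files_touched(&self) -> usize {
        match self {
            IndexValue::Single(_) => 1,
            IndexValue::InDataFile { .. } => 2, // index file + data file
        }
    }
}

fn main() {
    let a = SlotInfo { slot: 5, offset: 100 };
    let b = SlotInfo { slot: 6, offset: 200 };
    let single = IndexValue::from_slot_list(&[a], 0);
    let multi = IndexValue::from_slot_list(&[a, b], 0);
    println!("{:?} touches {} file(s)", single, single.files_touched());
    println!("{:?} touches {} file(s)", multi, multi.files_touched());
}
```

The trade-off from item 4 shows up directly: every index-file value must reserve room for an inline `SlotInfo`, which is why the index file grows.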
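Item 6 is about per-entry overhead: an 8-byte-aligned field spent on what is essentially one occupied/empty bit. A minimal sketch of that cost, assuming a hypothetical entry layout (pubkey, ref count, data location) rather than the actual disk bucket format; folding the flag into the top bit of another `u64` is one illustrative way to reclaim the 8 bytes, not necessarily what #30563 does:

```rust
use std::mem::size_of;

/// Hypothetical index-file entry with a separate field whose only essential
/// job is to mark the slot as occupied or empty.
#[allow(dead_code)]
#[repr(C)]
struct EntryWithFlagField {
    occupied: u64, // 1 bit of information, 8 bytes of storage
    pubkey: [u8; 32],
    ref_count: u64,
    data_location: u64,
}

/// Same information with the occupied flag folded into the top bit of the ref count.
#[allow(dead_code)]
#[repr(C)]
struct PackedEntry {
    pubkey: [u8; 32],
    ref_count_and_flag: u64, // bit 63 = occupied, bits 0..=62 = ref_count
    data_location: u64,
}

const OCCUPIED_BIT: u64 = 1 << 63;

impl PackedEntry {
    fn new(pubkey: [u8; 32], ref_count: u64, data_location: u64) -> Self {
        assert!(ref_count < OCCUPIED_BIT, "ref_count must fit in 63 bits");
        Self { pubkey, ref_count_and_flag: ref_count | OCCUPIED_BIT, data_location }
    }

    fn empty() -> Self {
        Self { pubkey: [0; 32], ref_count_and_flag: 0, data_location: 0 }
    }

    fn is_occupied(&self) -> bool {
        (self.ref_count_and_flag & OCCUPIED_BIT) != 0
    }

    fn ref_count(&self) -> u64 {
        self.ref_count_and_flag & !OCCUPIED_BIT
    }
}

fn main() {
    println!("with flag field: {} bytes", size_of::<EntryWithFlagField>()); // 56
    println!("packed:          {} bytes", size_of::<PackedEntry>()); // 48
    let e = PackedEntry::new([7; 32], 3, 4096);
    println!("occupied={} ref_count={}", e.is_occupied(), e.ref_count());
    println!("empty slot occupied={}", PackedEntry::empty().is_occupied());
}
```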
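Item 8 contrasts giving up after a few probes and resizing with scanning the whole bucket for a hole first. A minimal sketch, assuming a hypothetical `DataBucket` of `Option` slots and a `max_probes` parameter invented for illustration; passing `usize::MAX` gives the full-bucket search the item proposes:

```rust
/// Hypothetical data bucket: a fixed number of slots, each empty or occupied.
struct DataBucket<T> {
    slots: Vec<Option<T>>,
}

impl<T> DataBucket<T> {
    fn with_capacity(capacity: usize) -> Self {
        Self { slots: (0..capacity).map(|_| None).collect() }
    }

    fn capacity(&self) -> usize {
        self.slots.len()
    }

    /// Inserts `value`, probing from `hint` (e.g. a hash of the key) and
    /// wrapping around. At most `max_probes` slots are checked; if none is
    /// empty, the bucket doubles in size. Returns the slot index used.
    fn insert(&mut self, hint: usize, value: T, max_probes: usize) -> usize {
        let len = self.slots.len();
        if len > 0 {
            let start = hint % len;
            for i in 0..max_probes.min(len) {
                let idx = (start + i) % len;
                if self.slots[idx].is_none() {
                    self.slots[idx] = Some(value);
                    return idx;
                }
            }
        }
        // No empty slot within the probe limit: grow, then use the first new slot.
        self.slots.resize_with((len * 2).max(1), || None);
        self.slots[len] = Some(value);
        len
    }
}

fn main() {
    // Both buckets have slots 0, 1 and 3 occupied, leaving a hole at slot 2.
    let mut full_search = DataBucket::with_capacity(4);
    let mut one_probe = DataBucket::with_capacity(4);
    for h in [0usize, 1, 3] {
        full_search.insert(h, h, usize::MAX);
        one_probe.insert(h, h, usize::MAX);
    }
    // Full search reuses the hole; a single probe at the occupied slot 3 resizes instead.
    let a = full_search.insert(3, 99, usize::MAX);
    let b = one_probe.insert(3, 99, 1);
    println!("full search: slot {a}, capacity {}", full_search.capacity());
    println!("one probe:   slot {b}, capacity {}", one_probe.capacity());
}
```

The full scan is O(bucket size) in the worst case, which is the insertion-time cost item 8 accepts in exchange for not growing the file.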

Already done in master:

  1. Eliminate size from each slot info. This saves a u32 for each entry.
  2. Hold entries in the in-memory index when ref_count != 1.
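The ref_count rule in "Already done in master" is a filter on which entries may leave memory. A minimal sketch, assuming a hypothetical `Entry` type and a `split_for_flush` helper that are not the real in-memory index API:

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];

/// Hypothetical in-memory index entry: (slot, account info) pairs plus a ref count.
#[allow(dead_code)]
struct Entry {
    slot_list: Vec<(u64, u64)>,
    ref_count: u64,
}

/// Only entries with ref_count == 1 are candidates for the disk index;
/// everything else stays in memory.
fn eligible_for_disk(entry: &Entry) -> bool {
    entry.ref_count == 1
}

/// Splits the in-memory index into (entries to write to disk, entries to keep in memory).
fn split_for_flush(index: HashMap<Pubkey, Entry>) -> (HashMap<Pubkey, Entry>, HashMap<Pubkey, Entry>) {
    index.into_iter().partition(|(_, e)| eligible_for_disk(e))
}

fn main() {
    let mut index = HashMap::new();
    index.insert([1; 32], Entry { slot_list: vec![(10, 0)], ref_count: 1 });
    index.insert([2; 32], Entry { slot_list: vec![(10, 0), (11, 0)], ref_count: 2 });
    let (to_disk, in_mem) = split_for_flush(index);
    println!("to disk: {}, kept in memory: {}", to_disk.len(), in_mem.len());
}
```

Proposed item 7 would tighten `eligible_for_disk` further by also requiring `slot_list.len() == 1`.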
jeffwashington self-assigned this Mar 14, 2023
@KirillLykov
Contributor

I wonder if all the subtasks for this issue have been done and it can be closed?

github-actions bot added the stale label Jul 25, 2024
github-actions bot closed this as not planned Aug 2, 2024