Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Fix the inconsistency check in get_entries_in_data_block() #27195

Merged
merged 1 commit into from
Aug 18, 2022
Merged

Fix the inconsistency check in get_entries_in_data_block() #27195

merged 1 commit into from
Aug 18, 2022

Conversation

yhchiang-sol
Copy link
Contributor

@yhchiang-sol yhchiang-sol commented Aug 17, 2022

Problem

get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle. Such
panic was reported in #26980 when the validator serves a high load of RPC calls.

Summary of Changes

This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot newer than lowest_cleanup_slot.

@lijunwangs lijunwangs requested a review from carllin August 17, 2022 15:41
@yhchiang-sol yhchiang-sol changed the title Avoid panic in get_entries_in_data_block() Fix the inconsistency check in get_entries_in_data_block() Aug 17, 2022
lijunwangs
lijunwangs previously approved these changes Aug 17, 2022
Copy link
Contributor

@lijunwangs lijunwangs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

carllin
carllin previously approved these changes Aug 17, 2022
@mergify mergify bot dismissed stale reviews from carllin and lijunwangs August 18, 2022 00:42

Pull request has been modified.

lijunwangs
lijunwangs previously approved these changes Aug 18, 2022
@mergify mergify bot dismissed lijunwangs’s stale review August 18, 2022 04:32

Pull request has been modified.

@yhchiang-sol yhchiang-sol added the automerge Merge this Pull Request automatically once CI passes label Aug 18, 2022
@mergify mergify bot removed the automerge Merge this Pull Request automatically once CI passes label Aug 18, 2022
@mergify
Copy link
Contributor

mergify bot commented Aug 18, 2022

automerge label removed due to a CI failure

@yhchiang-sol yhchiang-sol merged commit 6d12bb6 into solana-labs:master Aug 18, 2022
mergify bot pushed a commit that referenced this pull request Aug 18, 2022
#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such
panic was reported in #26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.

(cherry picked from commit 6d12bb6)
mergify bot pushed a commit that referenced this pull request Aug 18, 2022
#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such
panic was reported in #26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.

(cherry picked from commit 6d12bb6)
@ryoqun
Copy link
Contributor

ryoqun commented Aug 18, 2022

no test? ;)

mergify bot added a commit that referenced this pull request Aug 18, 2022
…27195) (#27231)

Fix a corner-case panic in get_entries_in_data_block() (#27195)

#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such
panic was reported in #26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.

(cherry picked from commit 6d12bb6)

Co-authored-by: Yueh-Hsuan Chiang <[email protected]>
mergify bot added a commit that referenced this pull request Aug 19, 2022
…27195) (#27232)

Fix a corner-case panic in get_entries_in_data_block() (#27195)

#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such
panic was reported in #26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.

(cherry picked from commit 6d12bb6)

Co-authored-by: Yueh-Hsuan Chiang <[email protected]>
HaoranYi pushed a commit to HaoranYi/solana that referenced this pull request Aug 21, 2022
…7195)

#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such
panic was reported in solana-labs#26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.
HaoranYi added a commit that referenced this pull request Aug 22, 2022
* refactor: extract store_stake_accounts fn

* refactor: extract store_vote_account fn

* refactor: extract reward history update fn

* remove avg point value from pay_valiator fn. not used

* clippy: slice

* clippy: slice

* remove abort() from test-validator (#27124)

* chore: bump bytes from 1.1.0 to 1.2.1 (#27172)

* chore: bump bytes from 1.1.0 to 1.2.1

Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.1.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](tokio-rs/bytes@v1.1.0...v1.2.1)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* [auto-commit] Update all Cargo lock files

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <[email protected]>

* Share Ancestors API get with contains_key (#27161)

consolidate similar fns

* Rename to `MAX_BLOCK_ACCOUNTS_DATA_SIZE_DELTA` (#27175)

* chore: bump libc from 0.2.129 to 0.2.131 (#27162)

* chore: bump libc from 0.2.129 to 0.2.131

Bumps [libc](https://github.com/rust-lang/libc) from 0.2.129 to 0.2.131.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](rust-lang/libc@0.2.129...0.2.131)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* [auto-commit] Update all Cargo lock files

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <[email protected]>

* reverts wide fanout in broadcast when the root node is down (#26359)

A change included in
#20480
was that when the root node in turbine broadcast tree is down, the
leader will broadcast the shred to all nodes in the first layer.
The intention was to mitigate the impact of dead nodes on shreds
propagation, because if the root node is down, then the entire cluster
will miss out the shred.
On the other hand, if x% of stake is down, this will cause 200*x% + 1
packets/shreds ratio at the broadcast stage which might contribute to
line-rate saturation and packet drop.
To avoid this bandwidth saturation issue, this commit reverts that logic
and always broadcasts shreds from the leader only to the root node.
As before we rely on erasure codes to recover shreds lost due to staked
nodes being offline.

* add getTokenLargestAccounts rpc method to rust client (#26840)

* add get token largest accounts rpc call to client

* split to include with commitment

* Bump spl-token-2022 (#27181)

* Bump token-2022 to 0.4.3

* Allow cargo to bump stuff to v1.11.5

* VoteProgram.safeWithdraw function to safeguard against accidental vote account closures (#26586)

feat: safe withdraw function

Co-authored-by: aschonfeld <[email protected]>

* chore: bump futures from 0.3.21 to 0.3.23 (#27182)

* chore: bump futures from 0.3.21 to 0.3.23

Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.21 to 0.3.23.
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](rust-lang/futures-rs@0.3.21...0.3.23)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* [auto-commit] Update all Cargo lock files

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <[email protected]>

* chore: bump nix from 0.24.2 to 0.25.0 (#27179)

* chore: bump nix from 0.24.2 to 0.25.0

Bumps [nix](https://github.com/nix-rust/nix) from 0.24.2 to 0.25.0.
- [Release notes](https://github.com/nix-rust/nix/releases)
- [Changelog](https://github.com/nix-rust/nix/blob/master/CHANGELOG.md)
- [Commits](nix-rust/nix@v0.24.2...v0.25.0)

---
updated-dependencies:
- dependency-name: nix
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* [auto-commit] Update all Cargo lock files

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <[email protected]>

* Parse ConfidentialTransaction instructions (#26825)

Parse ConfidentialTransfer instructions

* snapshots: serialize version file first (#27192)

serialize version file first

* serialize incremental_snapshot_hash (#26839)

* serialize incremental_snapshot_hash

* pr feedback

* derives Error trait for ClusterInfoError and core::result::Error (#27208)

* Add clean_accounts_for_tests() (#27200)

* Rust v1.63.0 (#27148)

* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

Co-authored-by: Michael Vines <[email protected]>

* docs: updated "transaction fees" page (#26861)

* docs: transaction fees, compute units, compute budget

* docs: added messages definition

* Revert "docs: added messages definition"

This reverts commit 3c56156.

* docs: added messages definition

* Update docs/src/transaction_fees.md

Co-authored-by: Jacob Creech <[email protected]>

* fix: updates from feedback

Co-authored-by: Jacob Creech <[email protected]>

* sdk: Fix args after "--" in build-bpf and test-bpf (#27221)

* Flaky Unit Test test_rpc_subscriptions (#27214)

Increase unit test timeout from 5 seconds to 10 seconds

* chore: only buildkite pipelines use sccache in docker-run.sh (#27204)

chore: only buildkite ci use sccache

* clean feature: `prevent_calling_precompiles_as_programs` (#27100)

* clean feature: prevent_calling_precompiles_as_programs

* fix tests

* fix test

* remove comment

* fix test

* feedback

* Add get_account_with_commitment to BenchTpsClient (#27176)

* Fix a corner-case panic in get_entries_in_data_block() (#27195)

#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such
panic was reported in #26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.

* Verify snapshot slot deltas (#26666)

* store-tool: log lamports for each account (#27168)

log lamports for each account

* add an assert for a debug feature to avoid wasted time (#27210)

* remove redundant call that bumps age to future (#27215)

* Use from_secs api to create duration (#27222)

use from_secs api to create duration

* reorder slot # in debug hash data path (#27217)

* create helper fn for clarity (#27216)

* Verifying snapshot bank must always specify the snapshot slot (#27234)

* Remove `Bank::ensure_no_storage_rewards_pool()` (#26468)

* cli: Add subcommands for address lookup tables (#27123)

* cli: Add subcommand for creating address lookup tables

* cli: Add additional subcommands for address lookup tables

* short commands

* adds hash domain to ping-pong protocol (#27193)

In order to maintain backward compatibility, for now the responding node
will hash the token both with and without domain so that the other node
will accept the response regardless of its upgrade status.
Once the cluster has upgraded to the new code, we will remove the legacy
domain = false case.

* Revert "Rust v1.63.0 (#27148)" (#27245)

This reverts commit a2e7bdf.

* correct double negation (#27240)

* Enable QUIC client by default. Add arg to disable QUIC client. (Forward port #26927) (#27194)

Enable QUIC client by default. Add arg to disable QUIC client.

* Enable QUIC client by default. Add arg to disable QUIC client.
* Deprecate --disable-quic-servers arg
* Add #[ignore] annotation to failing tests

* slots_connected: check if the range is connected (>= ending_slot) (#27152)

* create-snapshot check if snapshot slot exists (#27153)

* Add Bank::clean_accounts_for_tests() (#27209)

* Call `AccountsDb::shrink_all_slots()` directly (#27235)

* add ed25519_program to built-in instruction cost list (#27199)

* add ed25519_program to built-in instruction cost list

* Remove unnecessary and stale comment

* simple refactorings to disk idx (#27238)

* add _inclusive for clarity (#27239)

* eliminate unnecessary ZERO_RAW_LAMPORTS_SENTINEL (#27218)

* make test code more clear (#27260)

* banking stage: actually aggregate tracer packet stats (#27118)

* aggregated_tracer_packet_stats_option was alwasys None

* Actually accumulate tracer packet stats

* Refactor epoch reward 1 (#27253)

* refactor: extract store_stake_accounts fn

* clippy: slice

Co-authored-by: haoran <haoran@mbook>

* recovers merkle shreds from erasure codes (#27136)

The commit
* Identifies Merkle shreds when recovering from erasure codes and
  dispatches specialized code to reconstruct shreds.
* Coding shred headers are added to recovered erasure shards.
* Merkle tree is reconstructed for the erasure batch and added to
  recovered shreds.
* The common signature (for the root of Merkle tree) is attached to all
  recovered shreds.

* Simplify `Bank::clean_accounts()` by removing params (#27254)

* Account files remove (#26910)

* Create a new function cleanup_accounts_paths, a trivial change

* Remove account files asynchronously

* Update and simplify the implementation after the validator test runs.

* Fixes after testing on the dev device

* Discard tokio.  Use thread instead

* Fix comments format

* Fix config type to pass the github test

* Fix failed tests.  Handle the case of non-existing path

* Final cleanup, addressing the review comments
Avoided OsString.
Made the function more generic with "impl AsRef<Path>"

Co-authored-by: Jeff Washington <[email protected]>

* Refactor: Flattens `TransactionContext::instruction_trace` (#27109)

* Flattens TransactionContext::instruction_trace.

* Stop the search at transaction level.

* Renames get_instruction_context_at => get_instruction_context_at_nesting_level.

* Removes TransactionContext::get_instruction_trace().
Adds TransactionContext::get_instruction_trace_length() and TransactionContext::get_instruction_context_at_index().

* Have TransactionContext::instruction_accounts_lamport_sum() accept an iterator instead of a slice.

* Removes instruction_trace from ExecutionRecord.

* make InstructionContext::new() private

* Parallel insertion of dirty store keys during clean (#27058)

parallelize dirty store key insertion

* Refactor epoch reward 2 (#27257)

* refactor: extract store_stake_accounts fn

* refactor: extract store_vote_account fn

* clippy: slice

* clippy: slice

* fix merge error

Co-authored-by: haoran <haoran@mbook>

* Standardize thread names

Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character dense than Snake or Kebab case

* cleanup comment on filter_zero_lamport_clean_for_incremental_snapshots (#27273)

* remove inaccurate log (#27255)

* patches metrics for invalid cached vote/stake accounts (#27266)

patches invalid cached vote/stake accounts metrics

Invalid cached vote accounts is overcounting actual mismatches, and
invalid cached stake accounts is undercounting.

* Refactor epoch reward 3 (#27259)

* refactor: extract store_stake_accounts fn

* refactor: extract store_vote_account fn

* refactor: extract reward history update fn

* clippy: slice

* clippy: slice

Co-authored-by: haoran <haoran@mbook>

* fix merges

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: haoran <haoran@mbook>
Co-authored-by: Jeff Biseda <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <[email protected]>
Co-authored-by: Brooks Prumo <[email protected]>
Co-authored-by: behzad nouri <[email protected]>
Co-authored-by: AJ Taylor <[email protected]>
Co-authored-by: Tyera Eulberg <[email protected]>
Co-authored-by: Andrew Schonfeld <[email protected]>
Co-authored-by: aschonfeld <[email protected]>
Co-authored-by: apfitzge <[email protected]>
Co-authored-by: Jeff Washington (jwash) <[email protected]>
Co-authored-by: Brennan Watt <[email protected]>
Co-authored-by: Michael Vines <[email protected]>
Co-authored-by: Nick Frostbutter <[email protected]>
Co-authored-by: Jacob Creech <[email protected]>
Co-authored-by: Jon Cinque <[email protected]>
Co-authored-by: Yihau Chen <[email protected]>
Co-authored-by: Justin Starry <[email protected]>
Co-authored-by: kirill lykov <[email protected]>
Co-authored-by: Yueh-Hsuan Chiang <[email protected]>
Co-authored-by: leonardkulms <[email protected]>
Co-authored-by: Will Hickey <[email protected]>
Co-authored-by: Tao Zhu <[email protected]>
Co-authored-by: Xiang Zhu <[email protected]>
Co-authored-by: Jeff Washington <[email protected]>
Co-authored-by: Alexander Meißner <[email protected]>
@behzadnouri
Copy link
Contributor

@yhchiang-sol This function is called from replay stage when processing shreds for completed sets for the first time.
Why should slot < self.lowest_cleanup_slot() ever in that case?
i.e. why should blockstore prematurely cleanup slots that the replay stage hasn't processed yet.

The commit is suppressing the panic, but if the slot is incorrectly cleaned up, then the culprit is somewhere else!

solana_core::replay_stage::ReplayStage::replay_active_banks::{{closure}}
solana_core::replay_stage::ReplayStage::replay_active_bank
solana_core::replay_stage::ReplayStage::replay_blockstore_into_bank
solana_ledger::blockstore_processor::confirm_slot
solana_ledger::blockstore::Blockstore::get_slot_entries_with_shred_info

@yhchiang-sol
Copy link
Contributor Author

@behzadnouri Thanks for raising it. Let me revisit the purge logic a bit.

@yhchiang-sol
Copy link
Contributor Author

@behzadnouri The purge is determined by the ledger_cleanup_service, where anything older than the max-ledger-shreds (i.e., the --limit_ledger_size argument) will be purged. Looks like it doesn't check any replay status.

Is there any atomic variable that stores the current replay status so that the purge process can honor this? Although it will then keep more shreds than the configured --limit_ledger_size.

@yhchiang-sol
Copy link
Contributor Author

Is there any atomic variable that stores the current replay status so that the purge process can honor this? Although it will then keep more shreds than the configured --limit_ledger_size.

In any case, I will do some code study and file a PR to make the cleanup service honor the replay progress.

@yhchiang-sol
Copy link
Contributor Author

Hey @behzadnouri. @steviez and I chatted a little bit about this, and we found that the replay stage isn't the only call-stack that might invoke get_slot_entries_with_shred_info(). The get_block() RPC call will also trigger this call, and the validator that hit this issue was also serving heavy load RPC calls.

rpc::get_block
Blockstore::get_complete_block
Blockstore::get_slot_entries
Blockstore::get_slot_entries_with_shred_info

We also double-checked the ledger_cleanup_service code, and it never purges anything newer than the root.

So everything is good.

@steviez
Copy link
Contributor

steviez commented Sep 1, 2022

Thanks for posting the summary @yhchiang-sol. Poking around a little more, there is another codepath that will hit Blockstore::get_slot_entries_with_shred_info():

rpc::get_block
Blockstore::get_rooted_block
Blockstore::get_complete_block
Blockstore::get_slot_entries
Blockstore::get_slot_entries_with_shred_info

From the get_block() comments, Blockstore::get_rooted_block() is called for finalized slots whereas Blockstore::get_confirmed_block() is called for committed blocks:

Here is the implementation for get_rooted_block:

    pub fn get_rooted_block(
    ...
    ) -> Result<VersionedConfirmedBlock> {
        datapoint_info!("blockstore-rpc-api", ("method", "get_rooted_block", String));
        let _lock = self.check_lowest_cleanup_slot(slot)?;

        if self.is_root(slot) {
            return self.get_complete_block(slot, require_previous_blockhash);
        }
        Err(BlockstoreError::SlotNotRooted)
    }

And the implementation for check_lowest_cleanup_slot:

    fn check_lowest_cleanup_slot(&self, slot: Slot) -> Result<std::sync::RwLockReadGuard<Slot>> {
        // lowest_cleanup_slot is the last slot that was not cleaned up by LedgerCleanupService
        let lowest_cleanup_slot = self.lowest_cleanup_slot.read().unwrap();
        if *lowest_cleanup_slot > 0 && *lowest_cleanup_slot >= slot {
            return Err(BlockstoreError::SlotCleanedUp);
        }
        // Make caller hold this lock properly; otherwise LedgerCleanupService can purge/compact
        // needed slots here at any given moment
        Ok(lowest_cleanup_slot)
    }

Note that the lock is held for get_rooted_block() but not when get_complete_block() is called directly. The instance in which the individual who provided reliable report of hitting this said they were serving RPC requests on very old slots, some details in this Discord thread. Namely, Lijun commented:

Do you have RPC calls loading relatively older slots? The slot in question 145566485 was completed 4 hours before the panic

However, if the node was serving RPC requests on very old slots, those very old slots should have been rooted, so I think the codepath would have hit the route that involves lowest_cleanup_slot lock

@behzadnouri
Copy link
Contributor

The get_block() RPC call will also trigger this call, and the validator that hit this issue was also serving heavy load RPC calls.

I have seen this panic a lot on the cluster from non-rpc nodes.
I think it is a major bug if ledger clean-up service is removing shreds prematurely.

The way this commit is silencing the panic prevents us from identifying and fixing the root cause here.

I don't know about the RPC call path, but we need to make sure the other call paths (replay and anything else) don't simply ignore the error here.

@yhchiang-sol
Copy link
Contributor Author

I don't know about the RPC call path, but we need to make sure the other call paths (replay and anything else) don't simply ignore the error here.

Sure thing. I will create follow-up PR to make sure those critical code-path panic for this type of error.

@yhchiang-sol
Copy link
Contributor Author

I have seen this panic a lot on the cluster from non-rpc nodes.

@behzadnouri: Can I know whether this is seen recently or it has been there for a while? In addition, can I know whether this happens during the initial catch-up duration or not?

I think it is a major bug if ledger clean-up service is removing shreds prematurely.

Originally I was baking #27498 that makes ledger-cleanup-service honor the confirm slot progress, but I found the ledger-cleanup-service already honors the root so it never purges anything newer than the root inside find_slots_to_clean:

    fn find_slots_to_clean(
        blockstore: &Arc<Blockstore>,
        root: Slot,
        max_ledger_shreds: u64,
    ) -> (bool, Slot, u64) {
        ...
        for (i, (slot, meta)) in blockstore.slot_meta_iterator(0).unwrap().enumerate() {
            if i == 0 {
                debug!("purge: searching from slot: {}", slot);
            }
            // Not exact since non-full slots will have holes
            total_shreds += meta.received;
            total_slots.push((slot, meta.received));
            if slot > root { // <--- this makes sure we never return the lowest_cleanup_slot that is newer than root.
                break;
            }
        }

@behzadnouri
Copy link
Contributor

@behzadnouri: Can I know whether this is seen recently or it has been there for a while? In addition, can I know whether this happens during the initial catch-up duration or not?

There were some from last 30 day which I was querying but I don't remember their exact days.
You can query panics from the metrics:
https://github.com/solana-labs/solana/blob/2da93bd45/metrics/src/metrics.rs#L452-L460

@behzadnouri
Copy link
Contributor

https://discord.com/channels/428295358100013066/439194979856809985/1070199891609014292

This is now panicking on mainnet on v1.15.

@mvines
Copy link
Contributor

mvines commented Feb 1, 2023

This may be related to a reduced --limit-ledger-size that I hacked in, it may be harder to get the stock build to panic. I'll work on some STR and open an issue once I localize

@steviez
Copy link
Contributor

steviez commented Feb 1, 2023

I started peaking around code a little bit since I had touched some of these items recently and things are still fairly fresh. As a general refresher (for my own sake too), the issue stems from inconsistency between meta data and shreds for a slot. For some slot s, we have meta information (such as from a SlotMeta) that leads us to believe shreds should be present. But, we hit the panic if those shreds aren't actually retrievable.

Here are some miscellaneous notes; LCS = LedgerCleanupService:

  • LCS idles until it observes a new root that is 512 slots newer than from the last time it ran
    pub const DEFAULT_PURGE_SLOT_INTERVAL: u64 = 512;
    • mvines' reduced --limit-ledger-size could be around this such that we're hitting some weird edge case where we're essentially purging all unrooted slot
  • LCS has a guard to avoid cleaning newer than latest root (these roots passed to LCS through a crossbeam channel)
    // Ensure we don't cleanup anything past the last root we saw
    let lowest_cleanup_slot = std::cmp::min(lowest_slot + num_slots_to_clean - 1, root);
    • As a result of this check, the issue should NOT be occurring from ReplayStage (through blockstore_processor), as a slot will not be rooted (or at least not known to be rooted to us if we're catching up) until after we've replayed it

Reading the meta information and reading the shreds are separate queries to rocks, so it is seemingly the case that a slot is getting cleaned up between reads. We could grab a lowest_cleanup_slot read-lock after querying the meta information, but the function is called so often, that this could potentially interfere with LCS (multiple threads may be calling this function, can't recall if heavy/overlapping read-locks would starve the write-lock needed by LCS).

If we get steps to repro, I think the following things would be helpful:

  • Full backtrace so we can see which service is panicking (ie confirm if my theory about it not being ReplayStage is true)
  • Logs from shortly before the panic, specifically, those from solana_core::ledger_cleanup_service.
    • This would confirm if the slot we panicked on was just cleaned up.

At the moment, my money is on CompletedDataSetService; it calls Blockstore::get_entries_in_data_block() directly with meta information that it pulls out of a crossbeam channel, so a decent opportunity for latency to creep in between reads.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants