Add ability to output Bank hash details #32632

Merged
merged 23 commits into from
Aug 15, 2023


@steviez steviez commented Jul 26, 2023

Problem

When a consensus mismatch occurs, the current workflow involves a handful of manual steps to isolate the offending slot/transaction. The process is loosely defined at the link below; it is not difficult to execute, but it is somewhat tedious and involves creating and parsing ad hoc logs:
https://github.com/solana-labs/solana/wiki/Debugging-Consensus-Failures

Summary of Changes

Add functionality to dump the information that goes into a bank hash to a file. Additionally, generate that file when a node deviates from consensus (purges a slot in ReplayStage). With this file in hand, someone who has observed a node deviating only needs to generate the same file via solana-ledger-tool with a known-good version and compare it against the one generated by the validator. The file is currently written as JSON with the accounts sorted, which makes it both 1) human-readable and 2) diff-friendly.
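
A minimal sketch of why sorting the accounts matters for diffing, written in Python for brevity rather than the Rust used by this PR. The field names follow the sample file posted further down in this conversation; the `render_details` helper and the toy values are hypothetical and are not part of the change.

```python
import json


def render_details(slot, bank_hash, accounts):
    """Render details as pretty-printed JSON with accounts sorted by pubkey."""
    doc = {
        "slot": slot,
        "hash": bank_hash,
        # Sorting removes any dependence on the order accounts were collected in.
        "accounts": sorted(accounts, key=lambda a: a["pubkey"]),
    }
    return json.dumps(doc, indent=2)


# The same two accounts, collected in different orders (made-up values).
run_a = [{"pubkey": "B", "lamports": 2}, {"pubkey": "A", "lamports": 1}]
run_b = list(reversed(run_a))

# Both runs render to identical text, so a line diff shows only real changes.
assert render_details(1, "h", run_a) == render_details(1, "h", run_b)
```

Without the sort, two nodes that agree on every account could still produce files that differ on most lines.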

@steviez steviez force-pushed the bank_hash_capture branch 2 times, most recently from d7c883b to b9cd4cb Compare July 26, 2023 15:54

steviez commented Jul 26, 2023

@ripatel-fd - Here is the base PR for creating these files in the labs client. I'll post a sample file there later. Given some DMs and Discord discussion, I know you had been exploring a binary format for storing these files. It also seems like you all were potentially interested in streaming these files for every slot from a labs validator, to build a reference set to compare against when tracking down incompatibilities in firedancer.


codecov bot commented Jul 26, 2023

Codecov Report

Merging #32632 (6aa7140) into master (21c6ac6) will decrease coverage by 0.1%.
The diff coverage is 87.8%.

@@            Coverage Diff            @@
##           master   #32632     +/-   ##
=========================================
- Coverage    82.0%    82.0%   -0.1%     
=========================================
  Files         784      785      +1     
  Lines      211897   212069    +172     
=========================================
+ Hits       173851   173971    +120     
- Misses      38046    38098     +52     


steviez commented Jul 27, 2023

Here are some file sizes

$ ls -hl
-rw-rw-r-- 1 sol sol 8.2M Jul 27 09:19 203546517-G1ThhNa1e2LqaFaV4MEVssXe32EyCnFedYySrpWt1LHg_base64.json
-rw-rw-r-- 1 sol sol 761K Jul 27 08:35 203546517-G1ThhNa1e2LqaFaV4MEVssXe32EyCnFedYySrpWt1LHg_no_data.json
-rw-rw-r-- 1 sol sol  64M Jul 27 08:50 203546517-G1ThhNa1e2LqaFaV4MEVssXe32EyCnFedYySrpWt1LHg_u8.json
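
A rough illustration of why the three files differ so much in size, as a Python sketch. It assumes the `_u8` file stores account data as a JSON array of byte values and the `_base64` file stores it as a base64 string; the payload below is made up, and the exact ratios in the real files depend on indentation and on how much of each file is account data.

```python
import base64
import json

# 1024 bytes of made-up account data.
data = bytes(range(256)) * 4

# Account data as a single base64 string.
as_base64 = json.dumps(base64.b64encode(data).decode("ascii"))

# Account data as a JSON array of byte values, compact and pretty-printed.
as_u8_compact = json.dumps(list(data))
as_u8_pretty = json.dumps(list(data), indent=2)

print(len(as_base64), len(as_u8_compact), len(as_u8_pretty))
```

Base64 spends about 4 characters per 3 bytes, while a pretty-printed array spends a whole line per byte, which is consistent with the `_u8` file being several times larger than the `_base64` one.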

GitHub won't let me load up the .json files, but here is a snippet from the base64 one:

{
  "version": "1.17.0 (src:00000000; feat:461617547, client:SolanaLabs)",
  "slot": 203546517,
  "hash": "G1ThhNa1e2LqaFaV4MEVssXe32EyCnFedYySrpWt1LHg",
  "parent_hash": "FdCQJsKX1cWzKyVyjMqcvJeDmCW46tCA2MSXVNwobcNf",
  "accounts_delta_hash": "3AceoCg7mGo3G6Z4e7iWFpFc4Nf2aWUZEqNT65QaMSej",
  "signature_count": 1363,
  "last_blockhash": "94tm1SDn34iKWjgLsQCGoXkUpvSmMZAJ9eTPkH6RfbAo",
  "accounts": [
    {
      "pubkey": "1M5USfamd1N4i1z6UZeECrWeu2VfrxjYMBSXThu6TqB",
      "hash": "Ggea5CLgo53Zxuw6HXAuJUdq4dD64y5uZfXhrYt7SVJ5",
      "lamports": 44526227501,
      "rent_epoch": 0,
      "executable": false,
      "data": "AQAAAAQLOJOsu3tFbQhfZiz+y/Z0TLwW7WNn61hQk6BpWzrJBGaqyVrGvP7InUJygJLMO2NqEDF0ZB2f1aiAQ90NcxEKHwAAAAAAAAB23yEMAAAAAB8AAAB33yEMAAAAAB4AAAB43yEMAAAAAB0AAAB53yEMAAAAABwAAAB63yEMAAAAABsAAAB73yEMAAAAABoAAAB83yEMAAAAABkAAAB93yEMAAAAABgAAAB+3yEMAAAAABcAAAB/3yEMAAAAABYAAACA3yEMAAAAABUAAACB3yEMAAAAABQAAACC3yEMAAAAABMAAACD3yEMAAAAABIAAACE3yEMAAAAABEAAACF3yEMAAAAABAAAACG3yEMAAAAAA8AAACH3yEMAAAAAA4AAACI3yEMAAAAAA0AAACJ3yEMAAAAAAwAAACK3yEMAAAAAAsAAACL3yEMAAAAAAoAAACM3yEMAAAAAAkAAACN3yEMAAAAAAgAAACO3yEMAAAAAAcAAACP3yEMAAAAAAYAAACQ3yEMAAAAAAUAAACR3yEMAAAAAAQAAACS3yEMAAAAAAMAAACT3yEMAAAAAAIAAACU3yEMAAAAAAEAAAABdd8hDAAAAAABAAAAAAAAANcBAAAAAAAABAs4k6y7e0VtCF9mLP7L9nRMvBbtY2frWFCToGlbOskfAAAAAAAAAAFAAAAAAAAAAJgBAAAAAAAAHlncAAAAAACymdYAAAAAAJkBAAAAAAAAFK/hAAAAAAAeWdwAAAAAAJoBAAAAAAAAWD7nAAAAAAAUr+EAAAAAAJsBAAAAAAAAAojsAAAAAABYPucAAAAAAJwBAAAAAAAAF7nxAAAAAAACiOwAAAAAAJ0BAAAAAAAAzwr3AAAAAAAXufEAAAAAAJ4BAAAAAAAAm0P8AAAAAADPCvcAAAAAAJ8BAAAAAAAAW2sBAQAAAACbQ/wAAAAAAKABAAAAAAAA6lgHAQAAAABbawEBAAAAAKEBAAAAAAAAQ0MNAQAAAADqWAcBAAAAAKIBAAAAAAAA41oTAQAAAABDQw0BAAAAAKMBAAAAAAAAFmsZAQAAAADjWhMBAAAAAKQBAAAAAAAA/2UfAQAAAAAWaxkBAAAAAKUBAAAAAAAAwIwlAQAAAAD/ZR8BAAAAAKYBAAAAAAAAjaMrAQAAAADAjCUBAAAAAKcBAAAAAAAAd7cxAQAAAACNoysBAAAAAKgBAAAAAAAAawc4AQAAAAB3tzEBAAAAAKkBAAAAAAAA/lo+AQAAAABrBzgBAAAAAKoBAAAAAAAADqhEAQAAAAD+Wj4BAAAAAKsBAAAAAAAALA5LAQAAAAAOqEQBAAAAAKwBAAAAAAAA1HdRAQAAAAAsDksBAAAAAK0BAAAAAAAA3NBXAQAAAADUd1EBAAAAAK4BAAAAAAAACideAQAAAADc0FcBAAAAAK8BAAAAAAAAwIxkAQAAAAAKJ14BAAAAALABAAAAAAAASfVqAQAAAADAjGQBAAAAALEBAAAAAAAAPWBxAQAAAABJ9WoBAAAAALIBAAAAAAAAPsh3AQAAAAA9YHEBAAAAALMBAAAAAAAADyJ+AQAAAAA+yHcBAAAAALQBAAAAAAAAUH2EAQAAAAAPIn4BAAAAALUBAAAAAAAAZWaKAQAAAABQfYQBAAAAALYBAAAAAAAAfWWQAQAAAABlZooBAAAAALcBAAAAAAAAppyWAQAAAAB9ZZABAAAAALgBAAAAAAAAZuScAQAAAACmnJYBAAAAALkBAAAAAAAA2jKjAQAAAABm5JwBAAAAALoBAAAAAAAA1YmpAQAAAADaMqMBAAAAALsBAAAAAAAA+9avAQAAAADViakBAAAAALwBAAAAAAAAN/q1AQAAAAD71q8BAAAAAL0BAAAAAAAABy+8AQAAAAA3+rUBAAAAAL4BAAAAAAAA6m3CAQAAAAAHL7wBAAAAAL8BAAAAAAAANoDIAQAAAADqbcIBAAAAAMABAAAAA
AAAErnOAQAAAAA2gMgBAAAAAMEBAAAAAAAAQfzUAQAAAAASuc4BAAAAAMIBAAAAAAAAM0nbAQAAAABB/NQBAAAAAMMBAAAAAAAA0YThAQAAAAAzSdsBAAAAAMQBAAAAAAAAELrnAQAAAADRhOEBAAAAAMUBAAAAAAAA4+jtAQAAAAAQuucBAAAAAMYBAAAAAAAAbUH0AQAAAADj6O0BAAAAAMcBAAAAAAAAVpn6AQAAAABtQfQBAAAAAMgBAAAAAAAAtuIAAgAAAABWmfoBAAAAAMkBAAAAAAAAizMHAgAAAAC24gACAAAAAMoBAAAAAAAA0nYNAgAAAACLMwcCAAAAAMsBAAAAAAAAYrgTAgAAAADSdg0CAAAAAMwBAAAAAAAAvK4ZAgAAAABiuBMCAAAAAM0BAAAAAAAARPEfAgAAAAC8rhkCAAAAAM4BAAAAAAAAiEcmAgAAAABE8R8CAAAAAM8BAAAAAAAAlKosAgAAAACIRyYCAAAAANABAAAAAAAAOwAzAgAAAACUqiwCAAAAANEBAAAAAAAAOkc5AgAAAAA7ADMCAAAAANIBAAAAAAAAxJs/AgAAAAA6RzkCAAAAANMBAAAAAAAAgu1FAgAAAADEmz8CAAAAANQBAAAAAAAAEuxLAgAAAACC7UUCAAAAANUBAAAAAAAAl0ZSAgAAAAAS7EsCAAAAANYBAAAAAAAAXKNYAgAAAACXRlICAAAAANcBAAAAAAAAtr1ZAgAAAABco1gCAAAAAJTfIQwAAAAAYVSlZAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
    },
    {
      "pubkey": "1bB6Z9iaL1TM4VTo54B5w1FgCXFDbbtVkVJu4VbiNNc",
      "hash": "EeszVJLmgBwf8JncRjvD7pY6z8dyfNeKXrXhSwAxxK7g",
      "lamports": 2808284745,
      "rent_epoch": 361,
      "executable": false,
      "data": "AQAAAKLwte7F6IHPUBvvLVgsXjIZdw9IL574LpFDumh4Ruofrbq6hujHGziQEhBQufWTJTHc/Y1QxdtZIImoMH4/GOIKHwAAAAAAAAB23yEMAAAAAB8AAAB33yEMAAAAAB4AAAB43yEMAAAAAB0AAAB53yEMAAAAABwAAAB63yEMAAAAABsAAAB73yEMAAAAABoAAAB83yEMAAAAABkAAAB93yEMAAAAABgAAAB+3yEMAAAAABcAAAB/3yEMAAAAABYAAACA3yEMAAAAABUAAACB3yEMAAAAABQAAACC3yEMAAAAABMAAACD3yEMAAAAABIAAACE3yEMAAAAABEAAACF3yEMAAAAABAAAACG3yEMAAAAAA8AAACH3yEMAAAAAA4AAACI3yEMAAAAAA0AAACJ3yEMAAAAAAwAAACK3yEMAAAAAAsAAACL3yEMAAAAAAoAAACM3yEMAAAAAAkAAACN3yEMAAAAAAgAAACO3yEMAAAAAAcAAACP3yEMAAAAAAYAAACQ3yEMAAAAAAUAAACR3yEMAAAAAAQAAACS3yEMAAAAAAMAAACT3yEMAAAAAAIAAACU3yEMAAAAAAEAAAABdd8hDAAAAAABAAAAAAAAANcBAAAAAAAAovC17sXogc9QG+8tWCxeMhl3D0gvnvgukUO6aHhG6hfAAAAAAAAAAFAAAAAAAAAAJgBAAAAAAAAat1GAQAAAABdKUEBAAAAAJkBAAAAAAAApClMAQAAAABq3UYBAAAAAJoBAAAAAAAAK7BRAQAAAACkKUwBAAAAAJsBAAAAAAAAxu5WAQAAAAArsFEBAAAAAJwBAAAAAAAAGRBcAQAAAADG7lYBAAAAAJ0BAAAAAAAA3FFhAQAAAAAZEFwBAAAAAJ4BAAAAAAAAAmhmAQAAAADcUWEBAAAAAJ8BAAAAAAAACKxrAQAAAAACaGYBAAAAAKABAAAAAAAAu5dxAQAAAAAIrGsBAAAAAKEBAAAAAAAAfn93AQAAAAC7l3EBAAAAAKIBAAAAAAAATJV9AQAAAAB+f3cBAAAAAKMBAAAAAAAAAaCDAQAAAABMlX0BAAAAAKQBAAAAAAAArJiJAQAAAAABoIMBAAAAAKUBAAAAAAAAar2PAQAAAACsmIkBAAAAAKYBAAAAAAAA1NGVAQAAAABqvY8BAAAAAKcBAAAAAAAAq7WbAQAAAADU0ZUBAAAAAKgBAAAAAAAA2wKiAQAAAACrtZsBAAAAAKkBAAAAAAAAYVaoAQAAAADbAqIBAAAAAKoBAAAAAAAAu5WuAQAAAABhVqgBAAAAAKsBAAAAAAAATfu0AQAAAAC7la4BAAAAAKwBAAAAAAAAf2W7AQAAAABN+7QBAAAAAK0BAAAAAAAA7r3BAQAAAAB/ZbsBAAAAAK4BAAAAAAAApxLIAQAAAADuvcEBAAAAAK8BAAAAAAAAGHbOAQAAAACnEsgBAAAAALABAAAAAAAAxNfUAQAAAAAYds4BAAAAALEBAAAAAAAA0UHbAQAAAADE19QBAAAAALIBAAAAAAAAiKnhAQAAAADRQdsBAAAAALMBAAAAAAAAOwLoAQAAAACIqeEBAAAAALQBAAAAAAAA4lnuAQAAAAA7AugBAAAAALUBAAAAAAAAoZfzAQAAAADiWe4BAAAAALYBAAAAAAAA5n/5AQAAAAChl/MBAAAAALcBAAAAAAAAgIr/AQAAAADmf/kBAAAAALgBAAAAAAAANM8FAgAAAACAiv8BAAAAALkBAAAAAAAA0R0MAgAAAAA0zwUCAAAAALoBAAAAAAAAinUSAgAAAADRHQwCAAAAALsBAAAAAAAAN8EYAgAAAACKdRICAAAAALwBAAAAAAAAoeIeAgAAAAA3wRgCAAAAAL0BAAAAAAAASxUlAgAAAACh4h4CAAAAAL4BAAAAAAAAeFIrAgAAAABLFSUCAAAAAL8BAAAAAAAAp1cxAgAAAAB4UisCAAAAAMABAAAAAA
AAVY83AgAAAACnVzECAAAAAMEBAAAAAAAAX9E9AgAAAABVjzcCAAAAAMIBAAAAAAAAhBtEAgAAAABf0T0CAAAAAMMBAAAAAAAAmTtKAgAAAACEG0QCAAAAAMQBAAAAAAAA0oBQAgAAAACZO0oCAAAAAMUBAAAAAAAA9qtWAgAAAADSgFACAAAAAMYBAAAAAAAARQFdAgAAAAD2q1YCAAAAAMcBAAAAAAAAiVZjAgAAAABFAV0CAAAAAMgBAAAAAAAA5plpAgAAAACJVmMCAAAAAMkBAAAAAAAAS+dvAgAAAADmmWkCAAAAAMoBAAAAAAAAsyV2AgAAAABL528CAAAAAMsBAAAAAAAAsGR8AgAAAACzJXYCAAAAAMwBAAAAAAAAFHOCAgAAAACwZHwCAAAAAM0BAAAAAAAACrKIAgAAAAAUc4ICAAAAAM4BAAAAAAAAiAWPAgAAAAAKsogCAAAAAM8BAAAAAAAAgGWVAgAAAACIBY8CAAAAANABAAAAAAAAObabAgAAAACAZZUCAAAAANEBAAAAAAAAD4ChAgAAAAA5tpsCAAAAANIBAAAAAAAA4NOnAgAAAAAPgKECAAAAANMBAAAAAAAAoyKuAgAAAADg06cCAAAAANQBAAAAAAAAHXK0AgAAAACjIq4CAAAAANUBAAAAAAAAkMi6AgAAAAAdcrQCAAAAANYBAAAAAAAAliLBAgAAAACQyLoCAAAAANcBAAAAAAAAwzvCAgAAAACWIsECAAAAAJTfIQwAAAAAYVSlZAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
    },

I'll give base64+zstd a try too. Even though we'll probably land on a single encoding here, it probably makes sense to record the account data encoding format at the top of the file too.

I'll probably write a Deserialize trait implementation for BankHashAccounts too, so that a unit test can serve as a sanity check.
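
The round-trip sanity check described above could look roughly like the following Python sketch (the PR itself is Rust and would use serde). The top-level `encoding` field is a hypothetical name for the "encoding format at the top" idea, and `serialize`/`deserialize` are illustrative helpers, not functions from the PR.

```python
import base64
import json


def serialize(accounts):
    """Write a {pubkey: bytes} map as JSON with a self-describing encoding tag."""
    doc = {
        "encoding": "base64",  # hypothetical header field
        "accounts": [
            {"pubkey": pk, "data": base64.b64encode(data).decode("ascii")}
            for pk, data in sorted(accounts.items())
        ],
    }
    return json.dumps(doc, indent=2)


def deserialize(text):
    """Parse the JSON back into a {pubkey: bytes} map, checking the tag first."""
    doc = json.loads(text)
    if doc["encoding"] != "base64":
        raise ValueError(f"unsupported encoding: {doc['encoding']}")
    return {a["pubkey"]: base64.b64decode(a["data"]) for a in doc["accounts"]}


# Round trip: what goes in must come back out unchanged.
original = {"pk2": b"\x00\x01\x02", "pk1": b""}
assert deserialize(serialize(original)) == original
```

Carrying the tag in the file lets a reader reject an encoding it does not understand instead of silently misreading the data.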

@jumpsiegel

Various diffs showing how we capture useful information while debugging firedancer versus solana...

• diff against master
firedancer-io@985d447
▪ This is incomplete (looks like I didn't port various parts across)
• diff against v1.14
firedancer-io@250b3e7

▪ More actively in use
• modified hash_account_data_compare:
firedancer-io@985d447#diff-1090394420d51617f3233275c2b65ed706b35b53b115fe65f82c682af8134a6f
▪ This change allows us to easily look for where specific accounts started to hash differently and what fields might have caused it

• banks.rs
firedancer-io@985d447#diff-ed47b4a0198313377e091bb3957bbbc63d937805426d1b2b6de39d0a50d32a0c
▪ All deposits now have an amount and a back trace; bank frozen also includes the parent hash

• src/account.rs
firedancer-io@985d447#diff-4b6f7fd329b8ff5deee39fc5a1933ab7a6588bbdc4cdef96a82874336ce6204f
▪ All calls to set_lamports in general

• src/fee.rs
firedancer-io@985d447#diff-dc3100b413f7846958dc91a974f3ee159f5e20f191defc4160eb96fe43fe8d45
▪ Parts of all fee calculations... Would be nice if this had the txn it was associated with

• ledger/src/leader_schedule.rs
firedancer-io@250b3e7#diff-53a46a830d3ec984d0198218aae02bb07f1a7b3feb42eb65ea911db56e75478d
▪ What went into the leader schedules

• programs/bpf_loader/src/lib.rs
firedancer-io@250b3e7#diff-b225f839559f40bfb9322cde13dbf4cb0516567faa672daf63dc637bdd9e0cd0
▪ full program traces

• runtime/src/accounts_hash.rs
firedancer-io@250b3e7#diff-a1403578b5edda892f67cf8a6850fb0f330cf3c1d532601b828a53225cdb2680
▪ Specifically each and every hash, in order, used to create the accounts_delta_hash

• invoke_context.rs
firedancer-io@250b3e7#diff-9ac236825ce9b87946d6820e3ec3096bc841818c66037b38b6d98142318f4b0a
▪ How we extract the input/output of every test as json so that we can reproduce it
firedancer-io@985d447#diff-9ac236825ce9b87946d6820e3ec3096bc841818c66037b38b6d98142318f4b0a
▪ WIP of the same change on master but also capturing out-of-band changes to the SysvarCache used in the tests

• system_instruction_processor.rs
firedancer-io@250b3e7#diff-3f8645400d0e866a22f752c366c32d95e1fab1320db79bbda3c699da1b4e39af
▪ Tests we have not ported to master yet so that they can be PR’d back into solana


steviez commented Jul 31, 2023

Summing up a call:

  • As-is, this PR provides value to Solana Labs in reducing the number of manual steps required to triage
  • This PR is nominally useful for Firedancer; however, MUCH more granular data is desired/required
    • FWIW, this more granular data will be of value to debug the Labs client as well
  • Given the above, I think a reasonable outcome is to ship this PR as-is and to continue extending this struct/file with more details
    • The idea would be to add more optional fields; a validator that panics at runtime may not emit these, but they could be enabled when running ledger-tool to give the extra detail when desired
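
The optional-fields idea from the list above can be sketched as follows, in Python rather than the PR's Rust. The `fee_details` field and the `to_json` helper are hypothetical placeholders for "more granular data"; only `slot` and `hash` come from the sample file in this conversation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Details:
    slot: int
    hash: str
    # Hypothetical extra detail; None means "not captured in this run".
    fee_details: Optional[list] = None


def to_json(details):
    """Serialize, leaving out optional fields that were not captured."""
    doc = {k: v for k, v in asdict(details).items() if v is not None}
    return json.dumps(doc, indent=2)


# A validator might emit only the base fields...
print(to_json(Details(slot=1, hash="h")))
# ...while an offline run could opt in to the extra detail.
print(to_json(Details(slot=1, hash="h", fee_details=[{"fee": 5000}])))
```

Omitting unset fields keeps the base file unchanged for existing consumers while letting offline runs add detail.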

@steviez steviez marked this pull request as ready for review July 31, 2023 19:54
@steviez steviez requested a review from t-nelson July 31, 2023 19:54
@steviez steviez changed the title from "(WIP) Bank hash capture" to "Add ability to output Bank hash details" Jul 31, 2023
@ripatel-fd
Contributor

This looks great. Do you have an estimate of when this PR will land on master?

@steviez steviez force-pushed the bank_hash_capture branch 2 times, most recently from 3f8d9d2 to 65134ac Compare August 9, 2023 22:46
@t-nelson
Contributor

Can you rebase on master since 52616cf7aa4 to pick up the rustsec ignore?

@t-nelson
Contributor

r+ rebase


steviez commented Aug 15, 2023

Can you rebase on master since 52616cf7aa4 to pick up the rustsec ignore?

Done!

@t-nelson t-nelson left a comment

:shipit:

@steviez steviez merged commit 6bbf514 into solana-labs:master Aug 15, 2023
6 checks passed
@steviez steviez deleted the bank_hash_capture branch August 15, 2023 05:12
steviez added a commit to steviez/solana that referenced this pull request Nov 16, 2023
…32632)

When a consensus divergence occurs, the current workflow involves a
handful of manual steps to home in on the offending slot and
transaction. This process isn't overly difficult to execute; however, it
is tedious and currently involves creating and parsing logs.

This change introduces functionality to output a debug file that
contains the components that go into the bank hash. The file can be
generated in two ways:
- Via solana-validator when the node realizes it has diverged
- Via solana-ledger-tool verify by passing a flag

When a divergence occurs now, the steps to debug would be:
- Grab the file from the node that diverged
- Generate a file for the same slot with ledger-tool with a known good
  version
- Diff the files; they are pretty-printed JSON
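
The final "diff the files" step can be done with any text diff tool. As an alternative, here is a small Python sketch that compares two parsed files structurally. It assumes the layout of the sample file shown earlier in this conversation (top-level fields plus an `accounts` list keyed by `pubkey`); `diff_details` is a hypothetical helper, not something shipped with this PR.

```python
import json


def diff_details(good, bad):
    """Return (differing top-level fields, differing account pubkeys)."""
    # "version" is skipped because the two files are expected to come
    # from different software versions.
    skipped = {"accounts", "version"}
    fields = sorted(
        k for k in (good.keys() | bad.keys()) - skipped
        if good.get(k) != bad.get(k)
    )
    good_accounts = {a["pubkey"]: a for a in good.get("accounts", [])}
    bad_accounts = {a["pubkey"]: a for a in bad.get("accounts", [])}
    # A pubkey present in only one file also counts as a difference.
    accounts = sorted(
        pk for pk in good_accounts.keys() | bad_accounts.keys()
        if good_accounts.get(pk) != bad_accounts.get(pk)
    )
    return fields, accounts


# Made-up miniature files; real ones would be loaded with json.load().
good = {"slot": 1, "hash": "a", "accounts": [{"pubkey": "p1", "lamports": 1}]}
bad = {"slot": 1, "hash": "b", "accounts": [{"pubkey": "p1", "lamports": 2}]}
print(diff_details(good, bad))
```

The list of differing pubkeys narrows the search to the accounts, and from there to the transactions that wrote them.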
steviez added a commit to steviez/solana that referenced this pull request Nov 28, 2023
…32632)

@steviez steviez added the v1.16 PRs that should be backported to v1.16 label Nov 28, 2023

mergify bot commented Nov 28, 2023

Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.

mergify bot pushed a commit that referenced this pull request Nov 28, 2023

(cherry picked from commit 6bbf514)

# Conflicts:
#	Cargo.lock
#	ledger-tool/src/args.rs
#	ledger-tool/src/main.rs
#	programs/sbf/Cargo.lock
#	runtime/Cargo.toml
#	runtime/src/accounts_db.rs
#	validator/src/main.rs
steviez added a commit that referenced this pull request Dec 1, 2023
…34257)

* Add ability to output components that go into Bank hash (#32632)


* Merge conflict

* Reorder base_working with accounts_hash to match other branches

---------

Co-authored-by: steviez <[email protected]>