Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Add address lookup tables to minimized snapshot #30158

Merged
merged 4 commits into from
Feb 10, 2023

Conversation

ryoqun
Copy link
Contributor

@ryoqun ryoqun commented Feb 7, 2023

Problem

Currently, we can't create a minimized snapshot for mainnet-beta due to wide-spread use of address-lookup-table txes which is currently-unsupported for the minimized snapshot.

Summary of Changes

Support it. Needed bunch of seemingly unrelated changes...

@ryoqun ryoqun requested a review from apfitzge February 7, 2023 06:06
@ryoqun ryoqun added v1.14 v1.15 (abandoned) The v1.15 branch has been abandoned and removed v1.14 labels Feb 7, 2023
@@ -1918,6 +1918,7 @@ fn main() {
.arg(&no_snapshot_arg)
.arg(&account_paths_arg)
.arg(&accounts_db_skip_initial_hash_calc_arg)
.arg(&accountsdb_skip_shrink)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this speeds up a bit? Feature parity with other solana-ledger-tool subcommands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably a good option to allow - nice.

&self,
tx: VersionedTransaction,
) -> Result<SanitizedTransaction> {
self.verify_transaction(tx, TransactionVerificationMode::FullVerification)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love type safety. but it's too verbose to impose this fn signature at call-site, imo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, these small wrapper fns avoid new use TransactionVerificationMode-ing only for the sake of this fn invocation at call-site too

@@ -1912,6 +1912,7 @@ impl Bank {
additional_builtins,
debug_do_not_add_builtins,
);
bank.fill_missing_sysvar_cache_entries();
Copy link
Contributor Author

@ryoqun ryoqun Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this single new line will be most objectionable change in this pr without in-source comments... Arguably, this is touching an important code path.

after some research, in short, i think this is a simple overlook at both https://github.com/solana-labs/solana/pull/21108/files#r1098224198 #21108 and https://github.com/solana-labs/solana/pull/22455/files#r1098222965 #22455

So, how this unintentional omission relates to this pr?:

bank fails to create sanitized tx from valid VersionedTransaction if sysvar isn't properly populated due to the internal use of slothashes. And this is a edge case (took a while to figure out because sometimes it worked by random args/ledgers). That's because normally we replay a bunch of slots after snapshot load. so, these new child banks are filled with sysvar caches via (new_from_parent). This bug only manifests when we immediately consume the snapshot bank without any replay.

As for justification of touching this code path, I considered to relax pub(crate) on fill_missing_sysvar_cache_entries() and call it inside get_accounts_used_in_range or minimize_bank_for_snapshot, under this pr's strong backport desire.

However, I concluded the risk is small overall (use of sysvar cache on snapshot bank is VERY rare as this isn't noticed for months) then, that this correct (yet broad) fix should instead be preferred to prevent potential future subtle bugs and another dev suffering from this unhappy rabbit hole.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @Lichtso @jstarry as respective referenced pr authors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I concluded the risk is small overall (use of sysvar cache on snapshot bank is VERY rare as this isn't noticed for months)

I think normal validator operations would never hit this case since we replay and create child banks.

If there were a network outage (plz no) and we needed to restart from a snapshot, would we hit this case and be unable to restart w/o this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there were a network outage (plz no) and we needed to restart from a snapshot, would we hit this case and be unable to restart w/o this patch?

fortunately, it's not. the pesky hard network outage (hard-fork) case is rigorously tested by local-cluster tests.

as far as i checked, sysvar_cache is only used for tx processing. And the snapshot bank itself is frozen/rooted by definition. So, there should be no tx processing. I bet some frozen assert will be triggered, if ever done.

However, we need populated sysvar_cache for minimized snapshot as a kind of unusual use case here.

So, the other component which is affected would be rpc's tx preflight, with alt tx and with air-gapped node. (this is little known trick, but you can inspect account state at single point of time with rich set of rpc functionality if you make machine stand-alone with empty ledger with single snapshot).

On the other hand, starting to populate sysvar_cache shouldn't cause any bad effects as well. After all, it's rather harmless operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; it does seem like it was an oversight from the previous PRs on just not including this code on the initialization from fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. Can't think of any reason why it shouldn't be called here as well. Seems like I simply missed it and it did not break any tests.

result.insert(lookup.account_key);
});
}
let sanitized_tx = bank.fully_verify_transaction(tx).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely better than it was, as when I wrote this originally I didn't even know lookup tables were a thing - but I'm not 100% convinced this is correct either. If a lookup table is extended in the slot range, then a tx after that could fail to verify this check (bank is looking at beginning of range) and we wouldn't add the associated keys for the transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apfitzge hehe, i know there's buried compromise here: how about this helpful, cheerful, encouraging comment?: db26aed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patches are welcome but it'll take some good deal of efforts to support this fully....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ agreed. I'm not sure how many people actually use this subcommand - but not sure its worth distracting us over scheduler.

I'd update the comment to include this issue I just made: #30165 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patches are welcome but it'll take some good deal of efforts to support this fully....

Yeah, it might be that we just have to replay the slot range and add accounts as we see them that way which is a bit of effort.
I hadn't taken that approach initially because we need to keep account state at the beginning of the snapshot range in order to create it, and process of (unpacking + replay + delete + unpacking + minimize + snapshot) seemed too long.
And also the fact ALTs weren't in (to my knowledge) at the time I started on the project as a total Solana noob haha.

With @xiangzhu70's change for keeping and loading from snapshot files will allow us to do this in a timely manner!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd update the comment to include this issue I just made: #30165 😄

thanks for creating issue; just done: a2b8565

With @xiangzhu70's change for keeping and loading from snapshot files will allow us to do this in a timely manner!

Oh sounds promising. :)

@ryoqun ryoqun force-pushed the alt-in-minimized-snapshot branch from a2b8565 to a25ba03 Compare February 8, 2023 00:59
apfitzge
apfitzge previously approved these changes Feb 8, 2023
Copy link
Contributor

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the actual changes, with some nits on comments. Would be nice to get confirmation from @Lichtso that we're not overlooking a reason we shouldn't fill missing sysvar on loading from fields.

ledger/src/blockstore.rs Outdated Show resolved Hide resolved
ledger/src/blockstore.rs Outdated Show resolved Hide resolved
@@ -1912,6 +1912,7 @@ impl Bank {
additional_builtins,
debug_do_not_add_builtins,
);
bank.fill_missing_sysvar_cache_entries();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; it does seem like it was an oversight from the previous PRs on just not including this code on the initialization from fields.

Co-authored-by: Andrew Fitzgerald <[email protected]>
Copy link
Contributor

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ryoqun ryoqun merged commit 3e6162e into solana-labs:master Feb 10, 2023
mergify bot pushed a commit that referenced this pull request Feb 10, 2023
* Add address lookup tables to minimized snapshot

* Add comment for future posterity

* Add reference to the issue

* Adjust comment a bit

Co-authored-by: Andrew Fitzgerald <[email protected]>

---------

Co-authored-by: Andrew Fitzgerald <[email protected]>
(cherry picked from commit 3e6162e)
});
entries.into_par_iter().for_each(|entry| {
entry.transactions.into_iter().for_each(|tx| {
if let Some(lookups) = tx.message.address_table_lookups() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-nelson here

nickfrosty pushed a commit to nickfrosty/solana that referenced this pull request Mar 12, 2023
* Add address lookup tables to minimized snapshot

* Add comment for future posterity

* Add reference to the issue

* Adjust comment a bit

Co-authored-by: Andrew Fitzgerald <[email protected]>

---------

Co-authored-by: Andrew Fitzgerald <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.15 (abandoned) The v1.15 branch has been abandoned
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants