-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Add address lookup tables to minimized snapshot #30158
Conversation
@@ -1918,6 +1918,7 @@ fn main() { | |||
.arg(&no_snapshot_arg) | |||
.arg(&account_paths_arg) | |||
.arg(&accounts_db_skip_initial_hash_calc_arg) | |||
.arg(&accountsdb_skip_shrink) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like this speeds up a bit? Feature parity with other solana-ledger-tool
subcommands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah probably a good option to allow - nice.
&self, | ||
tx: VersionedTransaction, | ||
) -> Result<SanitizedTransaction> { | ||
self.verify_transaction(tx, TransactionVerificationMode::FullVerification) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i love type safety. but it's too verbose to impose this fn signature at call-site, imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, these small wrapper fns avoid new use TransactionVerificationMode
-ing only for the sake of this fn invocation at call-site too
@@ -1912,6 +1912,7 @@ impl Bank { | |||
additional_builtins, | |||
debug_do_not_add_builtins, | |||
); | |||
bank.fill_missing_sysvar_cache_entries(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this single new line will be most objectionable change in this pr without in-source comments... Arguably, this is touching an important code path.
after some research, in short, i think this is a simple overlook at both https://github.com/solana-labs/solana/pull/21108/files#r1098224198 #21108 and https://github.com/solana-labs/solana/pull/22455/files#r1098222965 #22455
So, how this unintentional omission relates to this pr?:
bank fails to create sanitized tx from valid VersionedTransaction
if sysvar isn't properly populated due to the internal use of slothashes. And this is a edge case (took a while to figure out because sometimes it worked by random args/ledgers). That's because normally we replay a bunch of slots after snapshot load. so, these new child banks are filled with sysvar caches via (new_from_parent
). This bug only manifests when we immediately consume the snapshot bank without any replay.
As for justification of touching this code path, I considered to relax pub(crate)
on fill_missing_sysvar_cache_entries()
and call it inside get_accounts_used_in_range
or minimize_bank_for_snapshot
, under this pr's strong backport desire.
However, I concluded the risk is small overall (use of sysvar cache on snapshot bank is VERY rare as this isn't noticed for months) then, that this correct (yet broad) fix should instead be preferred to prevent potential future subtle bugs and another dev suffering from this unhappy rabbit hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I concluded the risk is small overall (use of sysvar cache on snapshot bank is VERY rare as this isn't noticed for months)
I think normal validator operations would never hit this case since we replay and create child banks.
If there were a network outage (plz no) and we needed to restart from a snapshot, would we hit this case and be unable to restart w/o this patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there were a network outage (plz no) and we needed to restart from a snapshot, would we hit this case and be unable to restart w/o this patch?
fortunately, it's not. the pesky hard network outage (hard-fork) case is rigorously tested by local-cluster tests.
as far as i checked, sysvar_cache
is only used for tx processing. And the snapshot bank itself is frozen/rooted by definition. So, there should be no tx processing. I bet some frozen assert will be triggered, if ever done.
However, we need populated sysvar_cache for minimized snapshot as a kind of unusual use case here.
So, the other component which is affected would be rpc's tx preflight, with alt tx and with air-gapped node. (this is little known trick, but you can inspect account state at single point of time with rich set of rpc functionality if you make machine stand-alone with empty ledger with single snapshot).
On the other hand, starting to populate sysvar_cache shouldn't cause any bad effects as well. After all, it's rather harmless operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it; it does seem like it was an oversight from the previous PRs on just not including this code on the initialization from fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right. Can't think of any reason why it shouldn't be called here as well. Seems like I simply missed it and it did not break any tests.
result.insert(lookup.account_key); | ||
}); | ||
} | ||
let sanitized_tx = bank.fully_verify_transaction(tx).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely better than it was, as when I wrote this originally I didn't even know lookup tables were a thing - but I'm not 100% convinced this is correct either. If a lookup table is extended in the slot range, then a tx after that could fail to verify this check (bank is looking at beginning of range) and we wouldn't add the associated keys for the transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patches are welcome but it'll take some good deal of efforts to support this fully....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ agreed. I'm not sure how many people actually use this subcommand - but not sure its worth distracting us over scheduler.
I'd update the comment to include this issue I just made: #30165 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patches are welcome but it'll take some good deal of efforts to support this fully....
Yeah, it might be that we just have to replay the slot range and add accounts as we see them that way which is a bit of effort.
I hadn't taken that approach initially because we need to keep account state at the beginning of the snapshot range in order to create it, and process of (unpacking + replay + delete + unpacking + minimize + snapshot) seemed too long.
And also the fact ALTs weren't in (to my knowledge) at the time I started on the project as a total Solana noob haha.
With @xiangzhu70's change for keeping and loading from snapshot files will allow us to do this in a timely manner!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd update the comment to include this issue I just made: #30165 😄
thanks for creating issue; just done: a2b8565
With @xiangzhu70's change for keeping and loading from snapshot files will allow us to do this in a timely manner!
Oh sounds promising. :)
a2b8565
to
a25ba03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with the actual changes, with some nits on comments. Would be nice to get confirmation from @Lichtso that we're not overlooking a reason we shouldn't fill missing sysvar on loading from fields.
@@ -1912,6 +1912,7 @@ impl Bank { | |||
additional_builtins, | |||
debug_do_not_add_builtins, | |||
); | |||
bank.fill_missing_sysvar_cache_entries(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it; it does seem like it was an oversight from the previous PRs on just not including this code on the initialization from fields.
Co-authored-by: Andrew Fitzgerald <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
* Add address lookup tables to minimized snapshot * Add comment for future posterity * Add reference to the issue * Adjust comment a bit Co-authored-by: Andrew Fitzgerald <[email protected]> --------- Co-authored-by: Andrew Fitzgerald <[email protected]> (cherry picked from commit 3e6162e)
}); | ||
entries.into_par_iter().for_each(|entry| { | ||
entry.transactions.into_iter().for_each(|tx| { | ||
if let Some(lookups) = tx.message.address_table_lookups() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@t-nelson here
* Add address lookup tables to minimized snapshot * Add comment for future posterity * Add reference to the issue * Adjust comment a bit Co-authored-by: Andrew Fitzgerald <[email protected]> --------- Co-authored-by: Andrew Fitzgerald <[email protected]>
Problem
Currently, we can't create a minimized snapshot for mainnet-beta due to wide-spread use of address-lookup-table txes which is currently-unsupported for the minimized snapshot.
Summary of Changes
Support it. Needed bunch of seemingly unrelated changes...