Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.16: removes outdated check for merkle shreds (backport of #33088) #33136

Merged
merged 4 commits into from
Sep 5, 2023

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Sep 4, 2023

This is an automatic backport of pull request #33088 done by Mergify.

Motivation for v1.16 Backport

ShredFetchStage implements some basic verification in modify_packets. Some of this verification currently relies on having a (root) bank in order to check feature status as well as to check number of slots in an epoch (number of slots in an epoch could be calculated and then Arc<Bank> could be dropped). The root bank is fetched here:

let (mut root_bank, mut last_slot) = {
let bank_forks_r = bank_forks.read().unwrap();
(bank_forks_r.root_bank(), bank_forks_r.highest_slot())

At the time that the bank is fetched, the node just unpacked its' snapshot. So, this initial root bank corresponds to the load-snapshot-slot. Later in the function, we currently use a blocking method to pull shred out of the crossbeam receiver:

for mut packet_batch in recvr {

Since we're not getting any packets from TVU forwards, execution is blocked waiting for a shred to show up, and the following code that updates the root bank is never executed:

let bank_forks_r = bank_forks.read().unwrap();
root_bank = bank_forks_r.root_bank();

As a result, the initial root bank at startup (ie the snapshot slot) is held for the duration of the process. Holding this bank is a waste of resources and could interfere with background cleanup that AccountsDB would otherwise be performing.

In #33078, I identified a very similar problem on master with the QUIC receiver. Instead of adjusting the impl, we realized that we could cleanup the code that needed the root bank. Hence, we shipped #33088 to remove outdated feature status check, and then #33105 to remove holding the root bank altogether.

Additional v1.16 Backport

In order to fully fix the bug of the process holding the initial root bank for process lifetime, we will also need a backport of #33105. I plan on pushing that BP after this one lands. I'll also add that I recently performed some separate cleanup on that function in #33066. If we want to optimize for 1-to-1 mapping between master and v1.16 commits, backporting that one first will make #33105 go in cleaner.

Same bug exists on v1.14

Lastly, this same problem exists on v1.14. At the time of writing this, we're on the cusp of upgrading mnb to v1.16 pending some debugging efforts. This isn't a critical bug (ie it will not crash nodes) so I think we can probably skip the v1.14 backport. However, this bank being present will obfuscate other potential debugging efforts such as mem profiling.


Cherry-pick of 1431275 has failed:

On branch mergify/bp/v1.16/pr-33088
Your branch is up to date with 'origin/v1.16'.

You are currently cherry-picking commit 1431275328.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   ledger/src/shred.rs
	modified:   sdk/src/feature_set.rs

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   core/src/cluster_nodes.rs
	both modified:   core/src/shred_fetch_stage.rs

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally


Mergify commands and options

More conditions and actions can be found in the documentation.

You can also trigger Mergify actions by commenting on this pull request:

  • @Mergifyio refresh will re-evaluate the rules
  • @Mergifyio rebase will rebase this PR on its base branch
  • @Mergifyio update will merge the base branch into this PR
  • @Mergifyio backport <destination> will backport this PR on <destination> branch

Additionally, on Mergify dashboard you can:

  • look at your merge queues
  • generate the Mergify configuration with the config editor.

Finally, you can contact us on https://mergify.com

(cherry picked from commit 1431275)

# Conflicts:
#	core/src/cluster_nodes.rs
#	core/src/shred_fetch_stage.rs
Steven Czabaniuk added 3 commits September 4, 2023 22:24
This is more proper and was cleaned up in master, but doing minimal
cleanup to keep BP diff minimal.
@steviez steviez assigned steviez and unassigned behzadnouri Sep 4, 2023
@codecov
Copy link

codecov bot commented Sep 4, 2023

Codecov Report

Merging #33136 (208d80b) into v1.16 (0c211bd) will increase coverage by 0.0%.
The diff coverage is 66.6%.

@@           Coverage Diff           @@
##            v1.16   #33136   +/-   ##
=======================================
  Coverage    81.9%    81.9%           
=======================================
  Files         762      762           
  Lines      207829   207798   -31     
=======================================
+ Hits       170370   170393   +23     
+ Misses      37459    37405   -54     

@steviez
Copy link
Contributor

steviez commented Sep 5, 2023

Double checked that this branch was feature "compatible" (ie same feature set id) with the last couple v1.16 releases:

$ git status
On branch mergify/bp/v1.16/pr-33088
Your branch is up to date with 'origin/mergify/bp/v1.16/pr-33088'.
$ cargo run --release -- feature status
    Finished release [optimized] target(s) in 0.54s
     Running `/home/sol/src/solana/target/release/solana feature status`
...
Tool Feature Set: 3949673676
Software Version                                                        Feature Set   Stake     RPC
1.17.0                                                                   4115909905   0.02%   3.14%  
1.17.0                                                                    461617547   0.00%   0.31%  
1.16.13, 1.16.12, 1.16.11                                                3949673676   9.24%   6.60%  <-- me
...

@steviez steviez merged commit ea03d6c into v1.16 Sep 5, 2023
@steviez steviez deleted the mergify/bp/v1.16/pr-33088 branch September 5, 2023 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants