Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full sync with logging for very slow state block commits #4952

Closed
wants to merge 6 commits into from

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Aug 25, 2022

Motivation

We want to find out which parts of state block commits are slow in CI, because we can't reproduce them locally.

These are diagnostics for ticket #4937, where some blocks take 15 minutes to write to the state.

Solution

  • change info threshold to 5 minutes and warn to 9 minutes
  • add slow code logging for block commits, block writes, and note commitment tree updates

Review

This PR doesn't need review.

Reviewer Checklist

  • Logging works

Follow Up Work

Read the logs to work out why #4937 is happening

@teor2345 teor2345 added C-enhancement Category: This is an improvement P-High 🔥 I-slow Problems with performance or responsiveness A-diagnostics Area: Diagnosing issues or monitoring performance A-state Area: State / database changes labels Aug 25, 2022
@teor2345 teor2345 self-assigned this Aug 25, 2022
@teor2345 teor2345 requested a review from a team as a code owner August 25, 2022 22:04
@teor2345 teor2345 requested review from arya2 and removed request for a team August 25, 2022 22:04
arya2
arya2 previously approved these changes Aug 26, 2022
Copy link
Contributor

@arya2 arya2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

zebra-state/src/service/tests.rs Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 29, 2022

Codecov Report

Merging #4952 (89c8cad) into main (c322533) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #4952      +/-   ##
==========================================
+ Coverage   79.20%   79.21%   +0.01%     
==========================================
  Files         310      310              
  Lines       38883    39073     +190     
==========================================
+ Hits        30796    30952     +156     
- Misses       8087     8121      +34     

Copy link
Contributor

@oxarbitrage oxarbitrage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good for sure but i dont think we should be merging this things into the main branch.

This seems more suitable for a local test or for a specific branch outside main.

@teor2345
Copy link
Contributor Author

The code looks good for sure but i dont think we should be merging this things into the main branch.

This seems more suitable for a local test or for a specific branch outside main.

Unfortunately, I can't reproduce this issue on my local machine, so I need to add this debugging to the main branch.
Once we know what the issue is, we can remove it.

@oxarbitrage
Copy link
Contributor

Sorry, still dont understand why you need to add it to the main branch. Can't you run the CI without merging ?

@teor2345
Copy link
Contributor Author

Sorry, still dont understand why you need to add it to the main branch. Can't you run the CI without merging ?

I'm not sure if it will work, but I can try.

I'll open a new PR to disable the logs for #4932, because that needs to go into main. Then I can make this PR and draft and do a full sync on it.

@teor2345 teor2345 removed the request for review from arya2 August 30, 2022 05:00
@teor2345 teor2345 marked this pull request as draft August 30, 2022 05:00
@teor2345 teor2345 changed the title change(log): Add logging for very slow state block commits Full sync with logging for very slow state block commits Aug 30, 2022
@teor2345
Copy link
Contributor Author

I'm running a full sync here with these extra logs:
https://github.com/ZcashFoundation/zebra/actions/runs/2953663804

@teor2345
Copy link
Contributor Author

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Aug 30, 2022

update

✅ Branch has been successfully updated

@teor2345
Copy link
Contributor Author

Let's try that full sync again, after the concurrency fixes in PR #5000:
https://github.com/ZcashFoundation/zebra/actions/runs/2959955933

@teor2345
Copy link
Contributor Author

teor2345 commented Sep 1, 2022

The logs I added didn't actually show any extra information, which is a bit of a surprise. It's possible we're just committing a large number of slow blocks at a time.

I added some more logging and started another full sync:
https://github.com/ZcashFoundation/zebra/actions/runs/2968581615

@teor2345
Copy link
Contributor Author

teor2345 commented Sep 2, 2022

It appears that committing blocks hangs because we sometimes commit over 1000 blocks at the same time, then return the result to the state caller (checkpointer or non-finalized state block committer):

2022-09-01T22:50:25.650944Z WARN state: run time: very long queue_and_commit_finalized: 1774457..=1775638, 1181 blocks time=11m 24s module=zebra_state::service::finalized_state line=173
2022-09-01T22:50:25.650981Z WARN state: run time: very long CommitFinalizedBlock time=11m 24s module=zebra_state::service line=697

https://github.com/ZcashFoundation/zebra/runs/8147047219?check_suite_focus=true#step:6:165

I'll update ticket #4937 with the details.

@teor2345
Copy link
Contributor Author

teor2345 commented Sep 2, 2022

This PR has achieved its goals, but we'll need the extra logging to test any fixes.

@teor2345
Copy link
Contributor Author

teor2345 commented Sep 5, 2022

This PR was only used for testing slow block commits, we can re-open it if we need to re-test.

@teor2345 teor2345 closed this Sep 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Diagnosing issues or monitoring performance A-state Area: State / database changes C-enhancement Category: This is an improvement I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants