Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

state_sync tests fail with RPC timeouts #3129

Closed
SkidanovAlex opened this issue Aug 11, 2020 · 0 comments
Closed

state_sync tests fail with RPC timeouts #3129

SkidanovAlex opened this issue Aug 11, 2020 · 0 comments
Assignees
Labels
A-chain Area: Chain, client & related

Comments

@SkidanovAlex
Copy link
Collaborator

Rather consistently, e.g:

state_sync.py: http://52.149.162.182:3000/#/test/7604
state_sync_routed.py: http://52.149.162.182:3000/#/test/7611
state_sync_late.py: http://52.149.162.182:3000/#/test/7612

@SkidanovAlex SkidanovAlex added the A-chain Area: Chain, client & related label Aug 11, 2020
@SkidanovAlex SkidanovAlex self-assigned this Aug 12, 2020
SkidanovAlex added a commit that referenced this issue Aug 14, 2020
…t the head update, and test fixes

Making doomslug wait for 600ms from the last time the node sent an endorsement, not from the head update. Under load this will remove unnecessary idleness.
This change breaks catchup tests that do not use doomslug, because they rely on blocks being produced without skips and forks, and without doomslug and the wait not starting from the head update the timing is less predictable.
I address it by making the tests actually use doomslug (and upping the block prod times where necessary). Since they rely on having no skips and forks, there's no reason not to use doomslug.
For the only test that had two separate versions with and without doomslug, I removed the version without.

Separately and unrelatedly to the above, removed backtrace from the formatted client errors. On my (very beefy) machine fetching the backtrace takes 800ms, and on nayduck runners it was more than 2s. Since we format errors in many places (including responces to RPC, though that will change), such delays are not acceptable.
This fixes state_sync tests.

Fixes: #3139, #3129

Test plan:
----------
For #3139: http://nayduck.eastus.cloudapp.azure.com:3000/#/run/88
For #3129: TODO
Catchup tests are split accross many runs, but I ensured they are not flaky after this change.
SkidanovAlex added a commit that referenced this issue Aug 14, 2020
…t the head update, and test fixes

Making doomslug wait for 600ms from the last time the node sent an endorsement, not from the head update. Under load this will remove unnecessary idleness.
This change breaks catchup tests that do not use doomslug, because they rely on blocks being produced without skips and forks, and without doomslug and the wait not starting from the head update the timing is less predictable.
I address it by making the tests actually use doomslug (and upping the block prod times where necessary). Since they rely on having no skips and forks, there's no reason not to use doomslug.
For the only test that had two separate versions with and without doomslug, I removed the version without.

Separately and unrelatedly to the above, removed backtrace from the formatted client errors. On my (very beefy) machine fetching the backtrace takes 800ms, and on nayduck runners it was more than 2s. Since we format errors in many places (including responces to RPC, though that will change), such delays are not acceptable.
This fixes state_sync tests.

Fixes: #3139, #3129

Test plan:
----------
For #3139: http://nayduck.eastus.cloudapp.azure.com:3000/#/run/88
For #3129: (still running, will update the description)
Catchup tests are split accross many runs, but I ensured they are not flaky after this change.
bowenwang1996 pushed a commit that referenced this issue Aug 14, 2020
…t the head update, and test fixes (#3164)

Making doomslug wait for 600ms from the last time the node sent an endorsement, not from the head update. Under load this will remove unnecessary idleness.
This change breaks catchup tests that do not use doomslug, because they rely on blocks being produced without skips and forks, and without doomslug and the wait not starting from the head update the timing is less predictable.
I address it by making the tests actually use doomslug (and upping the block prod times where necessary). Since they rely on having no skips and forks, there's no reason not to use doomslug.
For the only test that had two separate versions with and without doomslug, I removed the version without.

Separately and unrelatedly to the above, removed backtrace from the formatted client errors. On my (very beefy) machine fetching the backtrace takes 800ms, and on nayduck runners it was more than 2s. Since we format errors in many places (including responces to RPC, though that will change), such delays are not acceptable.
This fixes state_sync tests.

Fixes: #3139, #3129

Test plan:
----------
For #3139: http://nayduck.eastus.cloudapp.azure.com:3000/#/run/88
For #3129: (still running, will update the description)
Catchup tests are split accross many runs, but I ensured they are not flaky after this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-chain Area: Chain, client & related
Projects
None yet
Development

No branches or pull requests

1 participant