state_sync tests fail with RPC timeouts #3129

SkidanovAlex · 2020-08-11T17:43:11Z

These tests fail rather consistently, e.g.:

state_sync.py: http://52.149.162.182:3000/#/test/7604
state_sync_routed.py: http://52.149.162.182:3000/#/test/7611
state_sync_late.py: http://52.149.162.182:3000/#/test/7612

…t the head update, and test fixes (#3164)

Making doomslug wait for 600ms from the last time the node sent an endorsement, not from the head update. Under load this will remove unnecessary idleness.

This change breaks catchup tests that do not use doomslug, because they rely on blocks being produced without skips and forks; without doomslug, and with the wait no longer starting from the head update, the timing is less predictable. I address it by making the tests actually use doomslug (and upping the block production times where necessary). Since they rely on having no skips and forks, there's no reason not to use doomslug. For the only test that had two separate versions, with and without doomslug, I removed the version without.

Separately and unrelated to the above, removed the backtrace from the formatted client errors. On my (very beefy) machine fetching the backtrace takes 800ms, and on nayduck runners it was more than 2s. Since we format errors in many places (including responses to RPC, though that will change), such delays are not acceptable. This fixes state_sync tests.

Fixes: #3139, #3129

Test plan:
----------
For #3139: http://nayduck.eastus.cloudapp.azure.com:3000/#/run/88
For #3129: (still running, will update the description)

Catchup tests are split across many runs, but I ensured they are not flaky after this change.
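To illustrate the timing change described in the commit message, here is a minimal sketch comparing the two policies: waiting 600ms from the head update versus waiting 600ms from the last endorsement sent. This is not nearcore's actual code; the class `EndorsementTimer` and its method names are hypothetical, and times are plain integer milliseconds chosen for the example.

```python
from dataclasses import dataclass

# 600 ms endorsement delay, as stated in the commit message.
ENDORSEMENT_DELAY_MS = 600


@dataclass
class EndorsementTimer:
    """Hypothetical model of the two wait policies (not the real doomslug code)."""

    delay_ms: int = ENDORSEMENT_DELAY_MS
    last_head_update_ms: int = 0
    last_endorsement_sent_ms: int = 0

    def on_head_update(self, now_ms: int) -> None:
        # Record when the node's head last changed.
        self.last_head_update_ms = now_ms

    def on_endorsement_sent(self, now_ms: int) -> None:
        # Record when the node last sent an endorsement.
        self.last_endorsement_sent_ms = now_ms

    def ready_from_head_update(self, now_ms: int) -> bool:
        # Old policy: the delay is counted from the head update.
        return now_ms >= self.last_head_update_ms + self.delay_ms

    def ready_from_last_endorsement(self, now_ms: int) -> bool:
        # New policy: the delay is counted from the last endorsement sent.
        return now_ms >= self.last_endorsement_sent_ms + self.delay_ms


# Under load the head update arrives late (here 500 ms after the
# previous endorsement was sent).
timer = EndorsementTimer()
timer.on_endorsement_sent(0)
timer.on_head_update(500)

old_ready = timer.ready_from_head_update(700)
new_ready = timer.ready_from_last_endorsement(700)
```

In this scenario the old policy would keep the node idle until 1100 ms, while the new policy lets it endorse as soon as 600 ms have passed since its previous endorsement, which is the "unnecessary idleness" the commit message refers to.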

SkidanovAlex added the A-chain Area: Chain, client & related label Aug 11, 2020

SkidanovAlex self-assigned this Aug 12, 2020

weekly-digest bot mentioned this issue Aug 14, 2020

Weekly Digest (7 August, 2020 - 14 August, 2020) #3163

Closed

SkidanovAlex mentioned this issue Aug 14, 2020

feat: Doomslug endorsement delay starts from the last endorsement, no… #3164

Merged

SkidanovAlex closed this as completed Aug 14, 2020
