DR6 performance issues #7753

xtrapower · 2020-01-10T16:28:33Z

Problem

Our performance in the recent DryRun6 was bad, and we were unhealthy during most of the 'Ramp TPS' rounds. I'm currently at a loss as to what caused us (Staking Facilities) to miss so many slots.

I don't believe our machine is the problem (32 cores, 96 GB RAM, NVMe storage, 3x 2080 Ti). I ran benchmarks today (v0.21.5) and reached 150k max TPS with almost 90k sustained average TPS. No other software was running on the machine, and I stopped sending RPC requests to the node early on.

That leaves peering, latency, or networking issues as the potential cause. Our machine is co-located in an Equinix data center on a 100 Mbit connection which, according to Equinix, can be burst without limit.
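A benchmark on the machine itself does not necessarily load the 100 Mbit uplink. As a rough sketch, the hypothetical helper `link_tps_ceiling` below computes the most transactions per second a link of a given speed could carry for an assumed per-transaction size. The 250-byte figure is an illustrative assumption, not a measured value, and the calculation ignores packet headers and all other validator traffic (gossip, repair, block propagation), so a real ceiling would be lower.

```python
def link_tps_ceiling(link_mbit_per_s: float, bytes_per_tx: float) -> float:
    """Upper bound on transactions/second a link can carry.

    Ignores packet headers, retransmits and all non-transaction traffic,
    so the real ceiling is lower than this number.
    """
    bits_per_tx = bytes_per_tx * 8
    return link_mbit_per_s * 1_000_000 / bits_per_tx


# Hypothetical on-the-wire size per transaction; not measured in this issue.
ASSUMED_TX_BYTES = 250

ceiling = link_tps_ceiling(100, ASSUMED_TX_BYTES)
print(f"~{ceiling:,.0f} TPS at {ASSUMED_TX_BYTES} B/tx over 100 Mbit")
```

Under that assumption the ceiling is 50,000 TPS, below the ~90k sustained TPS from the benchmark, so it may be worth measuring the link's actual throughput during a Ramp TPS round rather than relying on the burst figure.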

Our log files: Google Drive

Let's look at epoch 82 (a Ramp TPS round). We were scheduled for 64 slots and missed 22 (see epoch82.log). I think 8 of the missed slots can be attributed to issue #7588.
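To put those counts in proportion, a minimal sketch of the skip-rate arithmetic for epoch 82, using only the numbers above (64 scheduled, 22 missed, 8 attributed to #7588); the `skip_rate` helper is a hypothetical name for illustration.

```python
def skip_rate(missed: int, scheduled: int) -> float:
    """Fraction of scheduled leader slots that were skipped."""
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    return missed / scheduled


scheduled, missed = 64, 22
attributed_to_7588 = 8  # the estimate for issue #7588 given above

overall = skip_rate(missed, scheduled)
own = skip_rate(missed - attributed_to_7588, scheduled)
print(f"overall: {overall:.1%}, excluding #7588: {own:.1%}")
```

That gives a 34.4% skip rate overall and 21.9% for the 14 missed slots not attributed to #7588.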

But our validator also missed many of its slots independently of #7588. A typical pattern is that it produces 1-2 slots and then misses the rest of that run of leader slots. This indicates that we were timed out by the next leader. Why? And what can we do about it?

Two examples from epoch82.log:

312848 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U
312849 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
312850 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
312851 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED

312516 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U
312517 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U
312518 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
312519 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
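To check how often this produce-then-skip pattern occurs across the whole epoch, a small script could group the log into runs of consecutive slots per leader. This is a sketch that assumes every line of epoch82.log follows the layout of the excerpts above (`<slot> <leader> [SKIPPED]`); the helper names `parse_line`, `leader_runs` and `run_pattern` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, NamedTuple


class SlotEntry(NamedTuple):
    slot: int
    leader: str
    skipped: bool


def parse_line(line: str) -> SlotEntry:
    # Assumes the layout of the epoch82.log excerpts: "<slot> <leader> [SKIPPED]"
    parts = line.split()
    return SlotEntry(int(parts[0]), parts[1], "SKIPPED" in parts[2:])


def leader_runs(lines: Iterable[str]) -> List[List[SlotEntry]]:
    """Split the log into runs of consecutive slots held by the same leader."""
    entries = sorted((parse_line(ln) for ln in lines if ln.strip()), key=lambda e: e.slot)
    runs: List[List[SlotEntry]] = []
    for _, group in groupby(entries, key=lambda e: e.leader):
        run: List[SlotEntry] = []
        for entry in group:
            # A gap in slot numbers starts a new run even for the same leader.
            if run and entry.slot != run[-1].slot + 1:
                runs.append(run)
                run = []
            run.append(entry)
        runs.append(run)
    return runs


def run_pattern(run: List[SlotEntry]) -> str:
    """'P' for a produced slot, 'S' for a skipped one, in slot order."""
    return "".join("S" if e.skipped else "P" for e in run)


EXCERPT = """\
312848 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U
312849 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
312850 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
312851 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
312516 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U
312517 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U
312518 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
312519 55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U SKIPPED
"""

for run in leader_runs(EXCERPT.splitlines()):
    print(run[0].slot, run[-1].slot, run_pattern(run))
```

On the two excerpts it reports PPSS for slots 312516-312519 and PSSS for 312848-312851. Run over the full epoch82.log and filtered to our own pubkey, counting the runs that end in skips after at least one produced slot would show how common the pattern is.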

Looking at the validator log file, can you tell what went wrong in those two examples?


xtrapower · 2020-01-14T14:48:41Z

Apparently our node scored pretty well in the 'TdS DR6 winner-tool results'.

That seems at odds with what the logs suggest (being timed out by the next leader).

stale · 2021-01-30T04:03:43Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale · 2021-02-07T18:22:16Z

This stale issue has been automatically closed. Thank you for your contributions.

mvines added this to the Supertubes v0.22.3 milestone Jan 10, 2020

mvines modified the milestones: Supertubes v0.22.3, Supertubes v0.22.4 Jan 15, 2020

mvines modified the milestones: Supertubes v0.22.4, Supertubes v0.22.5, Tofino v0.23.1, The Future! Jan 25, 2020

stale bot added the stale label Jan 30, 2021

stale bot closed this as completed Feb 7, 2021

github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022
