Restore DEFAULT_TICKS_PER_SLOT to 8 for a ~800ms voting interval #2675

Closed
mvines opened this issue Feb 6, 2019 · 7 comments · Fixed by #2766

mvines commented Feb 6, 2019

#2563 set DEFAULT_TICKS_PER_SLOT = 32 to stabilize integration tests and solana-web3.js tests.

Debug why the clients can't handle the leader rotating every ~800ms, fix it, then revert 979ae88.
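
For reference, the arithmetic behind the two settings can be sketched as follows. The ~100ms tick length is an assumption inferred from the title (8 ticks for ~800ms), not a value taken from the codebase, and `approx_slot_duration` is a hypothetical helper name.

```rust
use std::time::Duration;

/// Assumed tick length: the issue title equates 8 ticks with ~800ms,
/// which works out to roughly 100ms per tick. Illustrative only.
const ASSUMED_TICK_DURATION: Duration = Duration::from_millis(100);

/// Approximate wall-clock length of one slot (and so the time between
/// leader rotations) for a given ticks-per-slot setting.
fn approx_slot_duration(ticks_per_slot: u32) -> Duration {
    ASSUMED_TICK_DURATION * ticks_per_slot
}
```

Under that assumption, `approx_slot_duration(8)` is about 800ms and `approx_slot_duration(32)` is about 3.2s, so the workaround in #2563 gives clients roughly four times longer between leader rotations.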

mvines added this to the v0.12 Beacons milestone on Feb 6, 2019

mvines commented Feb 7, 2019

The ./ci/test-stable-perf.sh CI job appears to be a little flaky due to this issue, as well as the nightly iterations tests (./ci/iterations-localnet.sh)


mvines commented Feb 7, 2019

ci/localnet-sanity.sh -b -i 100 is a great way to easily reproduce, even with DEFAULT_TICKS_PER_SLOT at 32.


mvines commented Feb 8, 2019

Playing a little with the steps to reproduce at #2693, I see that the leader bank processes the transaction but then fails to record it with PohRecorderError(MaxHeightReached), since the leader rotated out while the bank was processing. In this case it seems like the former leader could attempt to forward the transactions to the new leader for re-processing.


mvines commented Feb 8, 2019

I have a WIP patch that makes the banking_stage forward any remaining inflight transactions to the new leader on PohRecorderError(MaxHeightReached) failures. Logging suggests that it's working fine and the inflight transaction is successfully reprocessed. But #2693 still reproduces, so it seems like there's at least one more problem here. 🕳
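
The forwarding idea can be sketched roughly as below. This is not the WIP patch itself: `Transaction`, `Outcome`, and `record_or_forward` are hypothetical names invented for illustration, and `PohRecorderError` here is a local stand-in that only mirrors the variant mentioned above.

```rust
/// Local stand-in for the recorder error named in this issue.
#[derive(Debug, PartialEq)]
enum PohRecorderError {
    MaxHeightReached,
}

/// Hypothetical transaction type; a real one carries signatures and instructions.
#[derive(Debug, Clone, PartialEq)]
struct Transaction(u64);

/// Hypothetical result of handling one batch.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Every transaction was recorded; holds the count.
    Recorded(usize),
    /// The leader rotated out; these transactions should go to the new leader.
    Forwarded(Vec<Transaction>),
}

/// Try to record each transaction in order. On the first MaxHeightReached
/// failure, stop and return the unrecorded remainder for forwarding.
fn record_or_forward<R>(txs: Vec<Transaction>, mut record: R) -> Outcome
where
    R: FnMut(&Transaction) -> Result<(), PohRecorderError>,
{
    for (i, tx) in txs.iter().enumerate() {
        if let Err(PohRecorderError::MaxHeightReached) = record(tx) {
            // Everything from this transaction onward is still in flight.
            return Outcome::Forwarded(txs[i..].to_vec());
        }
    }
    Outcome::Recorded(txs.len())
}
```

The `record` closure stands in for whatever actually writes to the PoH recorder; passing it as a parameter keeps the sketch testable without a running cluster. How the forwarded batch reaches the new leader is left out, since the issue does not describe it.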


mvines commented Feb 8, 2019

Looks like with a low ticks_per_slot value the replay stage never gets a chance to run. Without the replay stage running the TVU bank is not updated, and the RPC API responds from the TVU bank.


mvines commented Feb 12, 2019

Progress at #2730, but #2735 demonstrates another issue with low ticks_per_slot. Once that PR is resolved, we need to revisit #2675 (comment) at minimum (perhaps there'll be more problems lingering here too, we'll see).


mvines commented Feb 13, 2019

Leader rotation with ticks_per_slot=1 seems to be in decent shape now, per multinode tests. Next up: start sending transactions at a cluster rotating on every tick...
