
les: move client pool to les/vflux/server #22495

Merged
zsfelfoldi merged 27 commits into ethereum:master on Apr 6, 2021

Conversation

zsfelfoldi (Contributor) commented Mar 15, 2021:

This PR refactors the client peer logic for LES servers similarly to how the server peer logic was recently refactored on the client side; now all complex peer logic is separated from the protocol implementation and lives in the vflux subpackages. The NodeStateMachine is only used in these packages, and all references have been removed from package les, where a very simple serverPeerSet and clientPeerSet represent the currently active peers. These changes also prepare the codebase so that the next steps (specifically the capacity update logic of les/5) are easier to deploy.
A fuzzer for vfs.ClientPool is also implemented in this PR; it covers most of the vfs package and does much more meaningful testing of the peer logic than fuzzing the NodeStateMachine alone.

Note: vfs.BalanceTracker.BalanceOperation allows atomic balance operations in the sense that the in-memory peer structure will not change between operations. The database storage of the changed balance is not atomic yet; this will be implemented in the next PR that adds the token sale mechanism, which is where this feature will really make sense because the ether balance and the token balance need to be updated atomically.
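For illustration, a minimal sketch of the callback-under-lock pattern described above. This is an assumption for readability only: the field layout, the string id parameter and the get/set callback signature are invented stand-ins, not the actual vfs.BalanceTracker API.

package sketch

import "sync"

// balanceTracker illustrates the atomicity guarantee: the callback passed to
// BalanceOperation runs while the tracker holds its lock, so the in-memory
// balance structure cannot change between the individual steps performed
// inside the callback. (Hypothetical types, not the real vfs implementation.)
type balanceTracker struct {
	lock     sync.Mutex
	balances map[string]int64 // hypothetical in-memory balance store
}

// BalanceOperation runs cb atomically with respect to other balance updates.
func (bt *balanceTracker) BalanceOperation(id string, cb func(get func() int64, set func(int64))) {
	bt.lock.Lock()
	defer bt.lock.Unlock()

	get := func() int64 { return bt.balances[id] }
	set := func(v int64) {
		if bt.balances == nil {
			bt.balances = make(map[string]int64)
		}
		bt.balances[id] = v
	}
	cb(get, set)
}

As noted in the description, persisting the changed balance is not yet covered by this atomicity; only the in-memory view is protected.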

@@ -224,13 +216,13 @@ func (pp *PriorityPool) RequestCapacity(node *enode.Node, targetCap uint64, bias
 	_, minPriority = pp.enforceLimits()
 	// if capacity update is possible now then minPriority == math.MinInt64
 	// if it is not possible at all then minPriority == math.MaxInt64
-	allowed = priority > minPriority
+	allowed = priority >= minPriority

rjl493456442 (Member) commented Mar 25, 2021:

Is it possible to return the update list from enforceLimits for further processing? It's weird to save it in the global priority queue object.

zsfelfoldi (Contributor, Author) replied:

Maybe it could be made nicer; I need to think a bit about it.

zsfelfoldi (Contributor, Author) replied:

Now I think that the temporary state does make sense, because in some cases I call enforceLimits multiple times and decide the next action based on the temporary state resulting from the previous iteration. It was probably hard to follow how the temporary state works, though, so I added a commit with some supposedly identical changes that hopefully make it cleaner. Now the capacity field is not changed until the temp state is committed, and there is a tempCapacity field, so it is always clear what we are operating on.
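A minimal sketch of the staged-capacity pattern described in the comment above. The names here (poolNode, setTempCapacity, commit, abort) are illustrative assumptions, not the actual priorityPool code.

package sketch

// poolNode stages capacity changes in tempCapacity; the committed capacity
// only changes when the temporary state is committed.
type poolNode struct {
	capacity     uint64 // committed capacity, visible to the rest of the pool
	tempCapacity uint64 // staged value, only meaningful while tempState is set
	tempState    bool
}

// setTempCapacity stages a capacity change without touching the committed value.
func (n *poolNode) setTempCapacity(c uint64) {
	n.tempCapacity = c
	n.tempState = true
}

// commit makes the staged capacity the real one.
func (n *poolNode) commit() {
	if n.tempState {
		n.capacity = n.tempCapacity
		n.tempState = false
	}
}

// abort drops the staged capacity and keeps the committed value.
func (n *poolNode) abort() {
	n.tempState = false
}

With this split, code that runs between enforceLimits iterations can base decisions on tempCapacity while everything outside the update still sees the committed capacity.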

Review thread on les/server.go (outdated, resolved)
@@ -94,7 +96,11 @@ func (s *sstack) Len() int {
 // Compares the priority of two elements of the stack (higher is first).
 // Required by sort.Interface.
 func (s *sstack) Less(i, j int) bool {
-	return (s.blocks[i/blockSize][i%blockSize].priority - s.blocks[j/blockSize][j%blockSize].priority) > 0
+	a, b := s.blocks[i/blockSize][i%blockSize].priority, s.blocks[j/blockSize][j%blockSize].priority
+	if s.wrapAround {
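The diff is cut off by the review view; for readability, the new comparison presumably continues roughly as follows (a reconstruction based on the snippet and the discussion below, not a verbatim quote of the file; s.blocks, blockSize and wrapAround come from the surrounding sstack code):

func (s *sstack) Less(i, j int) bool {
	a, b := s.blocks[i/blockSize][i%blockSize].priority, s.blocks[j/blockSize][j%blockSize].priority
	if s.wrapAround {
		// wrap-around mode: the subtraction stays correct even if the absolute
		// priorities have overflowed, as long as their true difference fits in int64
		return a-b > 0
	}
	// default mode: plain comparison of absolute priorities
	return a > b
}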

Member replied:

What's the difference between a > b and a-b > 0?

zsfelfoldi (Contributor, Author) replied:

The difference is how they handle numeric overflow. For example, MaxInt64 > MinInt64 but MaxInt64-MinInt64 < 0. If we use the latter type of comparison, the individual priority values are allowed to overflow and "wrap around" the 64-bit range many times, as long as the biggest difference between two priorities in the queue at any moment can be expressed in the int64 range. flowcontrol.ClientManager is a good example of this mode: there is a cumulative "buffer recharge integrator" that is continuously increasing and does overflow. rcQueue item priorities are all calculated relative to this value, and the difference between any two item priorities fits safely into int64. The first (free clients only) client pool design also took advantage of this feature, but the current priority pool scheme uses absolute values.
This "wrap around" feature requires care to use, and until now all prque.Prque instances operated in this mode. Now it is only used if the queue is explicitly created with the NewWrapAround constructor.

Member replied:

It looks dangerous to me. Firstly, it introduces this trick into the stack/prque very implicitly. Besides, are we sure that flowcontrol.ClientManager can handle the overflow properly?

cm.rcLastIntValue += int64(bonusRatio * float64(dt))

It is indeed increased all the time, but when it overflows, all the relative calculations will be affected.

For example:

dtNext := mclock.AbsTime(float64(rcqNode.rcFullIntValue-cm.rcLastIntValue) / bonusRatio)

This value can be a large negative value if rcqNode.rcFullIntValue has overflowed.

// check whether it has already finished
		dtNext := mclock.AbsTime(float64(rcqNode.rcFullIntValue-cm.rcLastIntValue) / bonusRatio)
		if dt < dtNext {
			// not finished yet, put it back, update integrator according
			// to current bonusRatio and return
			cm.rcQueue.Push(rcqNode, -rcqNode.rcFullIntValue)
			cm.rcLastIntValue += int64(bonusRatio * float64(dt))
			return
		}
		lastUpdate += dtNext
		// finished recharging, update corrBufValue and sumRecharge if necessary and do next step
		if rcqNode.corrBufValue < int64(rcqNode.params.BufLimit) {
			rcqNode.corrBufValue = int64(rcqNode.params.BufLimit)
			cm.sumRecharge -= rcqNode.params.MinRecharge
		}
		cm.rcLastIntValue = rcqNode.rcFullIntValue

Then lastUpdate will also become a large negative value.

Member replied:

IMO, we should fix the overflow in the flowcontrol manager instead of introducing this trick into the common data structure.

zsfelfoldi (Contributor, Author) replied:

This PR reverts to the non-tricky version in most cases and does not change the behavior of the client manager. I will change the client manager's priority calculation and get rid of this wrap-around trick altogether in a next PR.

Member replied:

Thanks!

holiman (Contributor) commented Mar 30, 2021:

@alisscco please stop adding random approvals to PRs

Two review threads on les/vflux/server/prioritypool.go (outdated, resolved)
rjl493456442 (Member) left a review comment:

LGTM

zsfelfoldi (Contributor, Author) commented:

@fjl PTAL at the GetState function in NodeStateMachine. It works similarly to the already existing GetField.

fjl self-assigned this on Apr 6, 2021
fjl modified the milestone: 1.10.2 (Apr 6, 2021)
zsfelfoldi merged commit 2d89fe0 into ethereum:master on Apr 6, 2021
atif-konasl pushed a commit to frozeman/pandora-execution-engine that referenced this pull request on Oct 15, 2021:
* les: move client pool to les/vflux/server

* les/vflux/server: un-expose NodeBalance, remove unused fn, fix bugs

* tests/fuzzers/vflux: add ClientPool fuzzer

* les/vflux/server: fixed balance tests

* les: rebase fix

* les/vflux/server: fixed more bugs

* les/vflux/server: unexported NodeStateMachine fields and flags

* les/vflux/server: unexport all internal components and functions

* les/vflux/server: fixed priorityPool test

* les/vflux/server: polish balance

* les/vflux/server: fixed mutex locking error

* les/vflux/server: priorityPool bug fixed

* common/prque: make Prque wrap-around priority handling optional

* les/vflux/server: rename funcs, small optimizations

* les/vflux/server: fixed timeUntil

* les/vflux/server: separated balance.posValue and negValue

* les/vflux/server: polish setup

* les/vflux/server: enforce capacity curve monotonicity

* les/vflux/server: simplified requestCapacity

* les/vflux/server: requestCapacity with target range, no iterations in SetCapacity

* les/vflux/server: minor changes

* les/vflux/server: moved default factors to balanceTracker

* les/vflux/server: set inactiveFlag in priorityPool

* les/vflux/server: moved related metrics to vfs package

* les/vflux/client: make priorityPool temp state logic cleaner

* les/vflux/server: changed log.Crit to log.Error

* add vflux fuzzer to oss-fuzz

Co-authored-by: rjl493456442 <[email protected]>