
Enabling the node permission contract leads to CPU overload and losing peers #10121

Closed
VladLupashevskyi opened this issue Jan 2, 2019 · 3 comments

@VladLupashevskyi
Contributor

  • Parity Ethereum version: 1.11.11 and 2.1.10
  • Operating system: MacOS / Linux
  • Installation: homebrew and one-line installer
  • Fully synchronized: yes
  • Restarted: yes

After setting up the chain with a node permission contract, all Parity nodes lose all their peers within a couple of hours. The contract allows access to only a few enode addresses and has the following logic:

function connectionAllowed(bytes32 /*sl*/, bytes32 /*sh*/, bytes32 pl, bytes32 ph) public view returns (bool) {
    bool plAllowed = peerAllowedL[pl];
    bool phAllowed = peerAllowedH[ph];
    return plAllowed && phAllowed;
}
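
For context: as I read the filter's ABI, sl/sh and pl/ph are the low and high 32-byte halves of the validating node's and the peer's 512-bit enode IDs, and peerAllowedL/peerAllowedH are whitelist mappings declared elsewhere in the contract. A minimal Rust sketch of how one enode ID maps onto two bytes32 words (my illustration; I am assuming the low half comes first, the real ordering lives in ethcore/node-filter):

// Hypothetical helper: split a 512-bit enode ID into the (pl, ph)
// words passed to connectionAllowed. The low/high ordering here is
// an assumption, not taken from the node-filter source.
fn split_enode_id(id: &[u8; 64]) -> ([u8; 32], [u8; 32]) {
    let mut low = [0u8; 32];
    let mut high = [0u8; 32];
    low.copy_from_slice(&id[..32]);
    high.copy_from_slice(&id[32..]);
    (low, high)
}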

While debugging this issue I found that when Parity has zero peers, one of its threads consumes 100% CPU, which means Parity does not have enough time to react to nodes trying to connect (something I learned from similar issues here about 0 peers).

After removing nodePermissionContract from the JSON spec, the connections stabilize, as does the CPU usage.

I went further into the source code and tried to figure out what was going wrong.

I removed the caching part of the code (LruCache), and the node started to work stably even with the node permission contract enabled:

https://github.com/paritytech/parity-ethereum/blob/c077dc652d92535cff0afc9ae096c795c4cf39e3/ethcore/node-filter/src/lib.rs#L87-L90

I do not have much experience with caching, but it seems to me that either the problem appears once the cache gets full, or some kind of deadlock happens.
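
To illustrate the suspicion, here is a simplified stand-in for the cached filter (my sketch, not the actual node-filter code: a plain HashMap replaces the LruCache, and whether the real code holds the lock across the contract call is my assumption):

use std::collections::HashMap;
use std::sync::Mutex;

// Every permission check must take the exclusive lock, even for a pure
// lookup, so all concurrent handshake checks serialize on this Mutex.
struct NodeFilter {
    cache: Mutex<HashMap<[u8; 64], bool>>, // peer enode ID -> allowed
}

impl NodeFilter {
    fn connection_allowed(&self, peer: [u8; 64]) -> bool {
        let mut cache = self.cache.lock().unwrap();
        if let Some(&allowed) = cache.get(&peer) {
            return allowed;
        }
        // If the (slow) on-chain call happens while the lock is held,
        // every other peer check stalls behind it.
        let allowed = self.query_contract(&peer);
        cache.insert(peer, allowed);
        allowed
    }

    fn query_contract(&self, _peer: &[u8; 64]) -> bool {
        true // placeholder for the contract's connectionAllowed call
    }
}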

I would add a boot flag like --disable-node-permission-caching in order to provide a quick fix for blockchains that currently rely on this contract.

What do you think about that?

P.S.: During debugging I found out that the connection-allowed check happens BEFORE the network ID check, which leads to a huge number of unnecessary calls to the contract. Was this done intentionally? I would consider moving the check to the point where the network ID has already been checked, if that is possible of course!

@jam10o-new jam10o-new added F7-optimisation 💊 An enhancement to provide better overall performance in terms of time-to-completion for a task. F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M4-core ⛓ Core client code / Rust. labels Jan 2, 2019
@jam10o-new jam10o-new added this to the 2.4 milestone Jan 2, 2019
@jam10o-new
Contributor

cc @grbIzl?

@tomusdrw
Collaborator

tomusdrw commented Jan 2, 2019

@VladLupashevskyi

  1. Moving the check after the Status message seems reasonable to me; I think we don't exchange any info with unconfirmed peers anyway.
  2. Regarding the cache - I think the issue is caused by the Mutex - it requires exclusive access to check the cache, while the rest of the code could easily run in parallel. Instead of disabling the cache via a CLI flag I would either:
    2.1. Disable it completely, or
    2.2. Introduce a different cache behind an RwLock:
cache: RwLock<BTreeMap<BlockNumber, HashSet<ConnectionIds>>>,

This would allow us to easily drop entries for old blocks, and also drop random entries from the HashSet should it grow too big.
Perhaps we could even parametrize this and allow the cache to persist for multiple blocks (i.e. on-chain changes would take effect with a delay). In your case this could be set to infinity, drastically increasing the cache hit ratio.
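
A rough sketch of what that structure could look like (my own illustration, assuming ConnectionIds is something like the peer's 64-byte enode ID):

use std::collections::{BTreeMap, HashSet};
use std::sync::RwLock;

type BlockNumber = u64;
type ConnectionId = [u8; 64]; // assumed: the peer's enode ID

struct PermissionCache {
    // Reads (the common case) take only the shared lock, so permission
    // checks can run in parallel; the exclusive lock is needed only to
    // insert new entries or prune old blocks.
    cache: RwLock<BTreeMap<BlockNumber, HashSet<ConnectionId>>>,
}

impl PermissionCache {
    fn is_allowed(&self, block: BlockNumber, peer: &ConnectionId) -> bool {
        self.cache
            .read()
            .unwrap()
            .get(&block)
            .map_or(false, |set| set.contains(peer))
    }

    fn allow(&self, block: BlockNumber, peer: ConnectionId) {
        self.cache
            .write()
            .unwrap()
            .entry(block)
            .or_default()
            .insert(peer);
    }

    fn prune_before(&self, block: BlockNumber) {
        let mut cache = self.cache.write().unwrap();
        // split_off keeps entries >= `block` in the returned map.
        let recent = cache.split_off(&block);
        *cache = recent;
    }
}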

@grbIzl
Collaborator

grbIzl commented Jan 7, 2019

I have created a PR removing the cache, see #10143
I don't see a significant difference from the structure proposed by @tomusdrw (we will still need a read and a write lock per call).
As for moving this check behind STATUS packet handling: I understand the reasoning, that it would improve performance. At the same time, it wouldn't comply with the general idea of a connection filter. Because it's a CONNECTION filter :-) The STATUS packet exchange happens AFTER the connection is established. I can't predict what happens or can happen after establishing the connection but before the STATUS exchange. In general we want to prevent any exchange with such a filter.
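
To make the "read and write lock per call" point concrete, here is a minimal standalone illustration (my sketch, with a hypothetical query_contract stub) of the miss path that still needs both lock acquisitions:

use std::collections::HashSet;
use std::sync::RwLock;

fn query_contract(_peer: &[u8; 64]) -> bool {
    true // placeholder for the on-chain connectionAllowed call
}

fn connection_allowed(cache: &RwLock<HashSet<[u8; 64]>>, peer: [u8; 64]) -> bool {
    if cache.read().unwrap().contains(&peer) {
        return true; // hit: shared lock only
    }
    let allowed = query_contract(&peer); // no lock held during the call
    if allowed {
        cache.write().unwrap().insert(peer); // miss: second, exclusive lock
    }
    allowed
}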
