
Besu cannot find peers, Geth runs fine #3939

Closed
Cryptizer69 opened this issue Jun 6, 2022 · 32 comments
Assignees: macfarla
Labels: peering, TeamRevenant (GH issues worked on by Revenant Team)

Comments

@Cryptizer69

Cryptizer69 commented Jun 6, 2022

I'm running Besu through Rocketpool on a machine with more than enough CPU and bandwidth to spare, but it cannot find any peers. Ports are open, and when I run Geth it finds peers without a problem.

Edit: I'm running Besu on Goerli

[Screenshot attached: 2022-06-06 at 12:38:40]

@poupas

poupas commented Jun 6, 2022

I'm having the same issues on Goerli. Mainnet runs fine.

Some potentially helpful logs:

2022-06-06 11:37:36.677+00:00 | vert.x-worker-thread-9 | DEBUG | PingPacketData | read PING enr as long scalar
2022-06-06 11:37:36.678+00:00 | vert.x-worker-thread-7 | DEBUG | PingPacketData | read PING enr as long scalar
2022-06-06 11:37:36.723+00:00 | vert.x-worker-thread-16 | DEBUG | PongPacketData | read PONG enr from scalar
2022-06-06 11:37:36.724+00:00 | vert.x-worker-thread-12 | DEBUG | PingPacketData | read PING enr as long scalar
2022-06-06 11:37:36.735+00:00 | nioEventLoopGroup-3-9 | DEBUG | HandshakeHandlerOutbound | Wrote initial crypto handshake message to /124.156.238.97:30303.
2022-06-06 11:37:36.777+00:00 | nioEventLoopGroup-3-3 | DEBUG | HandshakeHandlerOutbound | Wrote initial crypto handshake message to /89.7.22.176:30322.
2022-06-06 11:37:36.835+00:00 | nioEventLoopGroup-3-3 | DEBUG | DeFramer | Invalid incoming message
io.netty.handler.codec.DecoderException: org.hyperledger.besu.ethereum.rlp.RLPException: Expected current item to be a list, but it is: BYTE_ELEMENT (at bytes 0-1: [04])
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:480)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.hyperledger.besu.ethereum.rlp.RLPException: Expected current item to be a list, but it is: BYTE_ELEMENT (at bytes 0-1: [04])
	at org.hyperledger.besu.ethereum.rlp.AbstractRLPInput.error(AbstractRLPInput.java:202)
	at org.hyperledger.besu.ethereum.rlp.AbstractRLPInput.enterList(AbstractRLPInput.java:448)
	at org.hyperledger.besu.ethereum.rlp.BytesValueRLPInput.enterList(BytesValueRLPInput.java:23)
	at org.hyperledger.besu.ethereum.rlp.AbstractRLPInput.enterList(AbstractRLPInput.java:432)
	at org.hyperledger.besu.ethereum.rlp.BytesValueRLPInput.enterList(BytesValueRLPInput.java:23)
	at org.hyperledger.besu.ethereum.p2p.rlpx.wire.messages.DisconnectMessage$Data.readFrom(DisconnectMessage.java:83)
	at org.hyperledger.besu.ethereum.p2p.rlpx.wire.messages.DisconnectMessage.getReason(DisconnectMessage.java:61)
	at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.DeFramer.decode(DeFramer.java:180)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
	... 17 more
2022-06-06 11:37:36.836+00:00 | nioEventLoopGroup-3-3 | DEBUG | RlpxAgent | Failed to connect to peer 0xb79d86a4ee9887f5f6db2f344534cd5719fe946103ec7c54f159d1b284141692ebd8f3b1e3927225ff983bde2994099b5e2929db7a5bd22613207085c4344044: io.netty.handler.codec.DecoderException: org.hyperledger.besu.ethereum.rlp.RLPException: Expected current item to be a list, but it is: BYTE_ELEMENT (at bytes 0-1: [04])
2022-06-06 11:37:36.842+00:00 | vert.x-worker-thread-5 | DEBUG | PingPacketData | read PING enr as long scalar
2022-06-06 11:37:36.948+00:00 | nioEventLoopGroup-3-6 | DEBUG | AbstractHandshakeHandler | Handshake error:
org.hyperledger.besu.ethereum.p2p.rlpx.handshake.HandshakeException: Decrypting an incoming handshake message failed
	at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.ECIESHandshaker.handleMessage(ECIESHandshaker.java:211)
	at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.HandshakeHandlerInbound.nextHandshakeMessage(HandshakeHandlerInbound.java:60)
	at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.AbstractHandshakeHandler.channelRead0(AbstractHandshakeHandler.java:92)
	at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.AbstractHandshakeHandler.channelRead0(AbstractHandshakeHandler.java:44)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.bouncycastle.crypto.InvalidCipherTextException: Invalid MAC.
	at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.ECIESEncryptionEngine.decrypt(ECIESEncryptionEngine.java:277)
	at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.ECIESEncryptionEngine.decrypt(ECIESEncryptionEngine.java:215)
	at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.EncryptedMessage.decryptMsgEIP8(EncryptedMessage.java:94)
	at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.ECIESHandshaker.handleMessage(ECIESHandshaker.java:203)
	... 20 more

Edit: this issue may be related to #3323

@iamhsk added the TeamRevenant (GH issues worked on by Revenant Team) label on Jun 6, 2022
@iamhsk

iamhsk commented Jun 7, 2022

User reported they were running v22.4.2 of Besu.

@macfarla
Contributor

macfarla commented Jun 8, 2022

OK, it looks like there was an issue with the https://github.com/ethereum/discv4-dns-lists repo, which normally gets updated daily but was not updated for a few days. That has been resolved (at least it was committed to on June 8th).

I have also started a Goerli node locally with the latest Besu (besu --network=goerli --Xdns-enabled=true), and it found peers and started downloading within 10 minutes. So perhaps that was the issue.
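
As a quick sanity check on peer counts, independent of the log output, you can also query the node over JSON-RPC. A minimal sketch, assuming the HTTP RPC endpoint is enabled on the default port 8545 as in the commands elsewhere in this thread:

curl -s -X POST --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' http://127.0.0.1:8545

The result is the peer count as a hex string (for example "0x19" for 25 peers).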

@macfarla
Contributor

macfarla commented Jun 8, 2022

Also note that 22.4.2 does not include the Goerli bootnodes update from #3857.
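
A possible workaround on 22.4.2, sketched here with placeholder enode URLs rather than the actual Goerli bootnodes, is to pass bootnodes explicitly:

besu --network=goerli --bootnodes=enode://<node-public-key>@<ip>:<port>,enode://<node-public-key>@<ip>:<port>

--bootnodes accepts a comma-separated list of enode URLs and overrides the built-in defaults for the selected network.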

@macfarla
Contributor

macfarla commented Jun 8, 2022

@Cryptizer69 and @poupas can you try again and let us know if you're still having this same issue?

@poupas

poupas commented Jun 8, 2022

Version: besu/v22.4.2/linux-x86_64/openjdk-java-17
Command (running in Docker container): /opt/besu/bin/besu --network=goerli --Xdns-enabled=true --data-path=/ethclient/besu --rpc-http-enabled --rpc-http-host=0.0.0.0 --rpc-http-port=8545 --rpc-ws-enabled --rpc-ws-host=0.0.0.0 --rpc-ws-port=8546 --host-allowlist=* --revert-reason-enabled --rpc-http-max-active-connections=65536 --data-storage-format=bonsai --sync-mode=X_SNAP --nat-method=docker --p2p-host=<REDACTED> --max-peers=50 --p2p-port=4500 --bonsai-maximum-back-layers-to-load=512

After 10 minutes, Besu seems to be keeping ~ 10 peers. It's currently syncing.

Thank you!

@poupas

poupas commented Jun 8, 2022

OK, it looks like there was an issue with the https://github.com/ethereum/discv4-dns-lists repo, which normally gets updated daily but was not updated for a few days. That has been resolved (at least it was committed to on June 8th).

I have also started a Goerli node locally with the latest Besu (besu --network=goerli --Xdns-enabled=true), and it found peers and started downloading within 10 minutes. So perhaps that was the issue.

Is DNS discovery enabled by default? Rocket Pool does not enable it specifically. Should it?

@Cryptizer69
Author

Cryptizer69 commented Jun 8, 2022

@poupas helped me try some commands since he got his setup working. The problem still persists in my setup. Here are some more of the same logs:

eth1_1 | 2022-06-08 14:54:39.365+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 3
eth1_1 | 2022-06-08 14:55:07.203+00:00 | Timer-0 | INFO | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.goerli.ethdisco.net
eth1_1 | 2022-06-08 14:55:43.157+00:00 | nioEventLoopGroup-3-8 | INFO | SyncTargetManager | Found common ancestor with peer Peer 0x860f6852a7890332d5... PeerReputation 100, validated? true, disconnected? false at block 7003909
eth1_1 | 2022-06-08 14:55:59.579+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
eth1_1 | 2022-06-08 14:55:59.580+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
eth1_1 | 2022-06-08 14:55:59.582+00:00 | EthScheduler-Workers-3 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
eth1_1 | 2022-06-08 14:56:00.583+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
eth1_1 | 2022-06-08 14:56:04.581+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 2
eth1_1 | 2022-06-08 14:56:04.582+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 2
eth1_1 | 2022-06-08 14:56:04.584+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 2
eth1_1 | 2022-06-08 14:56:05.584+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
eth1_1 | 2022-06-08 14:56:07.204+00:00 | Timer-0 | INFO | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.goerli.ethdisco.net
eth1_1 | 2022-06-08 14:56:09.582+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
eth1_1 | 2022-06-08 14:56:09.585+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
eth1_1 | 2022-06-08 14:56:10.586+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1

@macfarla
Contributor

macfarla commented Jun 8, 2022

@poupas DNS discovery is not enabled by default. Technically it's still an experimental feature, so we want folks to think about it before turning it on.

@macfarla
Contributor

macfarla commented Jun 8, 2022

@Cryptizer69 it's normal to see some log messages like that, especially when your node first starts, since it has to find peers via discovery. From my local testing just now, it does seem to be taking a bit longer to find peers with SNAP sync (as opposed to the default FAST sync), I think because there are fewer compatible peers. Does your node get past that stage and actually get to doing the sync?

If you are still having issues, can you share your config? If it's working for @poupas, maybe there is something different about your setup.

@macfarla self-assigned this on Jun 8, 2022
@Cryptizer69
Author

Cryptizer69 commented Jun 9, 2022

The problem still persists.

I have a dedicated NUC 10 i7 with 32 GB RAM, a 2 TB SSD, 1 Gb fiber bandwidth, and Linux 20.04. I'm running Besu through the Rocketpool Smart Node stack.

Edit: ports are open, and Geth worked fine before I switched from Geth to Besu.

Keeps showing these logs:
eth1_1 | 2022-06-09 09:08:00.842+00:00 | Timer-0 | INFO | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.goerli.ethdisco.net
eth1_1 | 2022-06-09 09:08:04.993+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 1
eth1_1 | 2022-06-09 09:08:09.994+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 2
eth1_1 | 2022-06-09 09:08:14.996+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 0
eth1_1 | 2022-06-09 09:08:19.997+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 0
eth1_1 | 2022-06-09 09:08:24.401+00:00 | nioEventLoopGroup-3-9 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 1
eth1_1 | 2022-06-09 09:08:29.402+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 0
eth1_1 | 2022-06-09 09:08:34.403+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 0
eth1_1 | 2022-06-09 09:08:39.404+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 0
eth1_1 | 2022-06-09 09:08:44.405+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 0
eth1_1 | 2022-06-09 09:08:49.406+00:00 | EthScheduler-Timer-0 | INFO | FullSyncTargetManager | No sync target, waiting for peers. Current peers: 0

@macfarla
Contributor

Thanks for the update @Cryptizer69. Will investigate further.

@macfarla
Contributor

Are you still having the same problem, @Cryptizer69? It seems like Goerli has been a bit up and down. Also, there might be some helpful suggestions on DNS issues here: #3290 (comment)

@Cryptizer69
Author

Yes, I let Besu run for 3 hours this afternoon and it still won't sync. I'll drop Besu for now and check again in the future. Thanks.

@lfinbob

lfinbob commented Jul 3, 2022

I have the same issue on Goerli as @Cryptizer69: not syncing, even after 24 hours.

Version: hyperledger/besu:22.4.3-openjdk-latest (via rocketpool 1.5.0 alpha 3).
Command line:

--network=goerli --data-path=/ethclient/besu --rpc-http-enabled --rpc-http-host=0.0.0.0 --rpc-http-port=8545 --host-allowlist=* --revert-reason-enabled --rpc-http-max-active-connections=65536 --data-storage-format=bonsai --sync-mode=X_SNAP --nat-method=docker --p2p-host= --Xmerge-support --engine-rpc-enabled --engine-rpc-port=8551 --engine-host-allowlist=* --engine-jwt-enabled --engine-jwt-secret=/secrets/jwtsecret --Xdns-enabled=true --override-genesis-config=terminalTotalDifficulty=100000000000000000000 --max-peers=25 --metrics-enabled --metrics-host=0.0.0.0 --metrics-port=9105 --p2p-port=30304 --bonsai-maximum-back-layers-to-load=512

I tried adding --Xdns-enabled=true but it doesn't help.

Port 30304 TCP and UDP is open, verified with tcpdump and nmap from an outside host.
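
For reference, a check along those lines might look like the sketch below; <host-ip> is a placeholder, and the UDP scan usually needs root:

nmap -sT -p 30304 <host-ip>
sudo nmap -sU -p 30304 <host-ip>

Note that an open but silent UDP port is often reported by nmap as open|filtered.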

Some log excerpts:

2022-07-03 20:09:40.424+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:40.424+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:40.424+00:00 | EthScheduler-Services-2 (batchDownloadAccountData) | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:40.424+00:00 | EthScheduler-Services-2 (batchDownloadAccountData) | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:40.424+00:00 | EthScheduler-Services-2 (batchDownloadAccountData) | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:40.424+00:00 | nioEventLoopGroup-3-5 | INFO | WaitForPeersTask | Waiting for 5 total peers to connect. 1 peers currently connected.
2022-07-03 20:09:40.424+00:00 | nioEventLoopGroup-3-5 | INFO | WaitForPeersTask | Waiting for 5 total peers to connect. 1 peers currently connected.
2022-07-03 20:09:40.424+00:00 | nioEventLoopGroup-3-5 | INFO | WaitForPeersTask | Waiting for 5 total peers to connect. 1 peers currently connected.
2022-07-03 20:09:40.424+00:00 | nioEventLoopGroup-3-5 | INFO | WaitForPeersTask | Waiting for 5 total peers to connect. 1 peers currently connected.
2022-07-03 20:09:45.424+00:00 | EthScheduler-Services-1 (checkNewPivotBlock) | INFO | WaitForPeersTask | Waiting for 5 total peers to connect. 1 peers currently connected.
2022-07-03 20:09:45.859+00:00 | nioEventLoopGroup-3-5 | INFO | SyncTargetManager | Found common ancestor with peer Peer 0xc4e684d46054bace26... PeerReputation 100, validated? true, disconnected? false at block 3720129
2022-07-03 20:09:58.097+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:58.097+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:58.097+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:58.097+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:58.097+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 1
2022-07-03 20:09:58.098+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 0
2022-07-03 20:09:58.098+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 0
2022-07-03 20:09:58.098+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 0
2022-07-03 20:09:58.098+00:00 | nioEventLoopGroup-3-5 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 0

And this WARN, related to IPv6 I guess, which is disabled on my machine:

2022-07-03 18:01:56.899+00:00 | vert.x-eventloop-thread-2 | WARN | VertxPeerDiscoveryAgent | Sending to peer DiscoveryPeer{status=bonding, enode=enode://949423caff724a5e787b5c3622fe6f20635a9228d68aa90ac926b5859c04d6a8fbc26963314108243c591a867a294fc778bfb45b9ce16902ce958619bf85570f@[2a01:7a7:2:1467:ec4:7aff:fee2:568e]:30303, firstDiscovered=1656871290518, lastContacted=0, lastSeen=0} failed, native error code -97, packet: 0x7703b472c73b8f8e44c7767f3dd77f151247612240167e824f09877144653d822bb924c58b0085b8b33223c309be1631d61ab83cf96bc4bfb5b7edcd745ba4341812cfa6d9c088bba30dd07035c93dde3b03e79f1d653d4e7f341b3f19fc8d840101eb05cb845dc89bc2827660827660d7902a0107a7000214670ec47afffee2568e82765f82765f8462c1d9d006, stacktrace: io.netty.channel.unix.Errors$NativeIoException: sendToAddress(..) failed: Address family not supported by protocol
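
(A quick way to confirm that on Linux, as an aside:

sysctl net.ipv6.conf.all.disable_ipv6

A value of 1 means IPv6 is disabled system-wide.)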

The DNSTimerTask seems to find a validated peer, which is soon dropped again somehow:

2022-07-03 20:06:03.859+00:00 | Timer-0 | INFO | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.goerli.ethdisco.net
2022-07-03 20:06:08.774+00:00 | nioEventLoopGroup-3-1 | INFO | SyncTargetManager | Found common ancestor with peer Peer 0x860f6852a7890332d5... PeerReputation 100, validated? true, disconnected? false at block 3720129
2022-07-03 20:06:13.541+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 0

@lfinbob

lfinbob commented Jul 4, 2022

Since @macfarla mentioned that SNAP sync mode can find fewer peers, I tried --sync-mode=fast as well. Still the same problem; after 16 hours it is still stuck at this:

2022-07-04 17:51:43.344+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 3
2022-07-04 17:51:43.345+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 3
2022-07-04 17:51:45.329+00:00 | EthScheduler-Timer-0 | INFO | FastSyncTargetManager | No sync target, checking current peers for usefulness: 0
2022-07-04 17:51:48.334+00:00 | EthScheduler-Timer-0 | INFO | AbstractRetryingPeerTask | No useful peer found, checking remaining current peers for usefulness: 0

@macfarla
Contributor

macfarla commented Jul 6, 2022

We are taking a general look at peering, with the aim of improving how quickly and efficiently Besu finds peers. Having said that, two things to note:

  1. Goerli can be a bit funky with peering. It behaves differently from mainnet and other testnets.
  2. The "No useful peer found, checking remaining ... 3" that then drops to 0 - with FAST/SNAP sync this basically means that Besu is checking each of those 3 potential peers for the chosen pivot block and they don't have it, so they are disconnected as "useless" peers.

If you are still having this issue, could you enable DEBUG logs for a time and see if there is any more useful info in there?
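
For reference, two ways to do that (sketches, assuming an otherwise stock setup, with <other options> as a placeholder): restart with the logging flag, or, if the ADMIN RPC API is enabled (for example --rpc-http-api=ADMIN,ETH,NET), change the level at runtime:

besu --network=goerli --logging=DEBUG <other options>

curl -s -X POST --data '{"jsonrpc":"2.0","method":"admin_changeLogLevel","params":["DEBUG"],"id":1}' http://127.0.0.1:8545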

@lfinbob

lfinbob commented Jul 6, 2022

Since it's still not syncing after I posted 2 days ago, I enabled DEBUG. These messages stood out to me:

Confirmation that inbound connections do work (ports are open):

DEBUG | RlpxAgent | Inbound connection established with 0x2afb79935de1b1e6be8217468708f9f2eafd09c25e11e13d0606ccc54402bcd968c4bfbf5237156392a700597cf5f004dc61a7f19c6db731d0dd52f3399f606d

The same peer is then disconnected with SUBPROTOCOL_TRIGGERED.
This disconnect reason is by far the most frequent and happens with up-to-date peers (like Geth 1.10.19). Is that normal?

DEBUG | EthProtocolManager | Disconnect - Outbound - 0x10 SUBPROTOCOL_TRIGGERED - PeerInfo{version=5, clientId='Geth/v1.10.19-stable-23bee162/linux-amd64/go1.18.1', capabilities=[eth/66, eth/67, snap/1], port=0, nodeId=0x2afb79935de1b1e6be8217468708f9f2eafd09c25e11e13d0606ccc54402bcd968c4bfbf5237156392a700597cf5f004dc61a7f19c6db731d0dd52f3399f606d} - 1 peers left

Other disconnect reasons in the log:

0x04 TOO_MANY_PEERS
0x UNKNOWN (Full msg: DEBUG | EthProtocolManager | Disconnect - Inbound - 0x UNKNOWN - PeerInfo{version=5, clientId='', capabilities=[eth/66], port=0, nodeId=0x51492877f2ea50a5d10e70f5ac2757763325e54e7694a62b000aadfca1af4a3bd4f5c4eafdff8f7f525d38b9734dfad983c05b442b908949ada6670c3861b339} - 10 peers left)

Lastly, there are various log messages from what seem to be old/strange/misbehaving peers:

nioEventLoopGroup-3-3 | DEBUG | RlpxAgent | Failed to connect to peer 0xc8cfe118ed2afdfed216406390c2aec6719511f749a662da9e6d6ea16457b731d9597738a7b8d98a29b3d3f9639fe21a2623a5ffebd6bfef1eef4c05615eb46f: io.netty.handler.codec.DecoderException: org.hyperledger.besu.ethereum.rlp.RLPException: Expected current item to be a list, but it is: BYTE_ELEMENT (at bytes 0-1: [04])
DEBUG | DeFramer | Disconnecting because no capabilities are shared: PeerInfo{version=5, clientId='Quai/v1.10.10-unstable-5e01626a-20220220/linux-amd64/go1.17.7', capabilities=[qsnap/1, quai/66], port=0, nodeId=0x0a89df3fa5b4f97d87b3edc7035679345c1f70d1d2ff92daa386dbc48839b14e02fd95a58133a3eec709d0c201c147e61e3ba04181b2509a3717bfa842b98782}
at org.hyperledger.besu.ethereum.p2p.rlpx.wire.messages.DisconnectMessage$Data.readFrom(DisconnectMessage.java:83)
at org.hyperledger.besu.ethereum.p2p.rlpx.wire.messages.DisconnectMessage.getReason(DisconnectMessage.java:61)
at org.hyperledger.besu.ethereum.p2p.rlpx.wire.messages.DisconnectMessage$Data.readFrom(DisconnectMessage.java:83)
at org.hyperledger.besu.ethereum.p2p.rlpx.wire.messages.DisconnectMessage.getReason(DisconnectMessage.java:61)

@SeaMonkey82

I'm experiencing this with checkpoint sync on Ropsten. One client pair worked fine, but the Nimbus + Besu pair only ever manages to get 4/5 required peers.

@lfinbob

lfinbob commented Jul 11, 2022

I did fix the problem by switching from a slower internet connection (6 Mbps down, 1 Mbps up) to a faster one (100 Mbps down, 30 Mbps up).
After the switch, Besu immediately found peers and is syncing fine now.
Is there some mechanism that prevents peer connections on slower links?
While 6/1 Mbps is slow, I thought it would still sync a Prater test node. The beacon node (Lighthouse) did sync just fine on that connection.
Latency of the slow connection is 60 ms; the fast one is 25 ms.
There was no other traffic on the slow connection besides this Rocketpool Prater node.

@macfarla
Contributor

That's interesting. There is a timeout mechanism: if peers repeatedly don't respond within a specific time, they can be rejected. Will add this to the list of things to investigate. Thank you for the extra info.

@macfarla
Contributor

Closing this one as you've got a way forward. We're working on getting some recommended specs for running Besu, which might help with this type of issue in the future.

@PerfectDark99

I'm trying to switch from Geth to Besu and running into this issue on mainnet. Besu doesn't seem to find any peers. I get this message repeatedly in the logs:

No sync target, checking current peers for usefulness:

I'm using the default port 30303 (like Geth) for peer discovery, and Geth worked just fine. The port is open both in ufw and on the NAT router.

The internet connection is fast enough, 1 Gbps down / 250 Mbps up. Hardware is also "overkill": 32 GB RAM, 6-core CPU.


Now, after waiting a long time and writing the above paragraph, something finally seems to happen after 45 minutes:

Waiting for 5 total peers to connect. 2 peers currently connected
Waiting for valid peers with chain height information.  1 / 5 required peers currently available    

Resource usage went up and I now see log entries like the one below:

Completed importing chain segment 11073785 to 11075584 (1800 blocks in 10033ms), Peers: 1

I assume it is syncing correctly now, despite the waiting-for-peers messages?

I'm using --sync-mode=X_CHECKPOINT. Could that cause the long wait time? Maybe add some kind of warning/tip in the docs about the potentially long time to find peers?

@macfarla
Contributor

Thanks for your report, @PerfectDark99. Yes, if you're seeing "importing chain" messages, your node is syncing. And yes, with checkpoint sync you need peers that have that exact checkpoint. Good idea to add a note to the docs.
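
For anyone else wondering, one way to confirm sync progress independent of the logs (a sketch, assuming the HTTP RPC is enabled on 8545) is to query eth_syncing, which returns the starting, current, and highest block while syncing, and false once the node is fully synced:

curl -s -X POST --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' http://127.0.0.1:8545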

@timjrobinson

I'm having the same issues syncing with the latest Rocketpool (1.6.0) + Besu (22.7.1). I'm trying to switch from Geth to Besu; Geth was working fine. It's been 24 hours and my logs still show:

eth1_1           | 2022-08-30 02:05:29.848+00:00 | EthScheduler-Timer-0 | INFO  | WaitForPeersTask | Waiting for 3 total peers to connect. 2 peers currently connected.
eth1_1           | 2022-08-30 02:05:31.209+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, checking current peers for usefulness: 3
eth1_1           | 2022-08-30 02:05:34.869+00:00 | nioEventLoopGroup-3-2 | INFO  | FastSyncTargetManager | No sync target, checking current peers for usefulness: 4
eth1_1           | 2022-08-30 02:05:38.222+00:00 | nioEventLoopGroup-3-1 | INFO  | FastSyncTargetManager | No sync target, checking current peers for usefulness: 5
eth1_1           | 2022-08-30 02:05:39.869+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncActions | Waiting for valid peers with chain height information.  1 / 3 required peers currently available.
eth1_1           | 2022-08-30 02:05:44.869+00:00 | EthScheduler-Timer-0 | INFO  | WaitForPeersTask | Waiting for 3 total peers to connect. 2 peers currently connected.
eth1_1           | 2022-08-30 02:05:51.463+00:00 | nioEventLoopGroup-3-8 | INFO  | FastSyncActions | Waiting for valid peers with chain height information.  1 / 3 required peers currently available.
eth1_1           | 2022-08-30 02:05:56.464+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncActions | Waiting for valid peers with chain height information.  1 / 3 required peers currently available.
eth1_1           | 2022-08-30 02:06:01.464+00:00 | EthScheduler-Timer-0 | INFO  | WaitForPeersTask | Waiting for 3 total peers to connect. 1 peers currently connected.
eth1_1           | 2022-08-30 02:06:06.911+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, checking current peers for usefulness: 4
eth1_1           | 2022-08-30 02:06:09.107+00:00 | nioEventLoopGroup-3-7 | INFO  | FastSyncTargetManager | No sync target, checking current peers for usefulness: 4
eth1_1           | 2022-08-30 02:06:10.579+00:00 | nioEventLoopGroup-3-8 | INFO  | FastSyncTargetManager | No sync target, checking current peers for usefulness: 4
eth1_1           | 2022-08-30 02:06:14.777+00:00 | nioEventLoopGroup-3-8 | INFO  | FastSyncTargetManager | No sync target, checking current peers for usefulness: 4
eth1_1           | 2022-08-30 02:06:14.913+00:00 | EthScheduler-Timer-0 | INFO  | WaitForPeersTask | Waiting for 3 total peers to connect. 2 peers currently connected.

Always 1 or 2 peers connected; it never gets to 3. Is there some way I can debug this further?
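
One option for digging further (a sketch, assuming you can add ADMIN to --rpc-http-api and the HTTP RPC listens on 8545): query admin_peers to see which peers the node currently has and what they advertise:

curl -s -X POST --data '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' http://127.0.0.1:8545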

@PavelNiedoba

Getting no peers. The port is forwarded, and Besu is literally stuck. 1 Gbit connection in a server house, 16-core Xeon, running the Docker image.

@PavelNiedoba

Consider reopening this.

@icemagno

icemagno commented Jan 23, 2024

I have this message with a brand new private network. Is that normal? I can see that all peers (5) are connected and communicating. Some nodes (3 of them) use X_SNAP and are talking to each other. The other 2 FULL nodes tell me they have peers but are still waiting to sync...

@macfarla
Contributor

@icemagno I suggest you try adding fast-sync-min-peers=1 to your config (it works for snap sync as well). If that doesn't help, I suggest asking your question on Discord or opening a new issue with more info to help us investigate.
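
For reference, in a TOML config file (passed with --config-file) that would look something like the sketch below; the sync-mode line is illustrative, not required:

# config.toml
sync-mode="X_SNAP"
fast-sync-min-peers=1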

@icemagno

icemagno commented Feb 22, 2024

OK, it seems I've figured it out. In my case I created more node keys and, by accident, regenerated the genesis file. While the genesis configuration stayed the same, the validator addresses in "extraData" changed, leaving the blockchain with no validators, because I had not yet started the new nodes (with the freshly generated keys). With only the old 5 nodes running and none of the new ones, block production got stuck even with 4 peers connected, because none of them is a validator.

I think this may affect many people, because it is common (when starting a new blockchain) to test and regenerate nodes without updating the extraData of the genesis file. So at the end of the day we lose the first nodes, which are the validators.

That's what I think happened.

I think the Grafana panel should indicate how many validators exist, or a new metric should be created.
https://besu.hyperledger.org/public-networks/how-to/monitor/metrics

Yeah, I know about the RPC method.
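
(For anyone else landing here: assuming IBFT 2.0 with the IBFT API enabled via --rpc-http-api=IBFT, the validator set at the chain head can be listed with, for example:

curl -s -X POST --data '{"jsonrpc":"2.0","method":"ibft_getValidatorsByBlockNumber","params":["latest"],"id":1}' http://127.0.0.1:8545

For QBFT networks the equivalent method is qbft_getValidatorsByBlockNumber.)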

@icemagno

In the genesis block, the important information in the extra data is the list of validators. All other details have empty values. Formally, extraData in the genesis block contains RLP([32 bytes Vanity, List, No Vote, Round=Int(0), 0 Seals]).

From https://besu.hyperledger.org/23.4.0/private-networks/how-to/configure/consensus/ibft
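
If memory serves, the matching extraData can be regenerated from a JSON array of validator addresses with Besu's rlp subcommand; a sketch (the file name is a placeholder, and the exact flags may differ between versions):

# toEncode.json contains something like ["<validator-address-1>", "<validator-address-2>", ...]
besu rlp encode --from=toEncode.json --type=IBFT_EXTRA_DATA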

@stefanopineda

My node had no peers (for over 10 hours) while connected via an Ethernet powerline adapter (~30 Mbps download speed). After connecting it directly to the router (I moved the node), the 300+ Mbps speed got Besu syncing immediately: 7% complete within 30 minutes of connecting directly to the gateway router. I did also open ports 30303 and 9001 prior to moving the node.

Node specs
Dell Optiplex 3060, i5-8500 (6 cores / 6 threads), 32 GB DDR4, 2 TB M.2 NVMe
