
Empty result for eth_call? #67

Open
MysticRyuujin opened this issue Nov 21, 2020 · 42 comments

@MysticRyuujin
Contributor

I've had a user report that they're having issues with eth_call being reverted again.

I see this in the logs:

WARN  |      QuorumRpcReader | Empty result for eth_call as io.emeraldpay.dshackle.quorum.AlwaysQuorum@84b8c47

I'm wondering if it's related?

We recently started adding Geth and Turbo-Geth to our setup, and I noticed that when an eth_call gets reverted, Nethermind includes a "data" field in the error but Geth and Turbo-Geth do not. I wonder if that has something to do with it?

Just seems weird.

The user is getting back:

"error": {
        "code": -32002,
        "message": "No response or no available upstream for eth_call"
    }

Which I know is a dshackle error message, even though all of our nodes are configured for eth_call and making manual calls works fine.

Example curl used to test with:

curl -s -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},{"blockHash":"0xa5626dc20d3a0a209b1de85521717a3e859698de8ce98bca1b16822b7501f74b"}],"id":1}'

Geth and Turbo-Geth:

{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32000,
    "message": "execution reverted"
  }
}

Nethermind:

{
  "jsonrpc": "2.0",
  "error": {
    "code": -32015,
    "message": "VM execution error.",
    "data": "revert"
  },
  "id": 1
}
@MysticRyuujin
Contributor Author

MysticRyuujin commented Nov 22, 2020

I did some more testing, and as far as I can tell, this is related. Every time I make the call to Dshackle, I get the error in the subject. All nodes return error revert every time (of course) but dshackle doesn't seem to be processing it correctly and returns the bogus response, instead of the revert error.

I'm not sure, but there MIGHT be something related to having multiple nodes capable of serving eth_call? One of our users suggested that this used to work when he hit a dshackle instance that only had 1 target/upstream. I don't know if that's useful information, but it could be related.

OR it has something to do with the QuorumRpcReader trying to come to a conclusion about the error/revert but being unable to because of the differences between the responses of Geth/TG and Nethermind?
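
For anyone trying to reproduce this, repeating the same call against the dshackle proxy shows the error on every attempt. A rough sketch (the proxy URL is a placeholder for your own route):

# Repeat the failing eth_call a few times against the dshackle proxy
# (http://localhost:8080/eth is a placeholder; substitute your own proxy route)
for i in $(seq 1 5); do
  curl -s -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},{"blockHash":"0xa5626dc20d3a0a209b1de85521717a3e859698de8ce98bca1b16822b7501f74b"}],"id":1}' \
    http://localhost:8080/eth
  echo
done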

@MysticRyuujin
Contributor Author

Maybe related:
ethereum/go-ethereum#21886

@MysticRyuujin
Contributor Author

Ok, I've done even more testing... and I think this is more likely a bug in dshackle than a multi-client compatibility issue...

I put eth_call into the "disabled" methods list for all of the nodes except our Turbo-Geth and Geth nodes, which return the exact same thing for this call, and dshackle still doesn't return correctly...

Proof:


$ curl -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},{"blockHash":"0xa5626dc20d3a0a209b1de85521717a3e859698de8ce98bca1b16822b7501f74b"}],"id":1}' 

Turbo-Geth:
{"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}

Geth:
{"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}

Dshackle:
{"jsonrpc":"2.0","id":1,"error":{"code":-32002,"message":"No response or no available upstream for eth_call"}}

@MysticRyuujin
Contributor Author

Have you had any chance to look at this one?

@splix
Member

splix commented Dec 8, 2020

Not sure. I found a similar issue with eth_call when it refers to a specific height but gets routed to an upstream that hasn't caught up to that height yet. Fixed in 90c3976.

But I'm not sure it fixes this issue, will need additional debug.

Btw, are you sure that you can pass {"blockHash":"0xa5626dc20d3a0a209b1de85521717a3e859698de8ce98bca1b16822b7501f74b"}? Following https://eth.wiki/json-rpc/API#the-default-block-parameter it should be a string, and if it's hex it's supposed to be a height.

@MysticRyuujin
Contributor Author

I'm as sure as the fact that I can copy and paste the above and it works... 🤷‍♂️

I didn't create the call above; all I know is what the nodes output and what dshackle outputs.

@MysticRyuujin
Contributor Author

Looks like it's probably related to this

https://eips.ethereum.org/EIPS/eip-1898
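
Per EIP-1898, the second parameter of eth_call may be either the classic default-block string (a hex height or a tag like "latest") or an object that references the block by hash or by number. A sketch of both object forms for the call above (the endpoint URL and the hex height are placeholders; requireCanonical is optional and defaults to false):

# EIP-1898 object form: reference the block by hash
curl -s -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},{"blockHash":"0xa5626dc20d3a0a209b1de85521717a3e859698de8ce98bca1b16822b7501f74b","requireCanonical":false}],"id":1}' http://localhost:8545

# EIP-1898 object form: reference the block by number
curl -s -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},{"blockNumber":"0x9a268c"}],"id":1}' http://localhost:8545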

@splix
Member

splix commented Dec 8, 2020

Oh, I see. That's interesting, I hadn't seen this EIP, thank you.

I'll do some debugging and hope to fix it asap

@MysticRyuujin
Contributor Author

FYI, someone reported getting a similar error with eth_getLogs

I assume it's related (dshackle thinks there are errors, but in reality it's just some kind of revert or other EVM error?)
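
A similar check with eth_getLogs against the proxy would look something like this (the address and block range are placeholders, just to exercise the same path):

curl -s -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"latest","toBlock":"latest","address":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"}],"id":1}' http://localhost:8080/eth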

@MysticRyuujin
Contributor Author

poke

@MysticRyuujin
Contributor Author

This seems to be happening a lot for us these days, which results in very poor performance for queries
[screenshot]

I'm guessing it has to do with the fact that the nodes are returning some kind of error, but not an ACTUAL error, and dshackle is retrying across both nodes.

@MysticRyuujin
Contributor Author

MysticRyuujin commented Mar 15, 2021

Here's an example:

curl -H "Content-Type: application/json" http://localhost:8544 -d '{"jsonrpc":"2.0","id":115,"method":"eth_call","params":[{"to":"0xdAC17F958D2ee523a2206206994597C13D831ec7","from":"0xEF65ffB384c99a00403EAa22115323a555700D79","data":"0xa9059cbb0000000000000000000000003f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be000000000000000000000000000000000000000000000000000000004856fb60"},"0xb7ceb9"]}'

Node:

{"jsonrpc":"2.0","id":115,"error":{"code":-32000,"message":"invalid opcode: opcode 0xfe not defined"}}

Dshackle:

{"jsonrpc":"2.0","id":115,"error":{"code":-32002,"message":"No response or no available upstream for eth_call"}}

I also checked with Alchemy and they return the same thing our node returns from that call.

@splix
Member

splix commented Mar 15, 2021

Hi Chase! Thank you, I'm working on it right now. I can reproduce it and am going to fix it and make a release (in a couple of days at most).

@splix
Member

splix commented Mar 16, 2021

@MysticRyuujin please try version 0.9.2

@MysticRyuujin
Contributor Author

There was no change...same result...

@MysticRyuujin
Contributor Author

Log:
[screenshot]

Docker Host:
[screenshot]

Response:

{"jsonrpc":"2.0","id":115,"error":{"code":-32002,"message":"No response or no available upstream for eth_call"}}

@MysticRyuujin
Contributor Author

Oh, I think maybe you fixed the blockHash issues and NOT the error return issue?

@MysticRyuujin
Contributor Author

Either way, the example query referenced in the first post still doesn't work.

@splix
Member

splix commented Mar 16, 2021

yeah, I actually fixed the issue with block height/hash in eth_call, and it was the main cause of that issue.

I made an integration test with exactly your code, and it works now for that example, at least in my integration test. So are you sure that it's still not fixed in production?

@MysticRyuujin
Contributor Author

MysticRyuujin commented Mar 16, 2021

100%

curl -s -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},{"blockHash":"0xa5626dc20d3a0a209b1de85521717a3e859698de8ce98bca1b16822b7501f74b"}],"id":1}' http://docker02.archivenode.io:8545/eth

{"jsonrpc":"2.0","id":1,"error":{"code":-32002,"message":"No response or no available upstream for eth_call"}}

[screenshot]

dshackle - 2021-16-03 22:11:14.359 | WARN  |      QuorumRpcReader | Empty result for eth_call as io.emeraldpay.dshackle.quorum.AlwaysQuorum@75d9772a

@splix
Member

splix commented Mar 16, 2021

What if you try without blockHash, or reference it just by height?

@MysticRyuujin
Contributor Author

MysticRyuujin commented Mar 16, 2021

Let's see... that block translates to 0x9a268c, and it results in the same thing:

curl -s -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},"0x9a268c"],"id":1}' http://docker02.archivenode.io:8545/eth

{"jsonrpc":"2.0","id":1,"error":{"code":-32002,"message":"No response or no available upstream for eth_call"}

Actual Node:

{"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}

Error:

dshackle - 2021-16-03 22:18:32.726 | WARN  |      QuorumRpcReader | Empty result for eth_call as io.emeraldpay.dshackle.quorum.AlwaysQuorum@670c8d3f

@splix
Member

splix commented Mar 16, 2021

OK, thank you. That's different from what I get in my test env, but I'll try to reproduce. Will keep you posted.

@splix
Member

splix commented Mar 18, 2021

@MysticRyuujin I'm still trying to reproduce, and now I'm thinking that something else is causing the error. Is it possible that all of the upstreams are syncing, or something like that? I mean, is it possible that the error is real, i.e. Dshackle cannot find an upstream with the specified height?

@MysticRyuujin
Contributor Author

MysticRyuujin commented Mar 18, 2021

I am 98% certain that dshackle is not properly handling the response from the node: it reads what the node is returning ({"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}) and interprets that as an error.

Am I wrong?

I think I can reproduce this with any "vm execution error" response.

If you take the curl command above and change the block to "latest", it still happens. That block is quite old now, and the 2 nodes are in sync and "OK" according to dshackle.
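
Concretely, that is the same call with only the block parameter changed to "latest" (same proxy endpoint as before):

curl -s -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},"latest"],"id":1}' http://docker02.archivenode.io:8545/eth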

@splix
Member

splix commented Mar 22, 2021

@MysticRyuujin can you try with v0.9.3?

@splix
Member

splix commented Mar 22, 2021

In general, the error "Empty result for eth_call as io.emeraldpay.dshackle.quorum.AlwaysQuorum" in the log means that Dshackle didn't receive any result (so it's "empty") from any of the upstreams. But the reason for that can vary.

Previously, dshackle was ignoring upstream errors. The idea was that another upstream could answer without an error. But when you have errors like "vm execution error", which are the same on every upstream, that approach leads to this incorrect behaviour.

There were other issues that led to a situation where Dshackle wasn't able to find any valid upstream for a query, so again it was replying with that error. I thought I had fixed all of the possible scenarios where it cannot find an upstream, so it's unclear what's going on with that call.

With the latest version, if the error is still there, it should provide better logging that I can use for further debugging. Also, if it's possible to wiretap the communication between Dshackle and the upstreams, please check whether it actually makes any request to the upstreams.
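
For example, something along these lines on the Dshackle host should show whether any request actually reaches an upstream (assuming the upstream speaks plain HTTP on port 8545; adjust the interface and port to your setup):

# Print the payload of all traffic to/from the upstream RPC port as ASCII
sudo tcpdump -i any -A -s 0 'tcp port 8545'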

@MysticRyuujin
Contributor Author

MysticRyuujin commented Mar 23, 2021

Ok...here we go.

dshackle.yaml:

host: 0.0.0.0
port: 2449
tls:
  enabled: false

cache:
  redis:
    enabled: false

proxy:
  host: 0.0.0.0
  port: 8080
  tls:
    enabled: false
  routes:
    - id: eth
      blockchain: ethereum

cluster:
  defaults:
    - chains:
        - ethereum
  upstreams:
    - id: archive01
      chain: ethereum
      methods:
        enabled:
          - name: eth_call
      connection:
        ethereum:
          rpc:
            url: "http://archive01.archivenode.io:8545"
    - id: archive04
      chain: ethereum
      methods:
        enabled:
          - name: eth_call
      connection:
        ethereum:
          rpc:
            url: "http://192.168.10.50:8544"

Dockerfile:

FROM emeraldpay/dshackle:0.9.3
COPY ./dshackle.yaml /etc/dshackle/dshackle.yaml

Bash / Docker Logs:

$ docker build .
Sending build context to Docker daemon  3.584kB
Step 1/2 : FROM emeraldpay/dshackle:0.9.3
0.9.3: Pulling from emeraldpay/dshackle
cd17e56c322c: Already exists 
ecdd73bb9922: Already exists 
78a00902ec9e: Already exists 
c7c69b879765: Already exists 
d9e6b841c03b: Pull complete 
7cacfeb427da: Pull complete 
Digest: sha256:55f6db3ca47c498767324ab31e108df16bfc0538d6931a71ee9d37b8ed9a0dc0
Status: Downloaded newer image for emeraldpay/dshackle:0.9.3
 ---> 0e4671b798f9
Step 2/2 : COPY ./dshackle.yaml /etc/dshackle/dshackle.yaml
 ---> ebdc874bea5d
Successfully built ebdc874bea5d

$ docker run -p 8080:8080 ebdc874bea5d
                                _     _     __ _     _                _    _
                               | |   | |   / /| |   | |              | |  | |
   ___ _ __ ___   ___ _ __ __ _| | __| |  / /_| |___| |__   __ _  ___| | _| | ___
  / _ \ '_ ` _ \ / _ \ '__/ _` | |/ _` | / / _` / __| '_ \ / _` |/ __| |/ / |/ _ \
 |  __/ | | | | |  __/ | | (_| | | (_| |/ / (_| \__ \ | | | (_| | (__|   <| |  __/
  \___|_| |_| |_|\___|_|  \__,_|_|\__,_/_/ \__,_|___/_| |_|\__,_|\___|_|\_\_|\___|
   Emerald Dshackle - Fault Tolerant Load Balancer for Blockchain API
   https://github.com/emeraldpay/dshackle
   v0.9.3 built from 14be4e5134 on 2021-03-22T23:15:20 UTC

2021-23-03 00:44:06.769 | INFO  |            StarterKt | Starting StarterKt on 847395fa1708 with PID 1 (/app/classes started by root in /)
2021-23-03 00:44:06.773 | INFO  |            StarterKt | No active profile set, falling back to default profiles: default
2021-23-03 00:44:08.737 | INFO  |               Config | Using config: /etc/dshackle/dshackle.yaml
2021-23-03 00:44:09.117 | INFO  |           GrpcServer | Starting gRPC Server...
2021-23-03 00:44:09.117 | INFO  |           GrpcServer | Listening Native gRPC on 0.0.0.0:2449
2021-23-03 00:44:09.285 | WARN  |             TlsSetup | Using insecure transport for Native gRPC
2021-23-03 00:44:09.546 | INFO  |           GrpcServer | GRPC Server started
2021-23-03 00:44:09.566 | INFO  |          ProxyServer | Listening Proxy on 0.0.0.0:8080
2021-23-03 00:44:09.585 | WARN  |             TlsSetup | Using insecure transport for proxy
2021-23-03 00:44:09.755 | INFO  |  ConfiguredUpstreams | Using Ethereum upstream, at http://archive01.archivenode.io:8545
2021-23-03 00:44:09.760 | WARN  |  EthereumRpcUpstream | Setting up upstream archive01 with RPC-only access, less effective than WS+RPC
2021-23-03 00:44:09.791 | INFO  |  EthereumRpcUpstream | Configured for Ethereum
2021-23-03 00:44:09.845 | INFO  | entMultistreamHolder | Upstream archive01 with chain ETHEREUM has been added
2021-23-03 00:44:09.850 | INFO  |  ConfiguredUpstreams | Using Ethereum upstream, at http://192.168.10.50:8544
2021-23-03 00:44:09.851 | WARN  |  EthereumRpcUpstream | Setting up upstream archive04 with RPC-only access, less effective than WS+RPC
2021-23-03 00:44:09.855 | INFO  |  EthereumRpcUpstream | Configured for Ethereum
2021-23-03 00:44:09.899 | INFO  | entMultistreamHolder | Upstream archive04 with chain ETHEREUM has been added
2021-23-03 00:44:10.368 | INFO  |            StarterKt | Started StarterKt in 4.006 seconds (JVM running for 5.077)
2021-23-03 00:44:11.367 | INFO  |          Multistream | State of ETH: height=?, status=UNAVAILABLE/2, lag=[0, 9223372036854775807]
2021-23-03 00:44:25.364 | INFO  |          Multistream | State of ETH: height=12091958, status=UNAVAILABLE/1,OK/1, lag=[0, 0]
2021-23-03 00:44:40.364 | INFO  |          Multistream | State of ETH: height=12091959, status=OK/2, lag=[0, 0]
2021-23-03 00:44:55.364 | INFO  |          Multistream | State of ETH: height=12091959, status=OK/2, lag=[0, 0]
2021-23-03 00:45:10.364 | INFO  |          Multistream | State of ETH: height=12091961, status=OK/2, lag=[0, 0]
2021-23-03 00:45:25.364 | INFO  |          Multistream | State of ETH: height=12091963, status=SYNCING/1,OK/1, lag=[0, 0]
2021-23-03 00:45:40.364 | INFO  |          Multistream | State of ETH: height=12091965, status=OK/2, lag=[0, 0]
2021-23-03 00:45:54.350 | WARN  |      QuorumRpcReader | Did get any result from upstream. Method [eth_call] using [Quorum: Accept Any]
2021-23-03 00:45:55.364 | INFO  |          Multistream | State of ETH: height=12091965, status=OK/2, lag=[0, 0]
2021-23-03 00:46:10.364 | INFO  |          Multistream | State of ETH: height=12091965, status=OK/2, lag=[0, 0]
2021-23-03 00:46:25.364 | INFO  |          Multistream | State of ETH: height=12091965, status=OK/2, lag=[0, 0]
2021-23-03 00:46:40.364 | INFO  |          Multistream | State of ETH: height=12091965, status=OK/2, lag=[0, 0]
2021-23-03 00:46:55.364 | INFO  |          Multistream | State of ETH: height=12091967, status=OK/2, lag=[0, 0]
2021-23-03 00:46:59.553 | WARN  |      QuorumRpcReader | Did get any result from upstream. Method [eth_call] using [Quorum: Accept Any]
2021-23-03 00:47:10.364 | INFO  |          Multistream | State of ETH: height=12091967, status=OK/2, lag=[0, 0]
2021-23-03 00:47:25.364 | INFO  |          Multistream | State of ETH: height=12091967, status=OK/2, lag=[0, 0]
2021-23-03 00:47:40.364 | INFO  |          Multistream | State of ETH: height=12091969, status=OK/1,SYNCING/1, lag=[0, 0]
2021-23-03 00:47:54.606 | WARN  |      QuorumRpcReader | Did get any result from upstream. Method [eth_call] using [Quorum: Accept Any]
2021-23-03 00:47:55.364 | INFO  |          Multistream | State of ETH: height=12091972, status=OK/2, lag=[0, 0]
2021-23-03 00:48:10.364 | INFO  |          Multistream | State of ETH: height=12091975, status=SYNCING/1,OK/1, lag=[0, 0]
2021-23-03 00:48:25.364 | INFO  |          Multistream | State of ETH: height=12091975, status=OK/2, lag=[0, 0]
2021-23-03 00:48:40.364 | INFO  |          Multistream | State of ETH: height=12091975, status=OK/2, lag=[0, 0]

Guessing the part you care about is this:

2021-23-03 00:47:54.606 | WARN | QuorumRpcReader | Did get any result from upstream. Method [eth_call] using [Quorum: Accept Any]

curl command:

curl -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_call","params":[{"data":"0x06fdde03","to":"0x1f0d3048b3d49de0ed6169a443dbb049e6daa6ce"},"0x9a268c"],"id":1}' http://nginx01:8080/eth

Order of operations witnessed in Wireshark:

  1. dshackle receives HTTP POST

  2. dshackle forwards request upstream

  3. dshackle gets response from upstream
    {"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}

This is where it gets weird...

  1. dshackle seems to ask the node if it's syncing with eth_syncing

  2. node returns false

  3. dshackle asks for peer count with net_peerCount

  4. node returns 0x63

  5. dshackle tries again with the same request

  6. Again, node returns:
    {"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}

It seems to repeat this several more times. I'm not sure why dshackle isn't asking the 2nd node at this point... according to my Wireshark logs, it never asks the 2nd node...

Anyway, eventually I get back the same error:
{"jsonrpc":"2.0","id":1,"error":{"code":-32002,"message":"No response or no available upstream for eth_call"}}

@splix
Member

splix commented Mar 23, 2021

Thank you!
I see what it does wrong now, and I think I found the issue. Going to test it and make another release tomorrow (if that's it).

splix added a commit that referenced this issue Mar 23, 2021
…ds; allowing an existing method redefines its logic

rel: #67
@splix
Member

splix commented Mar 23, 2021

Please try v0.9.4

The root cause was enabling the eth_call method in your Dshackle config. This method is enabled by default and works fine on its own, but enabling it again caused this weird behavior (that's also why I wasn't able to reproduce it).

The first bug was that redefining an existing method broke the quorum logic, which it shouldn't, and that is fixed now. The second was that the redefined quorum had another bug with handling errors, which is now fixed too.

@splix
Member

splix commented Mar 23, 2021

You can keep

methods:
  enabled:
    - name: eth_call

if you want, and it shouldn't cause any errors now. But it's unnecessary, and I suggest removing it, just in case.

@MysticRyuujin
Contributor Author

I'll give it a shot here in a little bit, but I just wanted to clarify that I explicitly set all of the enabled methods on my nodes so that I have a clear list of the methods supported by each endpoint. I know I can omit the default methods, but this is cleaner for me to keep track of.

@MysticRyuujin
Contributor Author

MysticRyuujin commented Mar 23, 2021

Ok...so...that KIND of worked...

When both nodes are up and online (in the OK/2 state), I now get back:
{"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}

Yay!

However...

Because of the behavior I witnessed in Wireshark yesterday, where dshackle was only talking to one of the nodes (archive04, the second node in the list) and wasn't asking the other node (archive01, the first node in the list) for the eth_call data, I decided I would try shutting down archive04.

Now dshackle is in the status=OK/1,UNAVAILABLE/1 state, where archive01 is OK and archive04 is UNAVAILABLE, and I get back:
{"jsonrpc":"2.0","id":1,"error":{"code":-32002,"message":"No response or no available upstream for eth_call"}}

I verified with Wireshark AGAIN that dshackle is not sending this request to archive01 for some reason...

So, I went back into the dshackle.yaml file and removed the "- name: eth_call" line from both nodes, just to make sure nothing funky was going on like you suggested above, but that did not change the behavior. For some reason dshackle refuses to ask the first node in the list (archive01) for eth_call data...

To make sure that nothing was blocking the communications, I removed archive04 from the dshackle.yaml file completely, and it started working, communicated with archive01, and returned:
{"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"execution reverted"}}

So something funky is going on with routing / node selection here.

@MysticRyuujin
Contributor Author

Does what I wrote above make sense? Can I provide any additional details or logs somehow?

@splix
Member

splix commented Mar 26, 2021

Sorry for the delay. Yes, it all makes sense, I'm just trying to understand what's going on. Initially I thought it was a simple mistake that leads to starting from the same upstream all the time (i.e. each request starts from the same loop), but after checking the code and tests it looks like something different. I mean, I found the same issue with uneven distribution of load to upstreams in my prod setup too, but I still cannot reproduce it with any simple test setup.

splix added a commit that referenced this issue Mar 27, 2021
solution: ensure all heads are started on multihead start
rel: #67
@splix
Member

splix commented Mar 27, 2021

Fixed that issue with the uneven dispatching of the load. Going to fix #82 and make a new release

@splix
Member

splix commented Mar 27, 2021

@MysticRyuujin please try v0.9.5

@MysticRyuujin
Contributor Author

Ummmm... it seems to have thrown some kind of exception when I killed one of the nodes, and then it seems like it got stuck and wouldn't update anymore. After the exception it stopped updating the status of the node and just kept spitting back old block height data (I verified the nodes are good).

Here's a sample:

2021-27-03 23:12:27.150 | INFO  |          Multistream | State of ETH: height=12124041, status=OK/2, lag=[1, 0]
2021-27-03 23:12:42.150 | INFO  |          Multistream | State of ETH: height=12124043, status=LAGGING/1,OK/1, lag=[2, 0]
2021-27-03 23:12:57.150 | INFO  |          Multistream | State of ETH: height=12124043, status=SYNCING/1,OK/1, lag=[2, 0]
2021-27-03 23:13:12.150 | INFO  |          Multistream | State of ETH: height=12124043, status=LAGGING/1,OK/1, lag=[2, 0]
2021-27-03 23:13:27.150 | INFO  |          Multistream | State of ETH: height=12124043, status=LAGGING/1,OK/1, lag=[2, 0]
2021-27-03 23:13:42.150 | INFO  |          Multistream | State of ETH: height=12124043, status=LAGGING/1,OK/1, lag=[2, 0]
2021-27-03 23:13:56.210 | WARN  |         WriteRpcJson | Failed to convert to JSON
reactor.core.Exceptions$RetryExhaustedException: Retries exhausted: 3/3
	at reactor.core.Exceptions.retryExhausted(Exceptions.java:291) ~[reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.util.retry.RetryBackoffSpec.lambda$static$0(RetryBackoffSpec.java:66) ~[reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.util.retry.RetryBackoffSpec.lambda$generateCompanion$4(RetryBackoffSpec.java:514) ~[reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:367) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerComplete(FluxConcatMap.java:288) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onComplete(FluxConcatMap.java:875) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1756) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1755) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:147) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1755) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.MonoIgnoreThen$ThenAcceptInner.onNext(MonoIgnoreThen.java:296) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:117) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) [reactor-core-3.3.5.RELEASE.jar:3.3.5.RELEASE]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]

Here's the whole log:
dshackle_error.log

@splix
Member

splix commented Mar 27, 2021

Hm, does it return to normal if you restart dshackle?

@MysticRyuujin
Contributor Author

Yes

@MysticRyuujin
Contributor Author

Also, I think you DID solve the routing issue. After restarting dshackle, I did a few more start/stops and it seems to be behaving.

I'm not sure if that crash was a fluke, or maybe just a totally different/new bug.

@splix
Member

splix commented Mar 27, 2021

That's great it's finally fixed; that was a serious issue.

The new error is something different, and it's unclear what the reason is. What I see is that it failed to send a request to an upstream, retried 3 times, and gave up. And that repeated multiple times. Dshackle is supposed to recover from such errors, but something stopped it from recovering.

vdoflip pushed a commit to jit-strategies/dshackle that referenced this issue Aug 1, 2023
…m-upstream

fix managed methods reusing old quorums