EVM-800 nonce too low from eth_getTransactionCount #1853

igorcrevar · 2023-08-28T11:30:15Z

Description

BlockNumber structure has these supported strings as special kind of a block numbers:

"pending" - this returns information from the block that is currently being mined, which might not yet be part of the blockchain.
"latest" - refers to the latest block in state
"earliest" - refers to the genesis block

We should allow pending blocks for the eth_getTransactionCount endpoint only. All the other parts of the current system should treat pending as the latest.

I believe the issue is that Polygon Edge checks the nonce in two different places. In [txpool.validateTx](https://github.com/0xPolygon/polygon-edge/blob/develop/txpool/txpool.go#L672-L676) (using p.store.GetNonce(stateRoot, tx.From) and [txpool.addTx](https://github.com/0xPolygon/polygon-edge/blob/develop/txpool/txpool.go#L824C11-L824C25) (using account.getNonce()). Not sure why it’s tracked in two different places.

When we do check in validate, we assume that txpool does not have any transaction for that specific address. So we are checking if tx nonce is lower than expected nonce from state.
Later on, we load(or create) object where we store all the txpool information for specific address (including nonce). Then we check additionally (if there was already some tx for that address in a pool) correctness of a nonce.
There is no bug here. I just added metrics counter which was missing.

Problem was only because getTransactionCount always returned nonce from the state (even for pending). Nonce from the state and nonce from the txpool account can differ in situations when there is already more than one tx for same address in a pool.
This fix will also fix https://polygon.atlassian.net/browse/EVM-801.

Important note!: two or more consecutive calls to eth_getTransactionCount(address, "pending") can result in same value and that’s is just because nonce updating in txpool is not immediate. So, user of this endpoint must handle this case.

The RPC and Validator agree on nonce
The RPC fails to accept a transaction with nonce too low
The Validator will accept that same transaction
After that transaction has been mined, the validator and RPC still agree on nonce
The RPC will still fail and the validator will succeed in mining

Question @jp have you tried configuring --max-enqueued or --max-slots on the rpc? When I ran your scripts, I saw a lot of maximum number of enqueued transactions reached errors and afterward the RPC would reject txs with the correct nonce. I changed the setup-rpc.sh script to look like this./polygon-edge server --data-dir ./data-rpc/data-1 --chain ./data/genesis.json --grpc-address :50000 --libp2p :30305 --jsonrpc :50002 --price-limit 0 --log-level DEBUG --max-enqueued 100000 --max-slots 10000 > rpc-005.log 2>&1 &The submit_multiple.sh script will still print out some errors but no more errors about max enqueued tx. After cast gives up trying to send the 500 tx, the rpc will work as expected and accept transactions with the correct nonce. So a few thoughts

I agree this is a bug of some kind in the node. The key features might be
It seems to only show up in the RPC / non-validating nodes
It occurs after exceeding the configured queue size
The nonce for that particular account will not be accepted by the node anymore

Mitigations
Requires restarting the node to clear the state

Configure a large queue size
Even with a large queue, I assume the bug could still occur, but it becomes less likely

I didn’t try increasing the account or pool limits. I agree it might help but, as you point out, it probably doesn’t resolve the issue, it just hides it a bit more. I’m also concerned about the memory requirements associated with increasing those limits.I believe the issue is that Polygon Edge checks the nonce in two different places. In txpool.validateTx (using p.store.GetNonce(stateRoot, tx.From) and txpool.addTx (using account.getNonce()). Not sure why it’s tracked in two different places.If you enable prometheus in the RPC node you’ll notice that nonce_too_low_tx metric is not reported, which means that our issue comes from the addTx check (which probably should update the prometheus gauge, but that’s another issue). The addTx check uses the nonce value it finds in account.getNonce(). Calls to account.setNonce happen in account.reset, account.promote and txpool.Drop.My bet is the reproduction steps I shared are causing the issue in txpool.Drop, but all three should be properly reviewed.

Changes include

Bugfix (non-breaking change that solves an issue)
Hotfix (change that solves an urgent issue, and requires immediate attention)
New feature (non-breaking change that adds functionality)
Breaking change (change that is not backwards-compatible and/or changes current functionality)

Breaking changes

It should not be

Checklist

I have assigned this PR to myself
I have added at least 1 reviewer
I have added the relevant labels
I have updated the official documentation
I have added sufficient documentation in code

Testing

I have tested this code with the official test suite
I have tested this code manually

Manual tests

Start cluster with some premine address
From some script send tx to one the nodes this way:
1. first send txs with nonces: 0, 10, 20, 30, 40
2. after that send 45 txs with same account but each time set nonce to the value retrieved by eth_getTransactionCount(address, 'pending')
3. note that two consecutive eth_getTransactionCount calls can return same value, so in that case just wait some time and continue (skip this iteration) with a loop. this is ok, because updating nonce in txpool is not immediate operation, it took some (small) amount of time
4. after all transactions are sent, we can wait for their receipts
5. note that eth_getTransactionCount(address, 'latest') will in most case return same value that is smaller than value from eth_getTransactionCount(address, 'pending'). This is totally normal and as soon as all pending txs from txpool are included in some blocks those values will be the same.

I do not think this e2e is necessary but i can write it if needed

rachit77 · 2023-08-28T12:41:43Z

LGTM

jsonrpc/eth_endpoint.go

jsonrpc/query.go

jsonrpc/codec.go

jsonrpc/query_test.go

vcastellm · 2023-08-31T14:08:28Z

Late LGTM

EVM-800 nonce too low from eth_getTransactionCount

66b3557

igorcrevar added the bug fix Functionality that fixes a bug label Aug 28, 2023

igorcrevar requested a review from a team August 28, 2023 11:30

igorcrevar self-assigned this Aug 28, 2023

test fixes

785a566

rachit77 approved these changes Aug 28, 2023

View reviewed changes

igorcrevar marked this pull request as ready for review August 28, 2023 15:01

igorcrevar requested a review from a team August 28, 2023 21:50

add two missing counters

0f4893e

Stefan-Ethernal reviewed Aug 30, 2023

View reviewed changes

jsonrpc/eth_endpoint.go Show resolved Hide resolved

jsonrpc/query.go Outdated Show resolved Hide resolved

jsonrpc/codec.go Show resolved Hide resolved

jsonrpc/codec.go Show resolved Hide resolved

jsonrpc/query_test.go Outdated Show resolved Hide resolved

Stefan-Ethernal approved these changes Aug 31, 2023

View reviewed changes

goran-ethernal approved these changes Aug 31, 2023

View reviewed changes

cr fix

1d83e09

igorcrevar merged commit 9b5bbe4 into develop Aug 31, 2023
6 checks passed

igorcrevar deleted the feature/EVM-800_nonce_too_low_from_eth_getTransactionCount branch August 31, 2023 13:27

github-actions bot locked and limited conversation to collaborators Aug 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EVM-800 nonce too low from eth_getTransactionCount #1853

EVM-800 nonce too low from eth_getTransactionCount #1853

igorcrevar commented Aug 28, 2023 •

edited by Stefan-Ethernal

Loading

rachit77 commented Aug 28, 2023

vcastellm commented Aug 31, 2023

EVM-800 nonce too low from eth_getTransactionCount #1853

EVM-800 nonce too low from eth_getTransactionCount #1853

Conversation

igorcrevar commented Aug 28, 2023 • edited by Stefan-Ethernal Loading

Description

Changes include

Breaking changes

Checklist

Testing

Manual tests

rachit77 commented Aug 28, 2023

vcastellm commented Aug 31, 2023

igorcrevar commented Aug 28, 2023 •

edited by Stefan-Ethernal

Loading