Fix remove and reset #2484

jonjove · 2020-03-30T08:24:46Z

This PR fixes two bugs in stellar-core, one moderate and one minor. Both of them are related to TransactionQueue::removeAndReset.

The Minor Bug

removeAndReset does not correctly maintain mSizeByAge (see master:herder/TransactionQueue.cpp:260). When resetting the age, there may still be transactions in the queue for that source account.

The Moderate Bug

removeAndReset can leave sequence number gaps in the queue for a source account. The root cause of this is that the semantics of removeAndReset are illogical. As implemented, the function finds transactions by hash and removes them while keeping the backlog. Let’s consider an example of how this can go wrong. Suppose a source account has signed 4 transactions: 2 for the first sequence number, 1 for the second sequence number, and 1 for the third. It is possible for a node to have transactions 1a, 2, and 3 in its queue while the network applies 1b and 2. In this case, removeAndReset on that node will remove transaction 2 only, because 1a does not match 1b by hash. The result is that the queue for that transaction now contains transaction 1a and 3, which are not sequential! This already sounds like a problem, but what can happen as a consequence? The node can crash since #2419 because that PR depends on the invariant that the transactions are sequential (see findBySeq). The node might also proceed to drop valid transactions later because trimInvalid will find transaction 1a, notice it is invalid, and drop it along with all transactions having higher sequence numbers for the same source account (eg. transaction 3 which is actually valid). In a very unfortunate turn of events, there was actually a test that confirms that illogical behavior like this exists (see master:herder/test/TransactionQueueTests.cpp:611-612).

From the above analysis it should be clear that the semantics of removeAndReset are currently dangerous and wrong. Once a sequence number has been consumed it is unusable regardless of the transaction that consumed it. In other words, the correct behavior is to remove applied transactions by sequence number. Specifically, it should remove all transactions with a sequence number less-than-or-equal to the the highest sequence number applied for that source account regardless of whether any of those transactions are actually in the queue.

The Solution

The new semantics for removeAndReset make it quite different from ban. Both of these functions were implemented in an inefficient way using extract (they repeatedly call extract so map look-ups and vector operations are done multiple times). We make the following changes:

Rename removeAndReset to removeApplied
Remove extract and find entirely
Implement a new function dropTransactions which is responsible only for dropping a provided list of transactions for a given source account and doing associate maintenance such as releasing fees
Implement removeApplied efficiently with the semantics described above using dropTransactions
Implement ban efficiently using dropTransactions
Add comments to clarify how these functions work

Checklist

Reviewed the contributing document
Rebased on top of master (no merge commits)
Ran clang-format v5.0.0 (via make format or the Visual Studio extension)
Compiles
Ran all tests
If change impacts performance, include supporting evidence per the performance document

MonsieurNicolas · 2020-03-30T21:29:54Z

r+ f9a1063

jonjove added 3 commits March 30, 2020 09:15

Improve TransactionQueue removeAndReset and ban

198d647

Fix TransactionQueue test that checks wrong semantics

f6fc4a3

Add a TransactionQueue test for duplicate transaction that is not last

f9a1063

latobarita merged commit 5bde3cb into stellar:master Mar 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix remove and reset #2484

Fix remove and reset #2484

jonjove commented Mar 30, 2020

MonsieurNicolas commented Mar 30, 2020

Fix remove and reset #2484

Fix remove and reset #2484

Conversation

jonjove commented Mar 30, 2020

The Minor Bug

The Moderate Bug

The Solution

Checklist

MonsieurNicolas commented Mar 30, 2020