This PR fixes two bugs in stellar-core, one moderate and one minor. Both are related to `TransactionQueue::removeAndReset`.

The Minor Bug
`removeAndReset` does not correctly maintain `mSizeByAge` (see master:herder/TransactionQueue.cpp:260). When resetting the age, there may still be transactions in the queue for that source account.

The Moderate Bug
`removeAndReset` can leave sequence number gaps in the queue for a source account. The root cause is that the semantics of `removeAndReset` are illogical. As implemented, the function finds transactions by hash and removes them while keeping the backlog.

Let's consider an example of how this can go wrong. Suppose a source account has signed 4 transactions: 2 for the first sequence number (1a and 1b), 1 for the second sequence number, and 1 for the third. It is possible for a node to have transactions 1a, 2, and 3 in its queue while the network applies 1b and 2. In this case, `removeAndReset` on that node will remove transaction 2 only, because 1a does not match 1b by hash. The result is that the queue for that source account now contains transactions 1a and 3, which are not sequential!

This already sounds like a problem, but what can happen as a consequence? The node can crash since #2419 because that PR depends on the invariant that the transactions are sequential (see `findBySeq`). The node might also proceed to drop valid transactions later because `trimInvalid` will find transaction 1a, notice it is invalid, and drop it along with all transactions having higher sequence numbers for the same source account (e.g. transaction 3, which is actually valid). In a very unfortunate turn of events, there was actually a test that confirms that illogical behavior like this exists (see master:herder/test/TransactionQueueTests.cpp:611-612).

From the above analysis it should be clear that the semantics of `removeAndReset` are currently dangerous and wrong. Once a sequence number has been consumed it is unusable regardless of the transaction that consumed it. In other words, the correct behavior is to remove applied transactions by sequence number. Specifically, it should remove all transactions with a sequence number less than or equal to the highest sequence number applied for that source account, regardless of whether any of those transactions are actually in the queue.

The Solution
The new semantics for `removeAndReset` make it quite different from `ban`. Both of these functions were implemented inefficiently using `extract` (they call `extract` repeatedly, so map look-ups and vector operations are done multiple times). We make the following changes:

- Rename `removeAndReset` to `removeApplied`
- Remove `extract` and `find` entirely
- Add `dropTransactions`, which is responsible only for dropping a provided list of transactions for a given source account and doing associated maintenance such as releasing fees
- Implement `removeApplied` efficiently with the semantics described above using `dropTransactions`
- Implement `ban` efficiently using `dropTransactions`

Checklist
`clang-format` v5.0.0 (via `make format` or the Visual Studio extension)