Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force block refetch if transaction is re-collated in a different block #1917

Merged
merged 4 commits into from
May 15, 2019

Conversation

goodsoft
Copy link
Contributor

@goodsoft goodsoft commented May 8, 2019

Motivation

Due to race conditions described in #1911 transactions from a consensus block might get overwritten by the same transactions from a non-consensus block.

Changes

To prevent this we force a block refetch (by marking it as non-consensus), if a transaction belonging to it gets overwritten by the same transaction from a different block.

Also, a potential race condition is fixed in replaced transactions worker: it doesn't mark as dropped/replaced the transactions that eventually ended up in a block.

Upgrading

A transient old_block_hash field is added to transactions table for atomic deriving of re-collated transactions.

Checklist for your PR

  • I added an entry to CHANGELOG.md with this PR
  • If I added new functionality, I added tests covering it.
  • If I fixed a bug, I added a regression test to prevent the bug from silently reappearing again.
  • I checked whether I should update the docs and did so if necessary

@ghost ghost assigned goodsoft May 8, 2019
@ghost ghost added the in progress label May 8, 2019
@coveralls
Copy link

coveralls commented May 8, 2019

Pull Request Test Coverage Report for Build eaae1e0e-bc98-46d1-9495-fe739e747651

  • 9 of 33 (27.27%) changed or added relevant lines in 2 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.4%) to 81.532%

Changes Missing Coverage Covered Lines Changed/Added Lines %
apps/explorer/lib/explorer/chain/import/runner/transactions.ex 9 10 90.0%
apps/indexer/lib/indexer/temporary/blocks_transactions_mismatch.ex 0 23 0.0%
Files with Coverage Reduction New Missed Lines %
apps/indexer/lib/indexer/fetcher/replaced_transaction.ex 2 90.91%
Totals Coverage Status
Change from base Build 3d44a9ab-5bfd-49f1-b9eb-b272c281eebb: -0.4%
Covered Lines: 4662
Relevant Lines: 5718

💛 - Coveralls

@goodsoft
Copy link
Contributor Author

goodsoft commented May 8, 2019

@pasqu4le I would like you to take over this PR and make the following additions:

  1. Add a test for block discarding similar to explorer/chain/import/runner/blocks_test.exs:264 that inserts a block with a transaction, then inserts a different block with a different number but the same transaction and checks that the first block is present in Chain.missing_block_number_ranges.
    Basically, it should emulate the scenario from Race conditions in indexer cause constraint violations #1911.
    Please add it as a separate commit before the existing ones, so that the test would fail on that commit but would succeed with the other two commits present.

  2. Create a temporary worker in indexer app that would do the following:

  • fetch all consensus blocks with transaction counts on launch
  • go through blocks and fetch them from a node using EthereumJSONRPC method
  • if amount of transactions received from a node differs from the amount in database, mark the block as consensus = false.
  • log the successful completion.

Ideally, the worker should implement BufferedTask behaviour and fetch blocks from node in batches.

@pasqu4le pasqu4le force-pushed the gs-tx-race-conditions branch 2 times, most recently from e779cbe to b49de95 Compare May 13, 2019 08:02
@goodsoft goodsoft marked this pull request as ready for review May 13, 2019 10:34
@goodsoft
Copy link
Contributor Author

@pasqu4le please add a CHANGELOG entry and a few words in indexer README about the temporary fetcher

query =
from(block in Block,
join: transactions in assoc(block, :transactions),
where: block.consensus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to refetch all blocks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I couldn't find a way to find blocks with missing transactions without checking all blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, we should probably add a field to blocks table (refetch_needed) and set it to true for all existing blocks.
This would prevent this worker from starting from scratch on every restart and ensure it stops completely when it's done.
cc @pasqu4le.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, I'm on it

Copy link
Contributor

@zachdaniel zachdaniel May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goodsoft @pasqu4le might be easier to add a refetched column that can be null. Then we just say where refetched IS NOT TRUE and then we don't need to bulk update transactions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to refetch new incoming blocks, so we will need to set true for them.
I don't want to "contaminate" the rest of the codebase with these temporary fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL Ecto.Schema.field's :default does not override Ecto.Migration,add's :default on new elements.
I ended up setting to false the migration's :default and adding a small update to execute there as well.

Copy link
Member

@vbaranov vbaranov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the minor comments LGTM

end

execute(
"UPDATE blocks SET refetch_needed = TRUE;",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pasqu4le just curious, why do we need two updates here? Is UPDATE blocks SET refetch_needed = FALSE; not enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second one (set to FALSE) is kinda pointless, but it is there to use execute/2 instead of execute/1 so that ecto knows this is a reversible operation and rollbacks can be performed. I tried to give it a NULL; as a no-op, because there is no need to make any changes for a rollback given that the field is going to be removed anyway, but ecto did not like that.

The reason there is a set to TRUE is so to force the re-fetch of existing blocks while having the default: false for newer ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"" as a second parameter seems to work in such cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't know that, I replaced it with "" now

@@ -92,6 +92,7 @@ After all deployed instances get all needed data, these fetchers should be depre

- `uncataloged_token_transfers`: extracts token transfers from logs, which previously weren't parsed due to unknown format
- `uncles_without_index`: adds previously unfetched `index` field for unfetched blocks in `block_second_degree_relations`
- `blocks_transactions_mismatch`: refetches each block once and revokes consensus to those whose transaction number mismatches, to correct the result of a race condition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pasqu4le could you please extend this description a little bit more

  • describe what does not mismatches to what
  • clarify the race condition case

@pasqu4le pasqu4le added ready for review This PR is ready for reviews. and removed in progress labels May 14, 2019
pasqu4le and others added 4 commits May 15, 2019 17:08
Regression test for the race condition described in issue #1911
Don't discard a transaction if it was eventually collated in the period
between replaced transaction worker initialization and actual dropping.
Due to race conditions described in #1911 transactions from a consensus
block might get overwritten by the same transactions from a non-consensus
block.

To prevent this we force a block refetch (by marking it as non-consensus),
if a transaction belonging to it gets overwritten by the same transaction
from a different block.
…tching transactions

After the solution for #1991 it is necessary to remove consensus from blocks whose number of transactions mismatch compared to a node, so that these blocks are refetched.
@ghost ghost added the in progress label May 15, 2019
@vbaranov vbaranov merged commit 8c1187a into master May 15, 2019
@ghost ghost removed the in progress label May 15, 2019
@vbaranov vbaranov deleted the gs-tx-race-conditions branch May 15, 2019 15:21
This was referenced Jun 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for review This PR is ready for reviews.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants