[Design] Sync with the full-node #35

Closed
6 of 7 tasks
andreabadesso opened this issue Mar 8, 2021 · 0 comments

andreabadesso commented Mar 8, 2021

Proposal

There are two different mechanisms to keep the service in sync with the full-node. We need to sync the best-chain and keep the mempool synced in real-time.

Best chain sync

The idea is to have a single method that syncs a given height range, and to have that method triggered by different signals.

The proposed strategy for the sync is the following:

Best block sync mechanism

  1. Check if our best_block is still valid. If it is not, we should handle the reorg
  2. If it is valid, we should request the full-node's best block
  3. Navigate from block to block using the new block_at_height API (https://github.com/HathorNetwork/sec-hathor-core/pull/55/)
  4. For every block, we need to run a BFS to identify all transactions that were first verified by it, stopping when first_block != block
  5. We stop when we've reached the genesis block
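A minimal sketch of the BFS in step 4, to make the stopping rule concrete. The `Tx` shape and the in-memory `lookup` dictionary are assumptions made for illustration; in the service the vertices would come from full-node responses, and the real traversal may need to follow more than just `parents`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tx:
    # Hypothetical, simplified shape of a full-node vertex.
    hash: str
    parents: List[str]
    first_block: Optional[str]
    is_block: bool = False


def txs_first_verified_by(block_hash: str, lookup: Dict[str, Tx]) -> List[str]:
    """Return hashes of transactions whose first_block is block_hash."""
    found: List[str] = []
    seen = {block_hash}
    queue = deque(lookup[block_hash].parents)
    while queue:
        tx_hash = queue.popleft()
        if tx_hash in seen:
            continue
        seen.add(tx_hash)
        tx = lookup[tx_hash]
        # Skip parent blocks and stop expanding once first_block != block.
        if tx.is_block or tx.first_block != block_hash:
            continue
        found.append(tx_hash)
        queue.extend(tx.parents)
    return found
```

Because the search never expands a vertex whose first_block differs, the work per block stays proportional to the transactions that block confirmed plus their immediate parents.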

This sync mechanism can be started by a few different signals:

  1. A service listening on the full node's websocket channel for new blocks
  2. Every 30 seconds, so we guarantee that even if we lose a websocket message, our database will still be in sync
  3. On admin request
  4. After a reorg has happened and the routine is re-syncing the database
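Since several signals can fire at nearly the same time, the single sync method probably needs a guard so that only one run is active. A rough sketch, assuming the sync runs inside one long-lived process; `SyncRunner` and the source labels are hypothetical names, not part of the existing code:

```python
import threading
from typing import Callable, List


class SyncRunner:
    """Runs one sync at a time, regardless of which signal asked for it."""

    def __init__(self, sync_fn: Callable[[], None]) -> None:
        self._sync_fn = sync_fn
        self._lock = threading.Lock()
        self.skipped: List[str] = []  # signals dropped because a sync was running

    def trigger(self, source: str) -> bool:
        # Non-blocking acquire: if a sync is already running, drop this signal.
        if not self._lock.acquire(blocking=False):
            self.skipped.append(source)
            return False
        try:
            self._sync_fn()
            return True
        finally:
            self._lock.release()
```

Dropping a signal is tolerable here only because of the 30-second timer: anything missed while a sync was running is picked up by the next periodic run.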

I made a proof of concept, it is available at:
https://gist.github.com/andreabadesso/539cc9371d10bfedd409d70845565190

Error handling

When syncing, if a transaction fails on txProcessor, we should stop the sync and set the sync state to failed.

We should be able to retry, starting from the service's best block (stored in the metadata table).
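One way this could look, as a sketch only: the best block is advanced after each block succeeds, so a retry naturally resumes from the last good height. The `metadata` dictionary stands in for the metadata table, and the key names (`sync_state`, `best_block_height`, `last_error`) are assumptions.

```python
from typing import Callable, Dict, Iterable


def sync_blocks(heights: Iterable[int],
                process_block: Callable[[int], None],
                metadata: Dict[str, object]) -> bool:
    """Process blocks in order; on the first failure, mark the sync as failed."""
    metadata["sync_state"] = "syncing"
    for height in heights:
        try:
            process_block(height)
        except Exception as exc:
            metadata["sync_state"] = "failed"
            metadata["last_error"] = str(exc)
            return False
        # Only advance the stored best block after a block fully succeeds.
        metadata["best_block_height"] = height
    metadata["sync_state"] = "synced"
    return True


def retry_sync(target_height: int,
               process_block: Callable[[int], None],
               metadata: Dict[str, object]) -> bool:
    """Resume from the block right after the stored best block."""
    start = int(metadata.get("best_block_height", 0)) + 1
    return sync_blocks(range(start, target_height + 1), process_block, metadata)
```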

Handling reorgs

When we detect that a reorg has happened, we need to re-sync our database with the updated full-node database. These are the proposed steps to do that:

  1. Our genesis block is guaranteed to be the same as the full-node's, as it is fixed and immutable. Knowing this, we can run a binary search over block heights until we find the block at which our database diverged from the new best chain
  2. With the block identified, we need to delete every block after it.
  3. Every transaction that had one of the deleted blocks as its first_block should be marked as dirty and have its first_block set to NULL. This is done so the wallet user can have feedback on why this transaction is locked until the re-sync finishes.
  4. After this is done, we should call our best block sync mechanism
  5. After the sync is done, we should handle transactions that are still marked as dirty. This is still pending a decision, but the suggestion is to leave it for the mempool sync to handle.
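The binary search in step 1 could be sketched as below. It relies on the property that once the two chains diverge they never agree again at a higher height. The two lookup callables are hypothetical: `local_hash_at` would read our stored hash per height, and `remote_hash_at` would wrap the full-node's block_at_height API and is assumed to return a non-matching value for heights the full-node does not have.

```python
from typing import Callable


def last_common_height(local_best_height: int,
                       local_hash_at: Callable[[int], str],
                       remote_hash_at: Callable[[int], str]) -> int:
    """Return the highest height at which both chains hold the same block."""
    low, high = 0, local_best_height  # height 0 is the shared genesis
    while low < high:
        mid = (low + high + 1) // 2
        if local_hash_at(mid) == remote_hash_at(mid):
            low = mid       # still on the common prefix, look higher
        else:
            high = mid - 1  # diverged at or before mid, look lower
    return low
```

Every block above the returned height is a candidate for deletion in step 2, and the number of full-node requests grows with the logarithm of our best height rather than with the depth of the reorg.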

We also need to re-calculate address and wallet balances. There is a design for this:
HathorNetwork/hathor-wallet-service#60

The low-level design is being implemented at: HathorNetwork/hathor-wallet-service#64

The implementation for the re-org strategy is being done at HathorNetwork/hathor-wallet-service#71

Daemon

We will have a daemon constantly connected to the full node detecting reorgs and sending blocks and transactions to the wallet-service.

The service is described in more detail in the readme:
https://github.com/HathorNetwork/hathor-wallet-service-sync_daemon/blob/dev/README.md

HathorNetwork/hathor-wallet-service#1

Identifying the best block

We can query the full-node using:

GET /transaction?type=block&count=1

The first transaction in the transactions array will be the best block.
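A small sketch of building that request and reading the best block out of the response. Only the query parameters and the `transactions` array come from the description above; the base URL is a placeholder and the helper names are made up for illustration.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def best_block_url(base_url: str) -> str:
    """Build the GET /transaction?type=block&count=1 URL for a full-node."""
    query = urlencode({"type": "block", "count": 1})
    return f"{base_url.rstrip('/')}/transaction?{query}"


def extract_best_block(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the first entry of the transactions array, which is the best block."""
    transactions = response.get("transactions") or []
    if not transactions:
        raise ValueError("full-node returned no blocks")
    return transactions[0]
```

Failing loudly on an empty array keeps a malformed response from being mistaken for an empty chain.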

Querying individual transactions

We can query individual blocks or transactions using:

GET /transaction?id=<hash>

Potential issues

While developing the proof of concept for the best block sync, we've identified a few potential bottlenecks that should be considered in the design:

RDS max number of connections

Our main bottleneck will be the lambda connections to the database. The max number of connections to the RDS instance is calculated as GREATEST({log(DBInstanceClassMemory/805306368)*45},{log(DBInstanceClassMemory/8187281408)*1000}), which for the instance we are using (t2.micro) would be 66. This is editable, but I believe it's a good default.

We should therefore consider using a queue so that we don't overload the database.
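To illustrate the idea of capping concurrent database work, here is an in-process stand-in using a semaphore. This is only a sketch: `run_bounded` is a hypothetical helper, and it limits concurrency within a single process. Work spread across separate lambda invocations would still need an external queue in front of the database.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(items: Iterable[T],
                      worker: Callable[[T], Awaitable[R]],
                      max_concurrent: int) -> List[R]:
    """Run worker over items with at most max_concurrent running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: T) -> R:
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

The limit would be chosen well below the instance's connection cap so that other clients of the database keep some headroom.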

Requesting the full-node

When requesting multiple transactions in parallel, the full-node's response time increases.

Necessary changes to the wallet-service

  • We need to store the hash for every block height
  • We need an API to return our current best block (and its hash)
  • We need to store the first_block on every utxo
  • We need to change the utxo schema to add the dirty flag, indicating that it had a first_block that was deleted (because of a reorg) and has not yet received a new one (it may be in the mempool). We also need to return it on the APIs so the user has feedback on the wallets
  • We need to validate whether we skipped a block height and fail on txProcessor if we did
  • On txProcessor, we need to check if our best block is still valid
  • We need to add state to the service and ignore transactions if we are currently running a reorg routine
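The last three items could be combined into a single gate at the top of txProcessor. A sketch under assumptions: `ServiceState`, `SkippedBlockError` and `accept_block` are illustrative names, and the rule that an incoming block must sit exactly one height above the stored best block is one possible reading of "skipped block_height".

```python
from enum import Enum


class ServiceState(Enum):
    IDLE = "idle"
    REORG = "reorg"


class SkippedBlockError(Exception):
    """Raised when an incoming block does not directly follow the stored best block."""


def accept_block(state: ServiceState,
                 stored_best_height: int,
                 incoming_height: int) -> bool:
    """Decide whether txProcessor should handle an incoming block."""
    if state is ServiceState.REORG:
        return False  # ignore new data while the reorg routine is running
    expected = stored_best_height + 1
    if incoming_height != expected:
        raise SkippedBlockError(
            f"expected block height {expected}, got {incoming_height}"
        )
    return True
```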

Notice

This issue replaces the old sync issue: HathorNetwork/hathor-wallet-service#8

andreabadesso changed the title from "[Design] Sync with the full-node (2)" to "[Design] Sync with the full-node" on Mar 8, 2021