[Design] Sync with the full-node #35

andreabadesso · 2021-03-08T16:11:25Z

Proposal

There are two different mechanisms to keep the service in sync with the full-node. We need to sync the best-chain and keep the mempool synced in real-time.

Best chain sync

The idea here is to have a single method that syncs a height range, and to have this method started by different signals.

The proposed strategy for the sync is the following:

Best block sync mechanism

Check if my best_block is still valid. If it is not, we should handle reorg
If it is valid, we should request the full-node's best block
~~Navigate from block to block using the parents~~ Navigate from block to block using the new block_at_height API (https://github.com/HathorNetwork/sec-hathor-core/pull/55/)
For every block, we need to run a BFS to identify all transactions that were first verified by it, stopping when first_block != block
We stop when we've reached the genesis block
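
The BFS step above can be sketched as follows. This is only an illustration, not the code from the proof of concept: the in-memory `dag` dictionary, its `parents` and `first_block` keys, and the function name `txs_first_verified_by` are assumptions made for the example. A real implementation would fetch each vertex from the full-node instead.

```python
from collections import deque


def txs_first_verified_by(block_hash, dag):
    """Return the hashes of the transactions whose first_block is `block_hash`.

    `dag` maps a hash to {"parents": [...], "first_block": hash or None}.
    This shape is an assumption made for the sketch.
    """
    found = []
    seen = set()
    queue = deque(dag[block_hash]["parents"])
    while queue:
        tx_hash = queue.popleft()
        if tx_hash in seen:
            continue
        seen.add(tx_hash)
        tx = dag[tx_hash]
        if tx["first_block"] != block_hash:
            # First verified by another block (or it is a block itself):
            # stop expanding this branch.
            continue
        found.append(tx_hash)
        queue.extend(tx["parents"])
    return found


# Hypothetical data: b2 is the newest block, b1 its parent block.
dag = {
    "b2": {"parents": ["b1", "tx3", "tx2"], "first_block": None},
    "b1": {"parents": ["tx1"], "first_block": None},
    "tx3": {"parents": ["tx2", "tx1"], "first_block": "b2"},
    "tx2": {"parents": ["tx1"], "first_block": "b2"},
    "tx1": {"parents": [], "first_block": "b1"},
}
```

Here `txs_first_verified_by("b2", dag)` visits `tx3` and `tx2` and stops at `tx1`, because `tx1` was already first verified by `b1`.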

This sync mechanism can be started by a few different signals:

A service listening on the full node's websocket channel for new blocks
Every 30 seconds, so we guarantee that even if we lose a websocket message, our database will still be in sync
On admin request
After a reorg has happened and the routine is re-syncing the database

I made a proof of concept, it is available at:
https://gist.github.com/andreabadesso/539cc9371d10bfedd409d70845565190

Error handling

When syncing, if a transaction fails on txProcessor, we should stop the sync and set the sync state to failed.

We should be able to retry, starting from the service's best block (stored in the metadata table).
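
A minimal sketch of this stop-and-retry behaviour is shown below. The `state` dictionary stands in for the metadata table, and the key names (`sync_state`, `best_block_height`), the block shape, and the `process_tx` callback are all hypothetical. It also assumes that processing a transaction twice is harmless, since a retry re-runs the block that failed.

```python
def run_sync(blocks, process_tx, state):
    """Process blocks in height order; stop and mark the state as failed on error.

    `state` stands in for the metadata table (an assumption for this sketch).
    Returns True when every block was processed, False when the sync stopped.
    """
    state["sync_state"] = "syncing"
    for block in blocks:
        if block["height"] <= state["best_block_height"]:
            # Already synced; this is what lets a retry resume from the best block.
            continue
        try:
            for tx in block["txs"]:
                process_tx(tx)
        except Exception as exc:
            # Stop the whole sync and record the failure.
            state["sync_state"] = "failed"
            state["error"] = str(exc)
            return False
        # Only advance the best block once the whole block succeeded.
        state["best_block_height"] = block["height"]
    state["sync_state"] = "synced"
    return True
```

Calling `run_sync` again with the same `state` after a failure resumes from the stored best block rather than from the start.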

Handling reorgs

When we detect that a reorg has happened, we need to re-sync our database with the updated full-node database. These are the proposed steps to do that:

We know for a fact that our genesis block is the same as the full-node's, as it is fixed and immutable. Knowing this, we can do a binary search over the block heights until we find the block at which our database diverged from the new best chain
With the block identified, we need to delete every block after it.
Every transaction that had the deleted blocks as its first_block should be marked as dirty and have its first_block set to NULL. This is done so the wallet user can have feedback on why this transaction is locked until the re-sync finishes.
After this is done, we should call our best block sync mechanism
After the sync is done, we should handle transactions that are still marked as dirty. This decision is still pending, but the suggestion is to leave it for the mempool sync to handle.
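
The first three steps above can be sketched as follows, assuming the service stores one hash per height. The names `find_common_height`, `rollback` and `remote_hash_at`, as well as the list and dictionary shapes, are hypothetical. In practice `remote_hash_at` would call the full-node's block_at_height API.

```python
def find_common_height(local_hashes, remote_hash_at):
    """Binary search for the highest height where both chains have the same block.

    `local_hashes[h]` is our hash at height h (index 0 is the genesis block).
    `remote_hash_at(h)` returns the full-node's hash at height h, or None
    when the full-node's best chain is shorter than h.
    """
    low, high = 0, len(local_hashes) - 1  # the genesis block always matches
    while low < high:
        mid = (low + high + 1) // 2
        if remote_hash_at(mid) == local_hashes[mid]:
            low = mid  # still on the same chain, look higher
        else:
            high = mid - 1  # already diverged, look lower
    return low


def rollback(local_hashes, txs, common_height):
    """Delete every block after `common_height` and mark affected txs as dirty."""
    removed = set(local_hashes[common_height + 1:])
    del local_hashes[common_height + 1:]
    for tx in txs:
        if tx["first_block"] in removed:
            tx["first_block"] = None
            tx["dirty"] = True
    return removed
```

The search relies on the chains sharing a prefix: if the hashes match at some height, they match at every height below it, so the number of full-node requests grows with the logarithm of the chain height.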

We also need to re-calculate address and wallet balances. There is a design for this:
HathorNetwork/hathor-wallet-service#60

The low-level design is being implemented at: HathorNetwork/hathor-wallet-service#64

The implementation for the re-org strategy is being done at HathorNetwork/hathor-wallet-service#71

Daemon

We will have a daemon constantly connected to the full node detecting reorgs and sending blocks and transactions to the wallet-service.

The service is described in more detail in the README:
https://github.com/HathorNetwork/hathor-wallet-service-sync_daemon/blob/dev/README.md

HathorNetwork/hathor-wallet-service#1

Identifying the best block

We can query the full-node using:

GET /transaction?type=block&count=1

The first transaction in the transactions array will be the best block.
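
As a rough sketch, the request URL and the extraction of the best block could look like this. The base URL and the `tx_id` and `height` fields in the sample payload are assumptions. Only the `transactions` array and the query parameters come from the description above.

```python
from urllib.parse import urlencode, urljoin


def best_block_url(base_url):
    """Build the URL for GET /transaction?type=block&count=1."""
    query = urlencode({"type": "block", "count": 1})
    return urljoin(base_url, "transaction") + "?" + query


def best_block_from_response(payload):
    """Return the first entry of the `transactions` array, which is the best block."""
    transactions = payload.get("transactions") or []
    if not transactions:
        raise ValueError("the full-node returned no blocks")
    return transactions[0]


# Hypothetical response body; the field names are illustrative only.
sample = {"transactions": [{"tx_id": "00ab", "height": 1234}]}
best_block = best_block_from_response(sample)
```

The base URL passed to `best_block_url` should end with a slash, otherwise `urljoin` replaces its last path segment.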

Querying individual transactions

We can query individual blocks or transactions using:

GET /transaction?id=<hash>

Potential issues

While developing the proof of concept for the best block sync, we've identified a few potential bottlenecks that should be considered in the design:

RDS max number of connections

Our main bottleneck will be the lambda connection to the database. The max number of connections to the RDS instance is calculated as GREATEST({log(DBInstanceClassMemory/805306368)*45},{log(DBInstanceClassMemory/8187281408)*1000}) which for the instance we are using (t2.micro) would be 66. This is editable, but I believe it's a good default.

So we need to consider using a queue so we don't overload the database
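
The proposal does not say which queue to use. The in-process sketch below only illustrates the underlying idea of capping how many database operations run at once. The helper `run_limited` and its parameters are hypothetical, and a real deployment with lambdas would more likely put a managed queue in front of the database.

```python
import asyncio


async def run_limited(jobs, max_connections):
    """Run coroutine factories with at most `max_connections` in flight at once."""
    semaphore = asyncio.Semaphore(max_connections)

    async def guarded(job):
        # Each job waits here until one of the connection slots is free.
        async with semaphore:
            return await job()

    # gather() returns the results in the same order as `jobs`.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each element of `jobs` is a zero-argument function returning a coroutine (for example, one database write), so no work starts before a slot is available.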

Requesting the full-node

When requesting multiple transactions in parallel, the full-node's response time increases.

Necessary changes to the wallet-service

We need to store the hash for every block height
We need an API to return our current best block (and its hash)
We need to store the first_block on every utxo
We need to change the utxo schema to add the dirty flag, indicating that its first_block was deleted (because of a reorg) and that it has not received a new one yet (it may be in the mempool). We also need to return it on the APIs so the user has feedback on the wallets
We need to detect when a block_height was skipped and fail on txProcessor if it was
On txProcessor, we need to check if our best block is still valid
We need to add state to the service and ignore transactions if we are currently running a reorg routine
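
The skipped-height check from the list above could be as small as the sketch below. The exception and function names are hypothetical, and the sketch only covers the gap case. What to do with a height at or below the current best block is left to the reorg handling.

```python
class SkippedBlockError(Exception):
    """Raised when an incoming block would leave a gap in the stored heights."""


def check_no_skipped_height(best_height, incoming_height):
    """Fail when `incoming_height` is more than one above our best block."""
    if incoming_height > best_height + 1:
        raise SkippedBlockError(
            f"expected height {best_height + 1}, got {incoming_height}"
        )
```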

Notice

This issue replaces the old sync issue: HathorNetwork/hathor-wallet-service#8

andreabadesso mentioned this issue Mar 8, 2021

[Design - old] Sync with the full-node #8

Closed

andreabadesso changed the title ~~[Design] Sync with the full-node (2)~~ [Design] Sync with the full-node Mar 8, 2021

msbrogli assigned andreabadesso Mar 8, 2021

andreabadesso mentioned this issue Apr 2, 2021

[Epic] Wallet Service v1 #5

Open

45 tasks

andreabadesso closed this as completed May 24, 2021
