remove storage of intermediate checkpoints in restore blocks

This was a perhaps too-early optimization trying to reduce the time of rolling back. The `sparseCheckpoint` function return an empty list for pretty much the entire restoration, except when reaching the last k blocks where it'll return a list of checkpoints to save, sparse for older blocks and dense near the tip. With the current parameters, blocks are kept if: - Their blockheight is not older than (tip - k) or it is 0 - And their blockheight is a multiple of 100 or, they are near within 10 blocks from the last known block. We currently do this calculation and filtering in two places: 1. In `restoreBlocks`, to pre-filter checkpoints to store in the database 2. In `prune` from the wallet DBLayer, to garbage collect old checkpoints Yet, what (1) buys us is a very little gain on standard wallet, and a huge performance cost on large wallets. So let's analyze the two cases: A/ Small Wallets - The time to create a checkpoint is very small in front of the slot length. - Restoring blocks is fast, (up to 10K blocks per seconds on empty wallets). Therefore, rolling back of 2, 3 blocks or, 100 makes pretty much no difference. Being able to store more checkpoints near the tip adds very little benefits in terms of performances especially, for the first restoration. B/ Large Wallets - The time to create a checkpoint is important in front of the slot length (we've seen up to 4s). - Restoring blocks is still quite fast (the time needed for processing blocks remains quite small in front of the time needed to read and create new checkpoints). The main problem with large wallets occur when the wallet is almost synced and reaches the 10 last blocks of the chain. By trying to store intermediate checkpoints, not only does the wallet spent 10* more time in `restoreBlocks` than normally, but it also keep the database lock for all that duration. Consider the case where the wallet takes 4s to read, and 4s to create a checkpoint, plus some additional 2s to prune them (these are actual data from large exchanges), by default, 10s is spent for creating one checkpoint. And at least 40 more to create the intermediate ones. During this time, between 1 and 3 checkpoints have been created. So it already needs to prune out the last one it spends 12s to create and needs already to create new checkpoints right away. As a consequence, a lot of other functionalities are made needlessly slower than they could be, because for the whole duration of the `restoreBlocks` function, the wallet is holding the database lock. Now, what happen if we avoid storing the "intermediate" checkpoints in restore blocks: blocks near the tip will eventually get stored, but one by one. So, when we _just_ reach the top for the first time, we'll only store the last checkpoint. But then, the next 9 checkpoints created won't be pruned out. So, the worse that can happen is that the wallet is asked to rollback right after we've reached the tip and haven't created many checkpoints yet. Still, We would have at least two checkpoints in the past that are at most 2K blocks from the tip (because we fetch blocks by batches of 1000). So it's important that the batch size remains smaller than `k` so that we can be sure that there's always one checkpoint in the database.
cardano-foundation · Sep 16, 2020 · fc23a81 · fc23a81
1 parent 77a57a3
commit fc23a81
Show file tree

Hide file tree

Showing 4 changed files with 91 additions and 27 deletions.
diff --git a/lib/core/src/Cardano/Wallet.hs b/lib/core/src/Cardano/Wallet.hs
@@ -194,6 +194,8 @@ import Cardano.Wallet.DB
     , ErrRemovePendingTx (..)
     , ErrWalletAlreadyExists (..)
     , PrimaryKey (..)
+    , SparseCheckpointsConfig (..)
+    , defaultSparseCheckpointsConfig
     , sparseCheckpoints
     )
 import Cardano.Wallet.Network
@@ -843,7 +845,22 @@ restoreBlocks ctx wid blocks nodeTip = db & \DBLayer{..} -> mapExceptT atomicall
         liftIO $ logDelegation delegation
         putDelegationCertificate (PrimaryKey wid) cert slotNo
 
-    let unstable = sparseCheckpoints k (nodeTip ^. #blockHeight)
+    let unstable = sparseCheckpoints cfg k (nodeTip ^. #blockHeight)
+            where
+                -- NOTE
+                -- The edge really is an optimization to avoid rolling back too
+                -- "far" in the past. Yet, we let the edge construct itself
+                -- organically once we reach the tip of the chain and start
+                -- processing blocks one by one.
+                --
+                -- This prevents the wallet from trying to create too many
+                -- checkpoints at once during restoration which causes massive
+                -- performance degradation on large wallets.
+                --
+                -- Rollback may still occur during this short period, but
+                -- rolling back from a few hundred blocks is relatively fast
+                -- anyway.
+                cfg = defaultSparseCheckpointsConfig { edgeSize = 0 }
 
     forM_ (NE.init cps) $ \cp' -> do
         let (Quantity h) = currentTip cp' ^. #blockHeight

diff --git a/lib/core/src/Cardano/Wallet/DB.hs b/lib/core/src/Cardano/Wallet/DB.hs
@@ -1,6 +1,7 @@
 {-# LANGUAGE DataKinds #-}
 {-# LANGUAGE ExistentialQuantification #-}
 {-# LANGUAGE GADTs #-}
+{-# LANGUAGE NamedFieldPuns #-}
 {-# LANGUAGE RankNTypes #-}
 {-# LANGUAGE RecordWildCards #-}
 
@@ -20,6 +21,8 @@ module Cardano.Wallet.DB
 
       -- * Checkpoints
     , sparseCheckpoints
+    , SparseCheckpointsConfig (..)
+    , defaultSparseCheckpointsConfig
 
       -- * Errors
     , ErrRemovePendingTx (..)
@@ -381,19 +384,20 @@ cleanDB DBLayer{..} = atomically $
 -- Therefore, we need to keep the very first checkpoint in the database, no
 -- matter what.
 sparseCheckpoints
-    :: Quantity "block" Word32
+    :: SparseCheckpointsConfig
+        -- ^ Parameters for the function.
+    -> Quantity "block" Word32
         -- ^ Epoch Stability, i.e. how far we can rollback
     -> Quantity "block" Word32
         -- ^ A given block height
     -> [Word32]
         -- ^ The list of checkpoint heights that should be kept in DB.
-sparseCheckpoints epochStability blkH =
+sparseCheckpoints cfg epochStability blkH  =
     let
-        gapsSize = 100
-        edgeSize = 10
-
+        SparseCheckpointsConfig{gapsSize,edgeSize} = cfg
         k = getQuantity epochStability
         h = getQuantity blkH
+
         minH =
             let x = if h < k then 0 else h - k
             in gapsSize * (x `div` gapsSize)
@@ -405,3 +409,16 @@ sparseCheckpoints epochStability blkH =
             else [h-edgeSize,h-edgeSize+1..h]
     in
         L.sort $ L.nub $ initial : (longTerm ++ shortTerm)
+
+-- | Captures the configuration for the `sparseCheckpoints` function.
+data SparseCheckpointsConfig = SparseCheckpointsConfig
+    { gapsSize :: Word32
+    , edgeSize :: Word32
+    } deriving Show
+
+-- | A sensible default to use in production.
+defaultSparseCheckpointsConfig :: SparseCheckpointsConfig
+defaultSparseCheckpointsConfig = SparseCheckpointsConfig
+    { gapsSize = 1000
+    , edgeSize = 10
+    }
diff --git a/lib/core/src/Cardano/Wallet/DB/Sqlite.hs b/lib/core/src/Cardano/Wallet/DB/Sqlite.hs
@@ -61,6 +61,7 @@ import Cardano.Wallet.DB
     , ErrRemovePendingTx (..)
     , ErrWalletAlreadyExists (..)
     , PrimaryKey (..)
+    , defaultSparseCheckpointsConfig
     , sparseCheckpoints
     )
 import Cardano.Wallet.DB.Sqlite.TH
@@ -1230,7 +1231,7 @@ pruneCheckpoints
 pruneCheckpoints wid cp = do
     let height = Quantity $ fromIntegral $ checkpointBlockHeight cp
     let epochStability = Quantity $ checkpointEpochStability cp
-    let cps = sparseCheckpoints epochStability height
+    let cps = sparseCheckpoints defaultSparseCheckpointsConfig epochStability height
     deleteCheckpoints wid [ CheckpointBlockHeight /<-. cps ]
 
 -- | Delete TxMeta values for a wallet.

diff --git a/lib/core/test/unit/Cardano/Wallet/DB/Properties.hs b/lib/core/test/unit/Cardano/Wallet/DB/Properties.hs
@@ -1,11 +1,9 @@
 {-# LANGUAGE AllowAmbiguousTypes #-}
 {-# LANGUAGE ConstraintKinds #-}
 {-# LANGUAGE DataKinds #-}
-{-# LANGUAGE DerivingStrategies #-}
 {-# LANGUAGE DuplicateRecordFields #-}
 {-# LANGUAGE FlexibleContexts #-}
 {-# LANGUAGE FlexibleInstances #-}
-{-# LANGUAGE GeneralizedNewtypeDeriving #-}
 {-# LANGUAGE LambdaCase #-}
 {-# LANGUAGE MultiParamTypeClasses #-}
 {-# LANGUAGE OverloadedLabels #-}
@@ -29,7 +27,9 @@ import Cardano.Wallet.DB
     , ErrNoSuchWallet (..)
     , ErrWalletAlreadyExists (..)
     , PrimaryKey (..)
+    , SparseCheckpointsConfig (..)
     , cleanDB
+    , defaultSparseCheckpointsConfig
     , sparseCheckpoints
     )
 import Cardano.Wallet.DB.Arbitrary
@@ -119,6 +119,7 @@ import Test.QuickCheck
     , Gen
     , Property
     , checkCoverage
+    , choose
     , counterexample
     , cover
     , elements
@@ -289,17 +290,26 @@ properties = do
 
     describe "sparseCheckpoints" $ do
         it "k=2160, h=42" $ \_ -> do
+            let cfg = SparseCheckpointsConfig
+                    { gapsSize = 100
+                    , edgeSize = 10
+                    }
             let k = Quantity 2160
             let h = Quantity 42
+
             -- First unstable block: 0
-            sparseCheckpoints k h `shouldBe`
+            sparseCheckpoints cfg k h `shouldBe`
                 [0,32,33,34,35,36,37,38,39,40,41,42]
 
         it "k=2160, h=2414" $ \_ -> do
+            let cfg = SparseCheckpointsConfig
+                    { gapsSize = 100
+                    , edgeSize = 10
+                    }
             let k = Quantity 2160
             let h = Quantity 2714
             -- First unstable block: 554
-            sparseCheckpoints k h `shouldBe`
+            sparseCheckpoints cfg k h `shouldBe`
                 [ 0    , 500  , 600  , 700  , 800  , 900
                 , 1000 , 1100 , 1200 , 1300 , 1400 , 1500
                 , 1600 , 1700 , 1800 , 1900 , 2000 , 2100
@@ -308,10 +318,26 @@ properties = do
                 , 2710 , 2711 , 2712 , 2713 , 2714
                 ]
 
+        it "k=2160, h=2414" $ \_ -> do
+            let cfg = SparseCheckpointsConfig
+                    { gapsSize = 100
+                    , edgeSize = 0
+                    }
+            let k = Quantity 2160
+            let h = Quantity 2714
+            -- First unstable block: 554
+            sparseCheckpoints cfg k h `shouldBe`
+                [ 0    , 500  , 600  , 700  , 800  , 900
+                , 1000 , 1100 , 1200 , 1300 , 1400 , 1500
+                , 1600 , 1700 , 1800 , 1900 , 2000 , 2100
+                , 2200 , 2300 , 2400 , 2500 , 2600 , 2700
+                , 2714
+                ]
+
         it "The tip is always a checkpoint" $ \_ ->
             property prop_sparseCheckpointTipAlwaysThere
 
-        it "There's at least (min h 10) checkpoints" $ \_ ->
+        it "There's at least (min h edgeSize) checkpoints" $ \_ ->
             property prop_sparseCheckpointMinimum
 
         it "There's no checkpoint older than k (+/- 100)" $ \_ ->
@@ -862,51 +888,51 @@ prop_rollbackTxHistory db@DBLayer{..} (InitialCheckpoint cp0) (GenTxHistory txs0
 prop_sparseCheckpointTipAlwaysThere
     :: GenSparseCheckpointsArgs
     -> Property
-prop_sparseCheckpointTipAlwaysThere (GenSparseCheckpointsArgs (k, h)) = prop
+prop_sparseCheckpointTipAlwaysThere (GenSparseCheckpointsArgs cfg (k, h)) = prop
     & counterexample ("Checkpoints: " <> show cps)
     & counterexample ("h=" <> show h)
     & counterexample ("k=" <> show k)
   where
-    cps = sparseCheckpoints (Quantity k) (Quantity h)
+    cps = sparseCheckpoints cfg (Quantity k) (Quantity h)
 
     prop :: Property
     prop = property $ fromIntegral h `elem` cps
 
--- | Check that sparseCheckpoints always return at least 10 checkpoints (or
--- exactly the current height if h < 10).
+-- | Check that sparseCheckpoints always return at least edgeSize checkpoints (or
+-- exactly the current height if h < edgeSize).
 prop_sparseCheckpointMinimum
     :: GenSparseCheckpointsArgs
     -> Property
-prop_sparseCheckpointMinimum (GenSparseCheckpointsArgs (k, h)) = prop
+prop_sparseCheckpointMinimum (GenSparseCheckpointsArgs cfg (k, h)) = prop
     & counterexample ("Checkpoints: " <> show cps)
     & counterexample ("h=" <> show h)
     & counterexample ("k=" <> show k)
   where
-    cps = sparseCheckpoints (Quantity k) (Quantity h)
+    cps = sparseCheckpoints cfg (Quantity k) (Quantity h)
 
     prop :: Property
-    prop = property $ fromIntegral (length cps) >= min 10 h
+    prop = property $ fromIntegral (length cps) >= min (edgeSize cfg) h
 
 
 -- | Check that sparseCheckpoints always return checkpoints that can cover
 -- rollbacks up to `k` in the past. This means that, if the current block height
 -- is #3000, and `k=2160`, we should be able to rollback to #840. Since we make
--- checkpoints every 100 blocks, it means that block #800 should be in the list.
+-- checkpoints every gapsSize blocks, it means that block #800 should be in the list.
 --
 -- Note: The initial checkpoint at #0 will always be present.
 prop_sparseCheckpointNoOlderThanK
     :: GenSparseCheckpointsArgs
     -> Property
-prop_sparseCheckpointNoOlderThanK (GenSparseCheckpointsArgs (k, h)) = prop
+prop_sparseCheckpointNoOlderThanK (GenSparseCheckpointsArgs cfg (k, h)) = prop
     & counterexample ("Checkpoints: " <> show ((\cp -> (age cp, cp)) <$> cps))
     & counterexample ("h=" <> show h)
     & counterexample ("k=" <> show k)
   where
-    cps = sparseCheckpoints (Quantity k) (Quantity h)
+    cps = sparseCheckpoints cfg (Quantity k) (Quantity h)
 
     prop :: Property
     prop = property $ flip all cps $ \cp ->
-        cp == 0 || (age cp - 100 <= int k)
+        cp == 0 || (age cp - (int $ gapsSize cfg) <= int k)
 
     age :: Word32 -> Int
     age cp = int h - int cp
@@ -917,12 +943,15 @@ int = fromIntegral
 pp :: ProtocolParameters
 pp = dummyProtocolParameters
 
-newtype GenSparseCheckpointsArgs
-    = GenSparseCheckpointsArgs (Word32, Word32)
-    deriving newtype Show
+data GenSparseCheckpointsArgs
+    = GenSparseCheckpointsArgs SparseCheckpointsConfig (Word32, Word32)
+    deriving Show
 
 instance Arbitrary GenSparseCheckpointsArgs where
     arbitrary = do
         k <- (\x -> 10 + (x `mod` 1000)) <$> arbitrary
         h <- (`mod` 100000) <$> arbitrary
-        pure $ GenSparseCheckpointsArgs ( k, h )
+        cfg <- SparseCheckpointsConfig
+            <$> choose (1, k-1)
+            <*> choose (0, k)
+        pure $ GenSparseCheckpointsArgs cfg ( k, h )