CAD-744 log a progress meter for block replay events #712

CodiePP · 2020-03-25T12:53:27Z

Issue

log progress report on block replay events (percentage of replayed blocks)
This PR does not result in breaking changes to upstream dependencies.

Checklist

This PR contains all the work required to resolve the linked issue.
The work contained has sufficient documentation to describe what it does and how to do it.
The work has sufficient tests and/or testing.
I have committed clear and descriptive commits. Be considerate as somebody else will have to read these.
I have added the appropriate labels to this PR.

CodiePP · 2020-03-25T15:08:04Z

output from the log:

mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:00.91 UTC] block replay progress (%) = 91.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.07 UTC] block replay progress (%) = 92.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.23 UTC] block replay progress (%) = 93.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.40 UTC] block replay progress (%) = 94.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.55 UTC] block replay progress (%) = 95.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.72 UTC] block replay progress (%) = 96.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.88 UTC] block replay progress (%) = 97.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:02.04 UTC] block replay progress (%) = 98.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:02.19 UTC] block replay progress (%) = 99.5

mrBliss

Thanks for working on this!

This is looking good, except for one thing: the starting point matters, otherwise the progress can be misleading, see my other comment.

mrBliss · 2020-03-25T15:24:47Z

cardano-node/src/Cardano/Tracing/Tracers.hs

+  conteliding _tform _tverb tr ev@(WithSeverity _ (WithTip _ (ChainDB.TraceLedgerReplayEvent (LedgerDB.ReplayedBlock pt replayTo)))) (_old, count) = do
+      let slotno = toInteger $ unSlotNo (realPointSlot pt)
+          endslot = toInteger $ withOrigin 0 unSlotNo (pointSlot replayTo)
+          progress :: Double = (fromInteger slotno * 100.0) / fromInteger (max slotno endslot)


When the replay starts from scratch, the progress will go from 0-100 %, good. But if the replay starts from a later slot, then it will go from X-100 %, which could be a bit misleading? On each startup, we'll do a small replay of the last k blocks, so this is actually very common.

For example: we start replaying from slot 3,000,000 up to slot 3,002,160, the progress will be: 99.9% -> 100%. The more slots the chain gets, the less meaningful the progress will be.

The solution is to figure out the starting point of the replay. You can use the first ReplayedBlock message for this. Alternatively, before replay starts, either ReplayFromGenesis or ReplayFromSnapshot is traced, so you could also use those as the starting point.

what do you want to appear in the logs? shall we print the range of the replayed blocks for the first message?

No, I mean use the first message (or ReplayFrom*) to compute the correct progress for the following messages.

For example, when replaying from slot 3,000,000 up to slot 3,002,160, the first message (or ReplayFromSnapshot) will mention slot 3,000,000. So you can compute the progress as (something like): (slotNo - startSlotNo) / endSlotNo. This means you'll have to store some state, i.e., the start slot.

CodiePP · 2020-03-25T17:14:08Z

reports the percentage of the replayed block range:

[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:06:59.65 UTC] {"event":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":293929,"tip":300551},"kind":"TraceEvent","tip":
"genesis (origin)"}
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:06:59.79 UTC] block replay progress (%) = 15.1
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:06:59.92 UTC] block replay progress (%) = 30.2
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.05 UTC] block replay progress (%) = 45.3
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.19 UTC] block replay progress (%) = 60.4
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.33 UTC] block replay progress (%) = 75.5
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.46 UTC] block replay progress (%) = 90.6
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.54 UTC] before next, messages elided = 293929
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.54 UTC] {"event":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":300551,"tip":300551},"kind":"TraceEvent","tip":"genesis (origin)"}

(the message about "elided" cannot be overwritten; needs some upstream changes, but then we could print progress = 100%)

mrBliss

Great!

Any reason for "(%) = 15.1" instead of simply "15.1%" ?

mrBliss · 2020-03-25T17:17:25Z

cardano-node/src/Cardano/Tracing/Tracers.hs

@@ -287,7 +297,7 @@ mkTracers traceOptions tracer = do
    teeTraceChainTip tverb elided tr = Tracer $ \ev -> do
        traceWith (teeTraceChainTip' tr) ev
        traceWith (teeTraceChainTipElide StructuredLogging tverb elided tr) ev
-        traceWith (teeTraceChainTipElide TextualRepresentation tverb elided (appendName "text" tr)) ev
+        --traceWith (teeTraceChainTipElide TextualRepresentation tverb elided (appendName "text" tr)) ev


Comment should be removed

Any reason for "(%) = 15.1" instead of simply "15.1%" ?

it is just a "Double" value that is traced and rendered in the logs.

Comment should be removed

this duplicates the traced values in a different context name with textual representation. I am not sure if someone depends on this..

mrBliss · 2020-03-25T17:17:53Z

configuration/configuration-mainnet.yaml

@@ -1,5 +1,5 @@
 # global filter; messages must have at least this severity to pass:
-minSeverity: Notice
+minSeverity: Info


Is it intentional to change this "main" configuration?

the replay events are traced with severity "Info". wouldn't pass the filter.

Sure. It's just that we're changing the main configuration. This is not the config used by QA and so? If it is, then we shouldn't change this on a whim.

mrBliss · 2020-03-25T17:18:28Z

cardano-config/src/Cardano/Tracing/ToObjectOrphans.hs

@@ -423,6 +423,7 @@ instance HasSeverityAnnotation (WithMuxBearer peer MuxTrace) where
    MuxTraceHandshakeServerError {} -> Error
    MuxTraceRecvDeltaQObservation {} -> Debug
    MuxTraceRecvDeltaQSample {} -> Info
+    MuxTraceSDUReadTimeoutException {} -> Notice


Was this one missing? Shouldn't that have been a warning caught with -Werror?

yes, that's why I added it.

So that means -Werror is not enabled on CI 😳

mrBliss · 2020-03-26T07:54:29Z

BTW, I would just squash the commits into one before merging.

A commit like "CAD-744 restored configuration" doesn't make sense to keep in the final history.

CodiePP · 2020-03-26T07:59:16Z

BTW, I would just squash the commits into one.

yes, for the final submission. I submitted the single commits so changes are traceable.

Signed-off-by: Alexander Diemand <[email protected]>

CodiePP · 2020-03-26T08:05:46Z

bors r+

iohk-bors · 2020-03-26T08:37:41Z

Build succeeded

dcoutts · 2020-03-27T09:48:47Z

Unfortunately it looks like the progress reporting numbers are reset every time there is an intervening message. So the replay progress number keeps being reset to 0.

CodiePP added the byron Required for a Byron mainnet: replace the old core nodes with cardano-node. label Mar 25, 2020

CodiePP added this to the S9 2020-03-26 milestone Mar 25, 2020

CodiePP requested review from mrBliss and Jimbo4350 March 25, 2020 12:53

CodiePP linked an issue Mar 25, 2020 that may be closed by this pull request

CAD-744 don't elide TraceLedgerReplayEvent.ReplayedBlock #562

Closed

CodiePP force-pushed the cad-744-log-progress-meter branch from e8a55ba to 597229a Compare March 25, 2020 15:10

CodiePP marked this pull request as ready for review March 25, 2020 15:10

mrBliss suggested changes Mar 25, 2020

View reviewed changes

CodiePP requested a review from mrBliss March 25, 2020 17:14

mrBliss approved these changes Mar 25, 2020

View reviewed changes

mrBliss mentioned this pull request Mar 26, 2020

Add severity for MuxTraceSDUReadTimeoutException #717

Merged

5 tasks

CodiePP force-pushed the cad-744-log-progress-meter branch from 0706692 to 1d1bc74 Compare March 26, 2020 07:53

CAD-744 log a progress meter for block replay events

4e1ad06

Signed-off-by: Alexander Diemand <[email protected]>

CodiePP force-pushed the cad-744-log-progress-meter branch from 1d1bc74 to 4e1ad06 Compare March 26, 2020 08:03

iohk-bors bot merged commit ced2550 into master Mar 26, 2020

iohk-bors bot deleted the cad-744-log-progress-meter branch March 26, 2020 08:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CAD-744 log a progress meter for block replay events #712

CAD-744 log a progress meter for block replay events #712

CodiePP commented Mar 25, 2020 •

edited

Loading

CodiePP commented Mar 25, 2020

mrBliss left a comment

mrBliss Mar 25, 2020

CodiePP Mar 25, 2020

mrBliss Mar 25, 2020

CodiePP commented Mar 25, 2020 •

edited

Loading

mrBliss left a comment

mrBliss Mar 25, 2020

CodiePP Mar 25, 2020

CodiePP Mar 25, 2020

mrBliss Mar 25, 2020

CodiePP Mar 25, 2020

mrBliss Mar 25, 2020

mrBliss Mar 25, 2020

CodiePP Mar 25, 2020

mrBliss Mar 25, 2020

mrBliss commented Mar 26, 2020 •

edited

Loading

CodiePP commented Mar 26, 2020

CodiePP commented Mar 26, 2020

iohk-bors bot commented Mar 26, 2020

dcoutts commented Mar 27, 2020

CAD-744 log a progress meter for block replay events #712

CAD-744 log a progress meter for block replay events #712

Conversation

CodiePP commented Mar 25, 2020 • edited Loading

Issue

Checklist

CodiePP commented Mar 25, 2020

mrBliss left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CodiePP commented Mar 25, 2020 • edited Loading

mrBliss left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrBliss commented Mar 26, 2020 • edited Loading

CodiePP commented Mar 26, 2020

CodiePP commented Mar 26, 2020

iohk-bors bot commented Mar 26, 2020

Build succeeded

dcoutts commented Mar 27, 2020

CodiePP commented Mar 25, 2020 •

edited

Loading

CodiePP commented Mar 25, 2020 •

edited

Loading

mrBliss commented Mar 26, 2020 •

edited

Loading