Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAD-744 log a progress meter for block replay events #712

Merged
merged 1 commit into from
Mar 26, 2020

Conversation

CodiePP
Copy link
Contributor

@CodiePP CodiePP commented Mar 25, 2020

Issue

  • log progress report on block replay events (percentage of replayed blocks)

  • This PR does not result in breaking changes to upstream dependencies.

Checklist

  • This PR contains all the work required to resolve the linked issue.

  • The work contained has sufficient documentation to describe what it does and how to do it.

  • The work has sufficient tests and/or testing.

  • I have committed clear and descriptive commits. Be considerate as somebody else will have to read these.

  • I have added the appropriate labels to this PR.

@CodiePP CodiePP added the byron Required for a Byron mainnet: replace the old core nodes with cardano-node. label Mar 25, 2020
@CodiePP CodiePP added this to the S9 2020-03-26 milestone Mar 25, 2020
@CodiePP CodiePP requested review from mrBliss and Jimbo4350 March 25, 2020 12:53
@CodiePP CodiePP linked an issue Mar 25, 2020 that may be closed by this pull request
@CodiePP
Copy link
Contributor Author

CodiePP commented Mar 25, 2020

output from the log:

mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:00.91 UTC] block replay progress (%) = 91.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.07 UTC] block replay progress (%) = 92.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.23 UTC] block replay progress (%) = 93.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.40 UTC] block replay progress (%) = 94.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.55 UTC] block replay progress (%) = 95.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.72 UTC] block replay progress (%) = 96.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:01.88 UTC] block replay progress (%) = 97.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:02.04 UTC] block replay progress (%) = 98.5
mainnet-20200325150539.log:[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 15:07:02.19 UTC] block replay progress (%) = 99.5

@CodiePP CodiePP force-pushed the cad-744-log-progress-meter branch from e8a55ba to 597229a Compare March 25, 2020 15:10
@CodiePP CodiePP marked this pull request as ready for review March 25, 2020 15:10
Copy link
Contributor

@mrBliss mrBliss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

This is looking good, except for one thing: the starting point matters, otherwise the progress can be misleading, see my other comment.

conteliding _tform _tverb tr ev@(WithSeverity _ (WithTip _ (ChainDB.TraceLedgerReplayEvent (LedgerDB.ReplayedBlock pt replayTo)))) (_old, count) = do
let slotno = toInteger $ unSlotNo (realPointSlot pt)
endslot = toInteger $ withOrigin 0 unSlotNo (pointSlot replayTo)
progress :: Double = (fromInteger slotno * 100.0) / fromInteger (max slotno endslot)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the replay starts from scratch, the progress will go from 0-100 %, good. But if the replay starts from a later slot, then it will go from X-100 %, which could be a bit misleading? On each startup, we'll do a small replay of the last k blocks, so this is actually very common.

For example: we start replaying from slot 3,000,000 up to slot 3,002,160, the progress will be: 99.9% -> 100%. The more slots the chain gets, the less meaningful the progress will be.

The solution is to figure out the starting point of the replay. You can use the first ReplayedBlock message for this. Alternatively, before replay starts, either ReplayFromGenesis or ReplayFromSnapshot is traced, so you could also use those as the starting point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you want to appear in the logs? shall we print the range of the replayed blocks for the first message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean use the first message (or ReplayFrom*) to compute the correct progress for the following messages.

For example, when replaying from slot 3,000,000 up to slot 3,002,160, the first message (or ReplayFromSnapshot) will mention slot 3,000,000. So you can compute the progress as (something like): (slotNo - startSlotNo) / endSlotNo. This means you'll have to store some state, i.e., the start slot.

@CodiePP
Copy link
Contributor Author

CodiePP commented Mar 25, 2020

reports the percentage of the replayed block range:

[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:06:59.65 UTC] {"event":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":293929,"tip":300551},"kind":"TraceEvent","tip":
"genesis (origin)"}
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:06:59.79 UTC] block replay progress (%) = 15.1
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:06:59.92 UTC] block replay progress (%) = 30.2
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.05 UTC] block replay progress (%) = 45.3
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.19 UTC] block replay progress (%) = 60.4
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.33 UTC] block replay progress (%) = 75.5
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.46 UTC] block replay progress (%) = 90.6
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.54 UTC] before next, messages elided = 293929
[nixos19:cardano.node.ChainDB:Info:24] [2020-03-25 17:07:00.54 UTC] {"event":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":300551,"tip":300551},"kind":"TraceEvent","tip":"genesis (origin)"}

(the message about "elided" cannot be overwritten; needs some upstream changes, but then we could print progress = 100%)

@CodiePP CodiePP requested a review from mrBliss March 25, 2020 17:14
Copy link
Contributor

@mrBliss mrBliss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Any reason for "(%) = 15.1" instead of simply "15.1%" ?

@@ -287,7 +297,7 @@ mkTracers traceOptions tracer = do
teeTraceChainTip tverb elided tr = Tracer $ \ev -> do
traceWith (teeTraceChainTip' tr) ev
traceWith (teeTraceChainTipElide StructuredLogging tverb elided tr) ev
traceWith (teeTraceChainTipElide TextualRepresentation tverb elided (appendName "text" tr)) ev
--traceWith (teeTraceChainTipElide TextualRepresentation tverb elided (appendName "text" tr)) ev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for "(%) = 15.1" instead of simply "15.1%" ?

it is just a "Double" value that is traced and rendered in the logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be removed

this duplicates the traced values in a different context name with textual representation. I am not sure if someone depends on this..

@@ -1,5 +1,5 @@
# global filter; messages must have at least this severity to pass:
minSeverity: Notice
minSeverity: Info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional to change this "main" configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the replay events are traced with severity "Info". wouldn't pass the filter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It's just that we're changing the main configuration. This is not the config used by QA and so? If it is, then we shouldn't change this on a whim.

@@ -423,6 +423,7 @@ instance HasSeverityAnnotation (WithMuxBearer peer MuxTrace) where
MuxTraceHandshakeServerError {} -> Error
MuxTraceRecvDeltaQObservation {} -> Debug
MuxTraceRecvDeltaQSample {} -> Info
MuxTraceSDUReadTimeoutException {} -> Notice
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this one missing? Shouldn't that have been a warning caught with -Werror?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's why I added it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that means -Werror is not enabled on CI 😳

@CodiePP CodiePP force-pushed the cad-744-log-progress-meter branch from 0706692 to 1d1bc74 Compare March 26, 2020 07:53
@mrBliss
Copy link
Contributor

mrBliss commented Mar 26, 2020

BTW, I would just squash the commits into one before merging.

A commit like "CAD-744 restored configuration" doesn't make sense to keep in the final history.

@CodiePP
Copy link
Contributor Author

CodiePP commented Mar 26, 2020

BTW, I would just squash the commits into one.

yes, for the final submission. I submitted the single commits so changes are traceable.

@CodiePP CodiePP force-pushed the cad-744-log-progress-meter branch from 1d1bc74 to 4e1ad06 Compare March 26, 2020 08:03
@CodiePP
Copy link
Contributor Author

CodiePP commented Mar 26, 2020

bors r+

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Mar 26, 2020

@iohk-bors iohk-bors bot merged commit ced2550 into master Mar 26, 2020
@iohk-bors iohk-bors bot deleted the cad-744-log-progress-meter branch March 26, 2020 08:37
@dcoutts
Copy link
Contributor

dcoutts commented Mar 27, 2020

Unfortunately it looks like the progress reporting numbers are reset every time there is an intervening message. So the replay progress number keeps being reset to 0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
byron Required for a Byron mainnet: replace the old core nodes with cardano-node.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CAD-744 don't elide TraceLedgerReplayEvent.ReplayedBlock
3 participants