-
Notifications
You must be signed in to change notification settings - Fork 721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CAD-744 log a progress meter for block replay events #712
Conversation
output from the log:
|
e8a55ba
to
597229a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
This is looking good, except for one thing: the starting point matters, otherwise the progress can be misleading, see my other comment.
conteliding _tform _tverb tr ev@(WithSeverity _ (WithTip _ (ChainDB.TraceLedgerReplayEvent (LedgerDB.ReplayedBlock pt replayTo)))) (_old, count) = do | ||
let slotno = toInteger $ unSlotNo (realPointSlot pt) | ||
endslot = toInteger $ withOrigin 0 unSlotNo (pointSlot replayTo) | ||
progress :: Double = (fromInteger slotno * 100.0) / fromInteger (max slotno endslot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the replay starts from scratch, the progress will go from 0-100 %, good. But if the replay starts from a later slot, then it will go from X-100 %, which could be a bit misleading? On each startup, we'll do a small replay of the last k
blocks, so this is actually very common.
For example: we start replaying from slot 3,000,000 up to slot 3,002,160, the progress will be: 99.9% -> 100%. The more slots the chain gets, the less meaningful the progress will be.
The solution is to figure out the starting point of the replay. You can use the first ReplayedBlock
message for this. Alternatively, before replay starts, either ReplayFromGenesis
or ReplayFromSnapshot
is traced, so you could also use those as the starting point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you want to appear in the logs? shall we print the range of the replayed blocks for the first message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean use the first message (or ReplayFrom*
) to compute the correct progress for the following messages.
For example, when replaying from slot 3,000,000 up to slot 3,002,160, the first message (or ReplayFromSnapshot
) will mention slot 3,000,000. So you can compute the progress as (something like): (slotNo - startSlotNo) / endSlotNo
. This means you'll have to store some state, i.e., the start slot.
reports the percentage of the replayed block range:
(the message about "elided" cannot be overwritten; needs some upstream changes, but then we could print progress = 100%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Any reason for "(%) = 15.1" instead of simply "15.1%" ?
@@ -287,7 +297,7 @@ mkTracers traceOptions tracer = do | |||
teeTraceChainTip tverb elided tr = Tracer $ \ev -> do | |||
traceWith (teeTraceChainTip' tr) ev | |||
traceWith (teeTraceChainTipElide StructuredLogging tverb elided tr) ev | |||
traceWith (teeTraceChainTipElide TextualRepresentation tverb elided (appendName "text" tr)) ev | |||
--traceWith (teeTraceChainTipElide TextualRepresentation tverb elided (appendName "text" tr)) ev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for "(%) = 15.1" instead of simply "15.1%" ?
it is just a "Double" value that is traced and rendered in the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment should be removed
this duplicates the traced values in a different context name with textual representation. I am not sure if someone depends on this..
@@ -1,5 +1,5 @@ | |||
# global filter; messages must have at least this severity to pass: | |||
minSeverity: Notice | |||
minSeverity: Info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional to change this "main" configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the replay events are traced with severity "Info". wouldn't pass the filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. It's just that we're changing the main configuration. This is not the config used by QA and so? If it is, then we shouldn't change this on a whim.
@@ -423,6 +423,7 @@ instance HasSeverityAnnotation (WithMuxBearer peer MuxTrace) where | |||
MuxTraceHandshakeServerError {} -> Error | |||
MuxTraceRecvDeltaQObservation {} -> Debug | |||
MuxTraceRecvDeltaQSample {} -> Info | |||
MuxTraceSDUReadTimeoutException {} -> Notice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this one missing? Shouldn't that have been a warning caught with -Werror
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's why I added it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that means -Werror
is not enabled on CI 😳
0706692
to
1d1bc74
Compare
BTW, I would just squash the commits into one before merging. A commit like "CAD-744 restored configuration" doesn't make sense to keep in the final history. |
yes, for the final submission. I submitted the single commits so changes are traceable. |
Signed-off-by: Alexander Diemand <[email protected]>
1d1bc74
to
4e1ad06
Compare
bors r+ |
Build succeeded |
Unfortunately it looks like the progress reporting numbers are reset every time there is an intervening message. So the replay progress number keeps being reset to 0. |
Issue
log progress report on block replay events (percentage of replayed blocks)
This PR does not result in breaking changes to upstream dependencies.
Checklist
This PR contains all the work required to resolve the linked issue.
The work contained has sufficient documentation to describe what it does and how to do it.
The work has sufficient tests and/or testing.
I have committed clear and descriptive commits. Be considerate as somebody else will have to read these.
I have added the appropriate labels to this PR.