This repository has been archived by the owner on Nov 6, 2020. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Better logging when backfilling ancient blocks fail #10796
Better logging when backfilling ancient blocks fail #10796
Changes from 14 commits
088cac8
3c13109
020acf1
690b805
c516d1d
9f6d5d4
5e2bfb9
62c65aa
947d1e4
d9ab3b9
72c58b5
f027d4b
2f5a1c6
e019c8e
6bf1ad3
25233d1
12fd856
f577c8f
658f7ef
323edb7
fc220b0
9e17d61
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
best_ancient_block
beingNone
can either mean: "no ancient blocks imported" or "all ancient blocks imported (no gaps)". So comment is misleading, although in the case whenfirst_block
isSome
then the former is the true.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I should move the parens I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep the comment, but drop the
TODO:
prefix :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should actually always deadlock - because we already have
self.restoration.lock()
acquired in line 720. parking_lot locks are not re-entrant.Instead of calling
abort_restore()
we should rather use*restoration = None
here. and maybe add a trace line if you care about it.Alternatively we could have
abort_restore
take the lockedrestoration
. IMHO if we do that for some methods already we should do that for all of them.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, for some reason we do not. It's pretty rare, I've seen it twice.
That's what
abort_restore()
does though: here. Why is it better to do it here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid locking twice?
As I said the second ption should be
abort_restore_with_restoration(locked_res: &mut Option<Restoration>)
so we can keep the logic there, but just pass the existing lock.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason I removed
*restoration = None
from here and replaced it with a call toabort_restore()
was to collect all shutdown actions in one place. Not thatabort_restore()
is doing very much atm, but I still think it's good to have a single point of abortion.is locking twice really a concern though: this is an error handler and I don't think we care much about performance?
The
restoration
is a member of the struct here, so I'm not sure what the point of passing it as a param tbh. Now you got me confused!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since
parking_lot::Mutex
is not re-entrant locking it twice in the same thread can (or does) lead to a deadlock. So the call like that:is guaranteed to deadlock, I thought always, but I suspect it might be just random.
In the same file I suppose we already had deadlock issues, so the
with_restoration
pattern emerged, where instead of locking resources locally in a particular function you get passed a (mutable) reference to the locked resource:I propose to use exactly the same patter for
abort_restore
- I'm totally fine grouping all the shutdown actions there, but to prevent potential double-locking and potential deadlocks we can pass the locked resource, so the first example becomes:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thank you for explaining.
Is this an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to docs:
So I'm not sure either why it didn't always deadlock. The proposed fix looks good.