-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Do not leak active head data in statement distribution #3519
Conversation
for deactivated in deactivated { | ||
if active_heads.remove(&deactivated).is_some() { | ||
tracing::trace!( | ||
target: LOG_TARGET, | ||
hash = ?deactivated, | ||
"Deactivating leaf", | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify the fix by moving this section above the activated loop. This is equivalent, because we never activate and deactivate a leaf at the same time.
We can revert the rest of the changes, which makes your question obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can revert the rest of the changes, which makes your question obsolete.
Actually, no, let's collect all errors to a vec instead.
I'm not entirely sure if this is the way we're supposed to handle those errors;
The logic of handling errors is on the caller side depending on whether it's Fatal
or not:
polkadot/node/network/statement-distribution/src/lib.rs
Lines 1632 to 1638 in 5dac532
match result { | |
Ok(true) => break, | |
Ok(false) => {} | |
Err(Error(Fault::Fatal(f))) => return Err(f), | |
Err(Error(Fault::Err(error))) => | |
tracing::debug!(target: LOG_TARGET, ?error) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can revert the rest of the changes, which makes your question obsolete.
Well, not necessarily, since there's still the problem of the rest of elements of activated
essentially being ignored if only one's missing. The memory leak here was definitely not intended, and it was a direct result of the early return, so it would make sense that the other elements of activated
being ignored were also a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, no, let's collect all errors to a vec instead.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not necessarily, since there's still the problem of the rest of elements of activated essentially being ignored if only one's missing.
I just realized that activated
actually only consists of t most 1 head except for the startup. So the simplest fix would be just moving deactivation handling up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not necessarily, since there's still the problem of the rest of elements of activated essentially being ignored if only one's missing.
I just realized that
activated
actually only consists of t most 1 head except for the startup. So the simplest fix would be just moving deactivation handling up.
So just for my own reference - are you saying that it's fine to ignore rest of the elements here, or that there won't ever be any extra elements?
(If it's the latter then it'd be nice to add a debug_assert!
here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we create active leaves only on block import (1 activated head):
polkadot/node/overseer/src/lib.rs
Lines 727 to 758 in 5dac532
async fn block_imported(&mut self, block: BlockInfo) -> SubsystemResult<()> { | |
match self.active_leaves.entry(block.hash) { | |
hash_map::Entry::Vacant(entry) => entry.insert(block.number), | |
hash_map::Entry::Occupied(entry) => { | |
debug_assert_eq!(*entry.get(), block.number); | |
return Ok(()); | |
} | |
}; | |
let mut update = match self.on_head_activated(&block.hash, Some(block.parent_hash)) { | |
Some((span, status)) => ActiveLeavesUpdate::start_work(ActivatedLeaf { | |
hash: block.hash, | |
number: block.number, | |
status, | |
span | |
}), | |
None => ActiveLeavesUpdate::default(), | |
}; | |
if let Some(number) = self.active_leaves.remove(&block.parent_hash) { | |
debug_assert_eq!(block.number.saturating_sub(1), number); | |
update.deactivated.push(block.parent_hash); | |
self.on_head_deactivated(&block.parent_hash); | |
} | |
self.clean_up_external_listeners(); | |
if !update.is_empty() { | |
self.broadcast_signal(OverseerSignal::ActiveLeaves(update)).await?; | |
} | |
Ok(()) | |
} |
polkadot/node/service/src/lib.rs
Lines 678 to 680 in c0387ba
let active_leaves = futures::executor::block_on( | |
active_leaves(&select_chain, &*client) | |
)?; |
(If it's the latter then it'd be nice to add a debug_assert! here.)
We can change the signature to of activated
instead to be a single entry. (But that's out of scope of the PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just for my own reference - are you saying that it's fine to ignore rest of the elements here, or that there won't ever be any extra elements?
It's fine to ignore extra elements on startup for statement-distribution and there won't ever be any extra elements after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've moved the deallocation to the top.
I'm still not exactly comfortable with leaving the loop below as-is; if there's going to be only one element then that definitely should be enforced by the type system, but as you've said that is out of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll handle that as part of #2812.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for identifying the issue and bearing with me!
Fixes #3514.
The main problem here was that this code essentially did an early return through the
?
operator on the first error it encountered, effectively ignoring everything else inactivated
, and also leftdeactivated
completely untouched which resulted in permanently leaking items which were added toactive_heads
in the past.I'm not entirely sure if this is the way we're supposed to handle those errors; previously the
handle_subsystem_message
returned anErr
whenget_session_index
failed; now I've made it to justcontinue
. It's not obvious to me what the right answer is here: should the whole function return anErr
if it fails for only one element ofactivated
? I'd appreciate it if someone more experienced with this code could chime in.