Reduce the number of allocations needed to find a specific child/sibling #119
…ren of SyntaxNode/SyntaxElement
…the generic parameter over Fn(SyntaxKind) from the outwards facing api
DHAT output of the integrated completion benchmarks with the latest commits:
Roughly a 21% decrease in total blocks allocated, a 9.6% decrease in total bytes allocated.
> Roughly a 21% decrease in total blocks allocated, a 9.6% decrease in total bytes allocated.
👀 that's pretty significant (for the measured workload)!
Needs a `cargo fmt`, but otherwise LGTM.
It might also be interesting to instead expose this as e.g. `.children().filter_by_kind(|kind| ..)` (cf. `filter` + `*_by_key`) rather than `.children_matching(|kind| ..)`, but I don't really have a strong preference here.
r? @matklad?
Nice! Wanted to do this for ages.
In terms of API, I suggest using … I also like CAD97's suggestion to make this look like …
Not sure how to express the predicate. I can see four options: pass a generic `impl Fn`, pass an `fn` pointer, pass a single kind, or pass an array of kinds. I think I'd go with the `fn` pointer -- that's a slightly unusual API, but seems to work best for this case. Though, no strong opinion.
Finally, I suggest starting with the smallest API extension we can do here. Looking at the rowan diff, it seems that …
Would be curious to see what the wall-clock time difference on rowan is as well? (…)
(Completely overkill but) We could do something like the …
I'm expressing the predicate as a generic `impl Fn` for now as it seems to be the most flexible. What's more, I've been able to integrate these changes in rust-analyzer without needing to add type parameters everywhere (see this branch).
I see I'm failing tests in rust-analyzer, I will investigate. Edit: fixed the tests.
…t (i.e, only if we start iterating through them) to avoid some unnecessary allocations when calling by_kind
Current DHAT output of running the integrated completions benchmark:
On the master branch of my fork without these changes:
On my branch that includes these changes:
We can also see that the completions benchmark went from 88.9 billion instructions to 83.7 billion instructions, roughly a 6% decrease.
src/api.rs (outdated)
    pub fn by_kind<F: Clone + Fn(SyntaxKind) -> bool>(
        self,
        matcher: F,
    ) -> SyntaxNodeChildrenMatching<F, L> {
Suggested change:
    - ) -> SyntaxNodeChildrenMatching<F, L> {
    + ) -> SyntaxNodeChildrenByKind<F, L> {
Changed
src/api.rs (outdated)
    }

    #[derive(Debug, Clone)]
    pub struct SyntaxNodeChildrenMatching<F: Clone + Fn(SyntaxKind) -> bool, L: Language> {
I'd probably still stick with `fn(SyntaxKind) -> bool`. The `Fn` trait is indeed more flexible, but that comes at the expense of compile times, as generics break separate compilation. With `fn`, we compile everything when compiling `rowan`. With `Fn`, downstream crates would have to carry the compilation-time burden.
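To make the trade-off concrete, here is a minimal sketch of the two signatures side by side. `SyntaxKind` below is a toy stand-in, and `count_generic`, `count_fn_ptr` and `is_ident` are hypothetical names, not rowan APIs. The generic version gets instantiated for each closure type in whichever crate calls it, while the `fn` pointer version is an ordinary function compiled once along with its defining crate.

```rust
// Toy stand-in for rowan's SyntaxKind (assumption: the real type is not reproduced here).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct SyntaxKind(u16);

// Generic over the predicate: a fresh copy is generated per closure type,
// in the crate that calls it.
fn count_generic<F: Fn(SyntaxKind) -> bool>(kinds: &[SyntaxKind], matcher: F) -> usize {
    kinds.iter().filter(|&&k| matcher(k)).count()
}

// Takes a plain function pointer: a single non-generic function.
fn count_fn_ptr(kinds: &[SyntaxKind], matcher: fn(SyntaxKind) -> bool) -> usize {
    kinds.iter().filter(|&&k| matcher(k)).count()
}

// Hypothetical predicate used for illustration.
fn is_ident(kind: SyntaxKind) -> bool {
    kind == SyntaxKind(1)
}

fn main() {
    let kinds = [SyntaxKind(1), SyntaxKind(2), SyntaxKind(1)];
    assert_eq!(count_generic(&kinds, |k| k == SyntaxKind(1)), 2);
    assert_eq!(count_fn_ptr(&kinds, is_ident), 2);
    // A closure that captures nothing still coerces to an fn pointer.
    assert_eq!(count_fn_ptr(&kinds, |k| k == SyntaxKind(2)), 1);
}
```

Both calls behave the same at runtime; the difference discussed here is only about where and how often the code gets compiled.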
Would there be any disadvantage to a `&dyn Fn` vs. an `fn`?
Good question. It'll have to carry a lifetime, and would be two pointers wide, while what we actually need in practice is `fn`. I guess my primary intuition here is about avoiding premature generalization.
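The size and lifetime points can be checked with a small sketch. `Kind`, `ByKindFn` and `ByKindDyn` are hypothetical names used only for illustration; the widths are reported in units of one pointer (`usize`).

```rust
use std::mem::size_of;

// Hypothetical stand-in for a syntax kind.
type Kind = u16;

// Holding an fn pointer needs no lifetime parameter.
#[allow(dead_code)]
struct ByKindFn {
    matcher: fn(Kind) -> bool,
}

// Holding a trait object reference forces a lifetime onto the struct.
#[allow(dead_code)]
struct ByKindDyn<'a> {
    matcher: &'a dyn Fn(Kind) -> bool,
}

// Width of the fn-pointer holder, in pointers.
fn fn_holder_width() -> usize {
    size_of::<ByKindFn>() / size_of::<usize>()
}

// Width of the `&dyn Fn` holder, in pointers (data pointer + vtable pointer).
fn dyn_holder_width() -> usize {
    size_of::<ByKindDyn<'static>>() / size_of::<usize>()
}

fn main() {
    assert_eq!(fn_holder_width(), 1);
    assert_eq!(dyn_holder_width(), 2);
}
```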
I think `support::token` is a place where we would have to pass a closure to get `by_kinds` to work. Because of this, I'm using `&dyn Fn` right now.
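A rough sketch of why a closure is needed in a case like this, assuming the wanted kind arrives as a runtime argument: a closure that captures a value cannot coerce to an `fn` pointer, but it can be passed as `&dyn Fn`. `Kind`, `first_matching` and `token_index` are hypothetical names for illustration, not the rust-analyzer or rowan functions.

```rust
// Hypothetical stand-in for a syntax kind.
type Kind = u16;

// Accepts any predicate, including closures that capture their environment.
fn first_matching(kinds: &[Kind], matcher: &dyn Fn(Kind) -> bool) -> Option<usize> {
    kinds.iter().position(|&k| matcher(k))
}

// Shaped like "find the token of a kind chosen by the caller".
fn token_index(kinds: &[Kind], wanted: Kind) -> Option<usize> {
    // This closure captures `wanted`, so it has no `fn(Kind) -> bool` coercion;
    // borrowing it as a trait object works.
    first_matching(kinds, &|k: Kind| k == wanted)
}

fn main() {
    let kinds = [7, 3, 9, 3];
    assert_eq!(token_index(&kinds, 3), Some(1));
    assert_eq!(token_index(&kinds, 4), None);
}
```

Passing the same closure where `fn(Kind) -> bool` is expected would be rejected by the compiler, which is the constraint being worked around here.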
    parent: SyntaxNode,
    next: Option<SyntaxElement>,
    next_initialized: bool,
Hm, why do we need to change anything here? I'd expect `SyntaxElementChildren` to remain exactly as they were.
Good question. I changed this so we could avoid creating `SyntaxElementChildren::next` when calling `children().by_kind`. This actually makes a pretty significant impact - here's the DHAT output of the integrated completions benchmark without this optimization:
And here's the DHAT output of the integrated completions benchmark after this optimization:
98 million blocks allocated (before this optimization) vs 86 million blocks allocated (after this optimization).
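The idea behind the `next_initialized` flag can be sketched with a toy model. This is not rowan's implementation: `Parent`, `Children`, `eager`, `lazy` and `allocations_for` are hypothetical, children are plain `u16` kinds, and a `Box` plus a counter stand in for allocating a node. The eager constructor materializes the first child up front, so jumping straight to `by_kind` pays for a child it never uses; the lazy one defers that until iteration actually starts.

```rust
use std::cell::Cell;

// Toy parent: child kinds plus a counter of materialized children.
struct Parent {
    kinds: Vec<u16>,
    allocations: Cell<usize>,
}

impl Parent {
    // Materializing a child stands in for allocating a syntax node.
    fn child(&self, index: usize) -> Option<Box<u16>> {
        let kind = *self.kinds.get(index)?;
        self.allocations.set(self.allocations.get() + 1);
        Some(Box::new(kind))
    }
}

struct Children<'a> {
    parent: &'a Parent,
    index: usize,
    next: Option<Box<u16>>,
    next_initialized: bool,
}

impl<'a> Children<'a> {
    // Eager: the first child is created as soon as the iterator exists.
    fn eager(parent: &'a Parent) -> Self {
        let next = parent.child(0);
        Children { parent, index: 0, next, next_initialized: true }
    }

    // Lazy: nothing is created until `next()` is first called.
    fn lazy(parent: &'a Parent) -> Self {
        Children { parent, index: 0, next: None, next_initialized: false }
    }

    // Scans kinds without materializing, then creates only the match.
    fn by_kind(self, wanted: u16) -> Option<Box<u16>> {
        let index = self.parent.kinds.iter().position(|&k| k == wanted)?;
        self.parent.child(index)
    }
}

impl Iterator for Children<'_> {
    type Item = Box<u16>;

    fn next(&mut self) -> Option<Box<u16>> {
        if !self.next_initialized {
            self.next = self.parent.child(self.index);
            self.next_initialized = true;
        }
        let current = self.next.take()?;
        self.index += 1;
        self.next = self.parent.child(self.index);
        Some(current)
    }
}

// Counts materialized children when going straight to `by_kind`.
fn allocations_for(lazy: bool) -> usize {
    let parent = Parent { kinds: vec![1, 2], allocations: Cell::new(0) };
    let children = if lazy { Children::lazy(&parent) } else { Children::eager(&parent) };
    let found = children.by_kind(2);
    assert_eq!(found.as_deref(), Some(&2));
    parent.allocations.get()
}

fn main() {
    assert_eq!(allocations_for(false), 2); // unused first child + the match
    assert_eq!(allocations_for(true), 1); // only the match
}
```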
Sounds odd to me that using `first_child_or_token_by_kind` with these extra changes is so much faster than seeking to the next sibling via `next_sibling_or_token_by_kind`, as both ways come down to iterating a slice, no?
Hmm, my observation is that using `first_child_or_token_by_kind` allows us to jump to the first child/token node that matches a kind without needing to allocate any other nodes. Contrast this with using `first_child_or_token` followed by `next_sibling_or_token_by_kind` (if the first child doesn't match the kind), which can potentially allocate the first child before jumping to the desired node. I think it's this allocation avoidance that explains the improvement in the DHAT outputs.
A very common operation in rust-analyzer is finding a child/sibling of a `SyntaxNode` that matches a specific `SyntaxKind`. This is currently implemented by iterating through children and finding the first matching node. However, this causes unnecessary allocations - the children that don't match are allocated and quickly discarded.
This PR aims to reduce these unnecessary allocations by providing methods to iterate through only the children/siblings that match the desired `SyntaxKind`. Children/siblings that don't match the desired `SyntaxKind` are not allocated in these iterators.
So far, I've only implemented this optimization for iterating through children, but this optimization already seems quite promising. Below is the DHAT output from running rust-analyzer's integrated completion benchmarks.
DHAT output before this change (on the master branch in my fork):
DHAT output after this change (on this branch where I used `children_matching` in `support::child`).
This is roughly an 8.5% decrease in total blocks allocated and a 4% decrease in the total bytes allocated.
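As a rough illustration of the difference described above, the toy model below compares "allocate every child, then test it" with "test the kind first, allocate only the match". It is a sketch under simplifying assumptions, not rowan's code: `Kind`, `Node`, `Tree`, `find_by_iterating`, `find_by_kind` and `compare` are hypothetical, and a `Box` plus a counter stand in for allocating a syntax node.

```rust
use std::cell::Cell;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Whitespace,
    Ident,
    Comma,
}

// Heap-allocated handle standing in for a syntax node.
struct Node {
    kind: Kind,
}

struct Tree {
    children: Vec<Kind>,
    allocated: Cell<usize>,
}

impl Tree {
    fn alloc(&self, kind: Kind) -> Box<Node> {
        self.allocated.set(self.allocated.get() + 1);
        Box::new(Node { kind })
    }

    // Allocates each child in turn until one matches; misses are dropped.
    fn find_by_iterating(&self, wanted: Kind) -> Option<Box<Node>> {
        self.children.iter().map(|&k| self.alloc(k)).find(|n| n.kind == wanted)
    }

    // Checks kinds first and allocates only the matching child.
    fn find_by_kind(&self, wanted: Kind) -> Option<Box<Node>> {
        self.children.iter().copied().find(|&k| k == wanted).map(|k| self.alloc(k))
    }
}

// Returns (allocations when iterating, allocations when filtering by kind).
fn compare(children: Vec<Kind>, wanted: Kind) -> (usize, usize) {
    let tree = Tree { children, allocated: Cell::new(0) };
    let a = tree.find_by_iterating(wanted);
    let iterating = tree.allocated.replace(0);
    let b = tree.find_by_kind(wanted);
    let filtered = tree.allocated.get();
    assert_eq!(a.map(|n| n.kind), b.map(|n| n.kind));
    (iterating, filtered)
}

fn main() {
    let children = vec![Kind::Whitespace, Kind::Comma, Kind::Ident, Kind::Comma];
    assert_eq!(compare(children, Kind::Ident), (3, 1));
}
```

In this toy the saving grows with the number of non-matching children in front of the match; the real numbers depend on the workload, as the DHAT figures above show.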