Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Walk green tree to avoid excessive allocations in reported trace #67198

Closed
wants to merge 11 commits into from

Conversation

CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Mar 6, 2023

This is called for every method (from SourceOrdinaryMethodSymbol) and accounts for 10% of allocations in traces we are looking at:

image

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner March 6, 2023 18:31
@CyrusNajmabadi
Copy link
Member Author

@sharwell can provide trace information.

@CyrusNajmabadi
Copy link
Member Author

@dotnet/roslyn-compiler for a second pair of eyes. Thanks!

@CyrusNajmabadi
Copy link
Member Author

@jcouv @RikkiGibson could you ptal?

continue;
}

if (current is Syntax.InternalSyntax.YieldStatementSyntax)
return true;
Copy link
Member

@sharwell sharwell Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to free stack before returning? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to free pooled object

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner March 6, 2023 21:03

namespace Microsoft.CodeAnalysis.PooledObjects
{
[NonCopyable]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not available in shared layer. so pulled in through a partial part in a higher layer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously rejected from the compiler layer in #37543.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm... the reasoning doesn't seem to hold up afaict:

Yeah, let's not do this on the compiler side. I think it's a bad pattern for a couple reasons:

It trades risk of leaks, which don't affect program semantics, for potential double frees, which do affect program semantics.

Experience on IDE side hasn't shown this to be an issue. Whereas leaks are exceedingly simple to introduce (lol, i just did that here :D).

It introduces a try-finally, which is very expensive in the CLR.

Looking at teh compiler, it already extensively uses try/finally for exactly this purpose. Indeed, we use try/finally all over the place (or using) for these same patterns. It seems the simplicy and elegance of the cleanup is definitely something desired all over the place. Unclear why this helper would be different.

It introduces a virtual call instead of a direct call for freeing.

Hrmm... that seems untrue. THis is a struct disposer. I do see a constrained call. But a constrained call on a struct is not going to be a vcall at runtime right?

--

Finally, while i'd say there was potential reason to be concerned initially (new pattern and all that), it's proven itself out Extensively on the IDE side, with thousands of usages in our code without issue. It was also the pattern we took with things like DisposableResetPoint. So i'd say we now have a ton of experience under our belt to feel that this is ok.

The benefit is def clearly worth it as well. I apparently can't avoid leaking memory :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather get a read on what the entire team thinks about this before accepting the pooling change, given that it was previously rejected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I have no issue with that. I'm just giving my reasoning why for why that initial rejection seems suspect now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm completely fine if the compiler team as a whole decides to continue to disallow this. And would rather bank the perf win here than have to spend a lot of extra time on this specific PR, if this ends up being a protracted conversation.

My personal view mostly aligns with Cyrus's though. I always feel suspicious when we have to avoid coding patterns which are easier to use correctly because of theoretical perf concerns. (where theoretical means we didn't measure it and observe that it results in an unacceptable regression in perf.)

I would agree that double-free is not a risk here. The places where double-free is a risk are where we have to acquire the thing in one method and plunk it through a bunch of helpers before Free'ing it separately on a bunch of different code paths. We are not at a risk of passing this to a component which sees this as an abstract IDisposable and double-frees it because the IDisposable contract says it's allowed to.

I also wouldn't mind seeing if we could establish a checked acquire-release pattern, which doesn't necessitate the IDisposable interface or a try/finally, in the language through some means. A pattern where the expectation is that we don't care if we fail to release the thing when an exception occurs. For example, by adding an analyzer which performs a flow analysis, and a pattern for type declarations to indicate that the method which acquires the thing must also release it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm also pretty skeptical that try/finally is a problem outside of maybe the most intense hotspots. The parser (which we need to be staggeringly fast) is chock full of this try/finally/dispose pattern, and yet never ever itself shows up as any sort of perf concern in any of the many core perf scenarios we care deeply about. I would expect that if this was a problem, we'd def notice, as we happily/freely add try/finally/dispose there and haven't seen even a blip on any radar anywhere.

In my opinion, we should code with clear patterns, and when we have data indicating there is a perf issue, we judiciously fix up those precise areas, with the right documentation explaining the issue.

So, in a case like this, we win an absolute ton just by the red->green transition. And any sort of worry about a loss from try/finally is completely negligible in comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously rejected from the compiler layer in #37543.

I think the reasoning is still valid and the resolution should stand.

@RikkiGibson RikkiGibson self-assigned this Mar 6, 2023
@333fred 333fred self-requested a review March 6, 2023 21:38
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner March 6, 2023 21:58
@jcouv jcouv assigned jcouv and unassigned RikkiGibson Mar 6, 2023
Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks (iteration 11)

@RikkiGibson
Copy link
Contributor

Ensure we don't merge till #67198 (review) is resolved.

@CyrusNajmabadi
Copy link
Member Author

@333fred up to you on next steps. If you're ok with this, i'm hoping 3 signoffs show okness with this approach. if there's further steps you'd like me to take here, lmk :)

@jcouv
Copy link
Member

jcouv commented Mar 7, 2023

@333fred up to you on next steps. If you're ok with this, i'm hoping 3 signoffs show okness with this approach. if there's further steps you'd like me to take here, lmk :)

FWIW, I have pinged the compiler team on Teams to chime in if needed.

@jcouv
Copy link
Member

jcouv commented Mar 7, 2023

Kindly reminder, if/when this gets merged, it should be squashed. Thanks

@CyrusNajmabadi
Copy link
Member Author

Note: i have a followup PR that that will then cause conflicts with :-/ Squashing basically doesn't work with any sort of "fork and breakup intermediary pieces into PRs" approach. It's not the end of hte world. But it is a PITA. My preference would be that we strongly embrace branches and branching as the superior way of doing development, as it allows full tree/graph based flows of code that all of git supports super well. Squash-based development ends up losing all context and code flow information, which virtually guarantees code conflicts in teh entire downstream tree/graph :(

@jcouv
Copy link
Member

jcouv commented Mar 7, 2023

The compiler policy is clear. The runtime repo has a similar policy (documented here and enforced in GitHub UI).
You do have to adjust your branching style (ie. use rebase) which I understand can be jarring and inconvenient. Will take merge/squash discussion offline.

@CyrusNajmabadi
Copy link
Member Author

Closing out. Timebox exceeded.

@CyrusNajmabadi CyrusNajmabadi deleted the greenWalk branch March 7, 2023 08:38
@AlekseyTs
Copy link
Contributor

AlekseyTs commented Mar 7, 2023

Squashing basically doesn't work with any sort of "fork and breakup intermediary pieces into PRs" approach.

To be honest, I do not understand what is the problem. Commits shouldn't be duplicated in different PRs. And, if so, what conflicts could possibly arise from squashing? I can imagine that some pending local changes could be built on top of the original commits, but a quick rebase after the merge usually works for me without any problems or conflicts. Of course, the squashed commits should be manually removed from the list of commits for the rebase, but, that shouldn't be a problem, unless, of course, commits were created for every single character change (then there are hundreds of them and none of them have meaningful title to be able to distinguish one commit from the other).

@CyrusNajmabadi
Copy link
Member Author

Commits shouldn't be duplicated in different PRs.

I'm not sure what that means. Every time a branch happens, it includes all previous commits shared from that point prior.

Those commits are how git knows what a merge should mean (it pulls in the commits that are not in both).

And, if so, what conflicts could possibly arise from squashing?

Edits to the same locations now have no shared history, so they conflict.

I can imagine that some pending local changes could be built on top of the original commits, but a quick rebase after the merge usually works for me without any problems or conflicts.

My experience is often different, with conflicts galore as further work, refinements and refactorings are made in the subsequent prs. It's totally normal for me to have a chain of dependent prs going through that are broken into smaller pieces to make the PR, and their concepts, easy to digest and review. I've had to do the rebase strategy here and have had to fix up conflicts in a cascading fashion through each subsequent pr.

@AlekseyTs
Copy link
Contributor

Commits shouldn't be duplicated in different PRs.

I'm not sure what that means. Every time a branch happens, it includes all previous commits shared from that point prior.

This means that, until a PR is merged, there shouldn't be another PR that includes the same new commit. As with everything, there are exceptions, but, in my experience, they are rare. For the benefit of this discussion I am going to ignore those exceptions. So, only the person introducing the initial new commits is going to deal with merging/rebasing after a squash-merge. We recognize the fact, and think that this is a reasonable "price" to pay for clear and meaningful history in the public repository. There are certain development practices that can be employed by an individual developers to minimize the individual "price", or avoid it altogether. It is an individual choice whether to follow them or not.

@CyrusNajmabadi
Copy link
Member Author

CyrusNajmabadi commented Mar 7, 2023

This means that, until a PR is merged, there shouldn't be another PR that includes the same new commit

I have no idea why that would be appropriate or desirable.

It's the norm for me that there may be many PRs/branches in flight at any point in time that include portions of another. That's completely normal and expected as changes are being made, and there are future downstream changes depending on them. Not being able to do is limiting and unpleasant, directly impacting code velocity.

So, only the person introducing the initial new commits is going to deal with merging/rebasing after a squash-merge.

Yes. At the very least, that person will be impacted. Quite negatively in my experience :)

And now that totally normal (and expected) use case of git/github is now made that much more unpleasant. What would be totally pain-free merges, become conflict messes. Even something basic like adding a method, but then changing it to be async (or adding a parameter) will result in an enormous number of conflicts because of the lack of shared history for git to use to understand what happened.

It is an individual choice whether to follow them or not.

I don't see any reason to avoid the most normal and simple of git operations, that enables easy parallel development of code, when all that avoidance does is cause pain. It seems like cutting off ones nose to spite their face. Branches are the bread and butter of git. Calling it exceptional to use them in the exact way they were intended to be used seems unnecessarily limiting and (clearly), quite painful.

@AlekseyTs
Copy link
Contributor

I don't see any reason to avoid the most normal and simple of git operations, that enables easy parallel development of code, when all that avoidance does is cause pain.

It is quite possible you are not doing rebase in the "right" way. I never have to deal with merge conflicts when doing rebase after squash-merge. I'll try to follow up with you offline

@CyrusNajmabadi
Copy link
Member Author

CyrusNajmabadi commented Mar 7, 2023

It is quite possible you are not doing rebase in the "right" way.

What rebase helps when you add a method in one PR, squash the commits there, and then, for example, make it async in a following PR that branched off of the original?

You can see this chain of operations in my fork with teh following branches commits:

--

Examples. Starting with aecd498 as the root branch off of main

Branch "branchExample1" adds methods in d250d43
Branch "branchExample2" is branched from "branchExample1" and makes the methods async 2883699
Back in "branchExample1" during PR we add a new method at the end 9bd06aa

In normal world, branchExample1 merges back into main. Merging this into branchExample2 has no conflicts, successfully merging as 73fa49f98f2643dcfe21ae2ca0491b027f4b1772. This is normal merge flow and works fantastically as the relation between commits between branches is fully understood (which is the point of branches in the first place :)).

However, say that "branchExample1" is instead squashed into main. This now goes in as commit 76225ff, which has no sharing information between it and branchExample2.

Attempting to merge main back into branchExample2 now leads to: Automatic merge failed; fix conflicts and then commit the result. Merge conflict is:

<<<<<<< HEAD
        public async Task MAsync()
        {
        }

        public async Task NAsync()
        {
            await MAsync();
            await MAsync();
=======
        public void M()
        {
        }

        public void N()
        {
            M();
            M();
        }

        public void O()
        {
>>>>>>> branchExample3
        }

The lack of hte shared history means that git doesn't understand the relation between the branches.

--

Attempting a rebase of branchExample2 onto branchExample1 (forming 5c25c7603ab96d25189d6b87e674c7284ac03e2a for example) allows it to see the new method added in branchExample1. However, there is still no shared history between that and the squashed commit going into main. So attempting to merge main back in still leads to the same conflicts as above.

Without the awareness that the "make async" change has already causally incorporated the "add methods" commit, git has no way to know what the meaning is here. For all it knows, these are entirely independent, and the commits that make things async truly commit with the original commits, despite the branches and original commits fully indicating the relationships here.

--

Branches and merging make this a complete non-issue, and fully exemplifies what each piece is doing and how the branches depend. Squashing/rebasing removes that.

@333fred
Copy link
Member

333fred commented Mar 7, 2023

I've opened #67221, which reverts back to 8c4bd9e and then manually frees the builder.

@AlekseyTs
Copy link
Contributor

Examples. Starting with aecd498 as the root branch off of main

I managed to go through the steps and rebase without any merge conflicts.

Here is the command I used (rebase1_test is the branch with "Add methods" and "Add O method" commits squashed, rebase2_test is the branch with "Add methods" and "Make async" commits):

D:\GitHub\roslyn>git checkout rebase2_test
Switched to branch 'rebase2_test'

D:\GitHub\roslyn>git rebase -i rebase1_test
hint: Waiting for your editor to close the file...
[main 2023-03-07T18:13:33.603Z] update#setState idle
[main 2023-03-07T18:13:34.103Z] [UtilityProcess id: 1, type: extensionHost, pid: <none>]: creating new...
[main 2023-03-07T18:13:34.113Z] [UtilityProcess id: 1, type: extensionHost, pid: 13912]: successfully created
[main 2023-03-07T18:14:03.615Z] update#setState checking for updates
[main 2023-03-07T18:14:03.629Z] update#setState idle
[56636:0307/101533.570:ERROR:gpu_init.cc(481)] Passthrough is not supported, GL is disabled, ANGLE is
[main 2023-03-07T18:16:14.613Z] [UtilityProcess id: 1, type: extensionHost, pid: 13912]: received exit event with code 0

[main 2023-03-07T18:16:14.613Z] Extension host with pid 13912 exited with code: 0, signal: unknown.
Successfully rebased and updated refs/heads/rebase2_test.

When rebase UI popped up, this is what was in the editor:
image

Since the "Add methods" commit was squashed into a different commit in rebase1_test, I simply remove it from the list because I no longer need this change as a separate commit (no need to duplicate changes from it, which, as one would expect, will cause a merge conflict):
image

Save and close. Done.

@AlekseyTs
Copy link
Contributor

Content in the file after this:

        public async Task MAsync()
        {
        }

        public async Task NAsync()
        {
            await MAsync();
            await MAsync();
        }

        public void O()
        {
        }

@sharwell
Copy link
Member

sharwell commented Mar 7, 2023

Squashing by policy destroys both incremental changes and the logical flow of a feature. When enabled, I've learned to just submit a distinct PR for every incremental change leading to a whole. This allows the change to preserve the incremental changes, but unfortunately squashing by policy is fundamentally incompatible with maintaining logical flow. There is some additional burden considering that prerequisite PRs have to be reviewed without understanding what the final result is, but this doesn't seem to be a problem since they already signed up for increased burden by choosing squashing as a policy. 🤷‍♂️

@333fred
Copy link
Member

333fred commented Mar 7, 2023

The policy is not "squash, regardless of flow". The policy is "keep self-contained changes to a single commit". I will sometimes use rebase instead of squash, when I've authored a history that should be preserved for future ease of reading. But the local history of developing a change is very often not the same history I would like to be preserved.

@CyrusNajmabadi
Copy link
Member Author

I certainly have no issue with a particular dev usign whatever policy they want. However, IME, this sort of rebasing is painful because of either the lack of historical code-flow info from teh commits to make effective merges in other branches, or hte need to then reconcile how such rebasing needs to be incorporated cleanly in dependent work.

My preference is simply to make this friction free and use branches naturally and simply, with having any extra steps needed to avoid conflicts (unless that's something i want to go through) :). However, given the number of branches and cross-branch work i'm often caught up in, that desire is almost always zero :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants