Swapping with tuple deconstruction compiles to inferior IL code compared to using a temporary variable #53300

hamarb123 · 2021-05-10T07:31:03Z

Description

Swapping / reordering variables compiles to slower IL code when using tuple destruction than a temporary variable.
I tested this in Release mode.

The following C# code

int x = 1;
int y = 2;
int z = 3;
(x, y)=(y, x);
Console.WriteLine((x,y,z)); //this line is to stop compiling out the variables

should be equivalent in IL output to

int x = 1;
int y = 2;
int z = 3;
int temp = x;
x = y;
y = temp;
Console.WriteLine((x,y,z)); //this line is to stop compiling out the variables

, but the first compiles to:

// Code size       31 (0x1f)
.maxstack  3
.locals init (int32 V_0,
         int32 V_1,
         int32 V_2,
         int32 V_3)
ldc.i4.1
stloc.0
ldc.i4.2
stloc.1
ldc.i4.3
stloc.2
ldloc.1
ldloc.0
stloc.3
stloc.0
ldloc.3
stloc.1
ldloc.0
ldloc.1
ldloc.2
newobj     instance void valuetype [System.Runtime]System.ValueTuple`3<int32,int32,int32>::.ctor(!0,
                                                                                                           !1,
                                                                                                           !2)
box        valuetype [System.Runtime]System.ValueTuple`3<int32,int32,int32>
call       void [System.Console]System.Console::WriteLine(object)
ret

and the second compiles to

// Code size       29 (0x1d)
.maxstack  3
.locals init (int32 V_0,
         int32 V_1,
         int32 V_2)
ldc.i4.1
stloc.0
ldc.i4.2
stloc.1
ldc.i4.3
stloc.2
ldloc.0
ldloc.1
stloc.0
stloc.1
ldloc.0
ldloc.1
ldloc.2
newobj     instance void valuetype [System.Runtime]System.ValueTuple`3<int32,int32,int32>::.ctor(!0,
                                                                                                           !1,
                                                                                                           !2)
box        valuetype [System.Runtime]System.ValueTuple`3<int32,int32,int32>
call       void [System.Console]System.Console::WriteLine(object)
ret

With the essential code being for (1):

ldloc.1
ldloc.0
stloc.3
stloc.0
ldloc.3
stloc.1

, and for (2):

ldloc.0
ldloc.1
stloc.0
stloc.1

I did test this on .NET 5 locally, but also on an online .NET 6 compiler at https://sharplab.io/, which uses a very recent version.
I also imagine that a similar thing happens for other variants, such as:

(x, y) = (x, x);
(x, y, z, w) = (w, y, z, x);

etc.
Surely these destruction assignments should be as simple as ld all of the variables, and then st them all (as long as there's not too many).
Interestingly, the tuple destruction code essentially uses a temporary variable in IL, whereas the temporary variable code eliminates it.
Not sure if this belongs on roslyn or on here.

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2021-05-10T07:31:08Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

Wraith2 · 2021-05-10T07:37:28Z

If you check the JIT asm on sharplab you'll find they have the same output.

hamarb123 · 2021-05-10T07:41:21Z

Yes, that makes sense, but I still think that we should produce the best IL possible, since it will take less time to JIT and will be smaller. Especially since it seems that they bothered to do it for C# temporary variable so it should be easier to implement if it's already implemented elsewhere, and I'm sure that they had a good reason to do it for C# temporary variable.

Joe4evr · 2021-05-10T08:34:29Z

but I still think that we should produce the best IL possible

While that may be reasonable for you to assume, the only responsibility the compiler has is producing correct IL. Producing optimized IL would require much more analysis, most of which can't be done if you're compiling something like a class library (no entry point).

It's not always clear what "optimal" IL looks like. One must anticipate the capabilities and limitations of the downstream code generator.

- #15929 (comment)

since it will take less time to JIT and will be smaller

And your metrics for this statement are where?

hamarb123 · 2021-05-10T11:12:20Z

I see your point re making correct IL, that's probably fair enough then, but this optimisation has nothing to do whether it's a class library or not, it's just swapping locals, it doesn't rely on any library at all, or the possible behaviour differences between a library and an executable (which shouldn't be different if you ask me since you can reference it even if it has an entry point).
Also, by smaller, I meant the IL is smaller, which it is.
I can take a good guess with the JITting time since it is just the same instruction, but more unnecessary copies of it.

Neme12 · 2022-05-04T13:43:39Z

I just noticed this conversation in dotnet/runtime: dotnet/runtime#64412 (comment)
Apparently, they avoid using (a, b) = (b, a) to swap variables because of the worse IL. It would be nice if this generated the same code as the classic way of swapping 2 variables, because this is a really nice, clear and explicit way of doing the swapping and helps readability. If it generates worse IL, some projects like dotnet/runtime will avoid using this, which is unfortunate.

Neme12 · 2022-05-04T13:47:17Z

@Wraith2

If you check the JIT asm on sharplab you'll find they have the same output.

The JIT does optimize it for int:
https://sharplab.io/#v2:EYLgtghglgdgNAFxBAzmAPgAQAwAIDKAFhAE4AOAMhMAHQBKArjAlGAKYDcAsAFC8DaAKSgIA4mxhsSUAMYAKBAE8ybAPYAzObAQBKHQF1emACy4AKmxQIAjAB4zAPjkk2687nVQSVuLhduzXAAbVF1eAG9eXGj3BDYwMlwAXg8vK24eGNTvBGTg0IyskKs8uISMgF9eAWExCSlZBWU1TW09Qx4Tc0sEACZ7J393TxzfIcDisJ5IzJi5Sd8Rqx08uSWEX0mdSqA=
but not for other types: https://sharplab.io/#v2:EYLgtghglgdgNAFxBAzmAPgAQAwAIDKAFhAE4AOAMhMAHQBKArjAlGAKYDcAsAFC8DaAKSgIA4mxhsSUAMYAKBAE8ybAPYAzOZgCMNACIQEbACqs2ASnMBdXpgAsuY2xQJtAHmMA+OSTbrHuOpQJC5wuL7+xrgANqgI5rwA3ry4qQFGYGS4ALyBwS7cPGl5IQg5MXGFxbEu5RlkhQC+vALCYhJSsgrKapo6+oYmZpY2PPaOzggATB7eEQFBpWHzUTXxSSlpcmthiy7m5XJ7CGFr5k1AA

Neme12 · 2022-05-04T13:54:17Z

@Joe4evr

While that may be reasonable for you to assume, the only responsibility the compiler has is producing correct IL. Producing optimized IL would require much more analysis, most of which can't be done if you're compiling something like a class library (no entry point).

Yes, the only guarantee the compiler makes is producing correct IL, which means that this isn't a bug, but it's still a reasonable feature request and an improvement, especially given the big focus on performance in today's .NET. Also, this would just be a small focused optimization to eliminate a local or avoid creating it in the first place, nothing to do with interprocedural optimization, so your comment about a class library and entry points is irrelevant.

NN--- · 2022-06-16T20:36:32Z

@Neme12 JIT performs optimization for x64

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAIBlAC2ygAcAZbYAOgCUBXAOwwEt8MANwBYAFDiA2gCk+GAOIxuMKHzAAKDAE96MCADN1xAIysAItgwwAKgJgBKewF1xxFOWsxcGYwB5rAHzqsPoe5Pp8UN5o5CFhADZ4GPbiAN7i5JlhVvj05AC84ZHeomJZRVEYBeSJJRlZtVWFOfSlAL7iUrIKSipqmjp6hibmljZ2ji5ibh5eGABM/kFx1hXRsTChq40pYullWeqNMRGV9tXqp+s77UA=

stephentoub · 2022-09-04T11:38:57Z

JIT performs optimization for x64

That type is 64 bits. Try one that's larger, e.g.
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAIBlAC2ygAcAZbYAOgCUBXAOwwEt8MANwBYAFDiA2gCk+GAOIxuMKHzAAKDAE96MCADN1xAIyt5nPgBMAlNYC644inIAVGLgzGAPC4B862H1Xcn0+KA80ckDggBs8DGtxAG9xcjTgjBh8enIAXhCwj1ExdILwjDzyOKLU9OqK/Mzs4oBfcSlZBSUVNU0dPUMTMwsbe0dnNw8AJh9/aJcyiKiYIIX6xLEUkvT1esjQ8utK9QOl9dagA===
Once it's large than a word, they're no longer the same.

hamarb123 · 2022-09-04T22:13:40Z

I'd also like to say that in terms of inlining, tuple destruction can be a turn off sometimes.
The shortest version of the method (where you just ld all the destinations and values, and then just st them all - I will elaborate below), which I would think would also be the simplest to implement, generates the smallest code IL Code, and equivalent JIT code to the longer form (which has the better JIT). It also allows the .maxstack 8 tiny header optimisation for reordering up to 4 things.
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AbEBLDAfAASQAI...

Elaboration on what IL code I propose gets emitted:
For code like (a, b, c, ...) = (d, e, f, ...) the following code would be emitted:
Load a's storage
Load d
Load b's storage
Load e
Load c's storage
Load f
...
Store to c
Store to b
Store to a
This method should also work when one address is in multiple positions due to how it's ordered, so no special checking if they're the same address would be needed.

For a field on the left (a):
Load a's storage - load the object for an instance field on a class, or the address for an instance field on a value type, or none for a static field.
Store to a - stfld or stsfld

For a pointer or byref on the left (a):
Load a's storage - load the value (value being the pointer or byref) itself
Store to a - stind or appropriate short form

For a local variable (a):
Load a's storage - none
Store to a - stloc

And for things on the right, it is just the standard way of loading them onto the stack.

jeffschwMSFT transferred this issue from dotnet/runtime May 10, 2021

dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels May 10, 2021

jaredpar added Bug Concept-Continuous Improvement Tenet-Performance Regression in measured performance of the product from goals. Area-Compilers and removed Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Jul 13, 2021

jaredpar added this to the Backlog milestone Jul 13, 2021

Youssef1313 mentioned this issue May 4, 2022

Generate better IL for (a, b) = (b, a) #61127

Closed

stephentoub mentioned this issue Sep 4, 2022

Improve Span.Reverse fast path performance dotnet/runtime#70944

Merged

jaredpar added this to Roslyn Compiler Tech Debt Sep 6, 2022

jaredpar assigned jcouv Sep 6, 2022

deeprobin mentioned this issue Sep 20, 2022

Optimization potential for tuple-style assignments dotnet/runtime#75925

Open

AlekseyTs changed the title ~~Swapping with tuple destruction compiles to inferior IL code compared to using a temporary variable~~ Swapping with tuple deconstruction compiles to inferior IL code compared to using a temporary variable Sep 30, 2022

RikkiGibson assigned RikkiGibson and unassigned jcouv Oct 27, 2022

RikkiGibson removed this from the Backlog milestone Oct 27, 2022

RikkiGibson added this to the 17.5 milestone Oct 27, 2022

RikkiGibson mentioned this issue Nov 10, 2022

Optimize codegen for tuple swap scenarios #65327

Merged

RikkiGibson closed this as completed in #65327 Nov 18, 2022

RikkiGibson moved this to Done in Roslyn Compiler Tech Debt Nov 18, 2022

RikkiGibson mentioned this issue Dec 6, 2022

Add a setting to only use IDE0180 Use tuple swap if it will avoid extra temps #65815

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Swapping with tuple deconstruction compiles to inferior IL code compared to using a temporary variable #53300

Swapping with tuple deconstruction compiles to inferior IL code compared to using a temporary variable #53300

hamarb123 commented May 10, 2021

dotnet-issue-labeler bot commented May 10, 2021

Wraith2 commented May 10, 2021

hamarb123 commented May 10, 2021

Joe4evr commented May 10, 2021

hamarb123 commented May 10, 2021 •

edited

Loading

Neme12 commented May 4, 2022 •

edited

Loading

Neme12 commented May 4, 2022

Neme12 commented May 4, 2022 •

edited

Loading

NN--- commented Jun 16, 2022

stephentoub commented Sep 4, 2022 •

edited

Loading

hamarb123 commented Sep 4, 2022 •

edited

Loading

Swapping with tuple deconstruction compiles to inferior IL code compared to using a temporary variable #53300

Swapping with tuple deconstruction compiles to inferior IL code compared to using a temporary variable #53300

Comments

hamarb123 commented May 10, 2021

Description

dotnet-issue-labeler bot commented May 10, 2021

Wraith2 commented May 10, 2021

hamarb123 commented May 10, 2021

Joe4evr commented May 10, 2021

hamarb123 commented May 10, 2021 • edited Loading

Neme12 commented May 4, 2022 • edited Loading

Neme12 commented May 4, 2022

Neme12 commented May 4, 2022 • edited Loading

NN--- commented Jun 16, 2022

stephentoub commented Sep 4, 2022 • edited Loading

hamarb123 commented Sep 4, 2022 • edited Loading

hamarb123 commented May 10, 2021 •

edited

Loading

Neme12 commented May 4, 2022 •

edited

Loading

Neme12 commented May 4, 2022 •

edited

Loading

stephentoub commented Sep 4, 2022 •

edited

Loading

hamarb123 commented Sep 4, 2022 •

edited

Loading