[mono] Intrinsify API's in SpanHelpers.ByteMemOps.cs #99161

Closed · 3 tasks done · Tracked by #43051
fanyang-mono opened this issue Mar 1, 2024 · 18 comments · Fixed by #101622

Comments

fanyang-mono (Member) commented Mar 1, 2024

Two new APIs were added and intrinsified by CoreCLR. They are used in the libraries, which caused a performance regression in Mono; see dotnet/perf-autofiling-issues#29872. These new APIs were added via #98623.

The APIs to intrinsify are:

  • SpanHelpers.Memmove -> update the existing intrinsics support for Buffer.Memmove; see the current code below and the sketch that follows this list
    if (in_corlib && !strcmp (m_class_get_name (cmethod->klass), "Buffer")) {
        if (!strcmp (cmethod->name, "Memmove") && fsig->param_count == 3 && m_type_is_byref (fsig->params [0]) && m_type_is_byref (fsig->params [1]) && !cmethod->is_inflated) {
            MonoBasicBlock *end_bb;
            NEW_BBLOCK (cfg, end_bb);
            // do nothing if len == 0 (even if src or dst are nulls)
            MONO_EMIT_NEW_BIALU_IMM (cfg, OP_COMPARE_IMM, -1, args [2]->dreg, 0);
            MONO_EMIT_NEW_BRANCH_BLOCK (cfg, OP_IBEQ, end_bb);
            // throw NRE if src or dst are nulls
            MONO_EMIT_NEW_BIALU_IMM (cfg, OP_COMPARE_IMM, -1, args [0]->dreg, 0);
            MONO_EMIT_NEW_COND_EXC (cfg, EQ, "NullReferenceException");
            MONO_EMIT_NEW_BIALU_IMM (cfg, OP_COMPARE_IMM, -1, args [1]->dreg, 0);
            MONO_EMIT_NEW_COND_EXC (cfg, EQ, "NullReferenceException");
            MONO_INST_NEW (cfg, ins, OP_MEMMOVE);
            ins->sreg1 = args [0]->dreg; // i1* dst
            ins->sreg2 = args [1]->dreg; // i1* src
            ins->sreg3 = args [2]->dreg; // i32/i64 len
            MONO_ADD_INS (cfg->cbb, ins);
            MONO_START_BB (cfg, end_bb);
        }
    }
  • SpanHelpers.ClearWithoutReferences
  • SpanHelpers.Fill
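
A minimal sketch of the direction this could take, assuming the existing Buffer.Memmove lowering can be reused for SpanHelpers.Memmove. The widened class-name check, the signature shape, and the placement of the ClearWithoutReferences/Fill handling are illustrative assumptions, not the change that actually landed in #101622.

    if (in_corlib && (!strcmp (m_class_get_name (cmethod->klass), "Buffer") ||
                      !strcmp (m_class_get_name (cmethod->klass), "SpanHelpers"))) {
        // Assumption: SpanHelpers.Memmove has the same (byref dst, byref src, nuint len)
        // shape as Buffer.Memmove, so the existing OP_MEMMOVE lowering could be reused.
        if (!strcmp (cmethod->name, "Memmove") && fsig->param_count == 3 &&
            m_type_is_byref (fsig->params [0]) && m_type_is_byref (fsig->params [1]) &&
            !cmethod->is_inflated) {
            // ...emit the same zero-length branch, null checks, and OP_MEMMOVE as in the
            // Buffer.Memmove code above...
        }
        // SpanHelpers.ClearWithoutReferences and SpanHelpers.Fill would need their own
        // lowering (e.g. a memset-style opcode); they are not sketched here.
    }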
ghost commented Mar 1, 2024

Tagging subscribers to this area: @SamMonoRT, @fanyang-mono
See info in area-owners.md if you want to be subscribed.

Author: fanyang-mono
Assignees: -
Labels: area-Codegen-Intrinsics-mono

Milestone: 9.0.0

lewing (Member) commented Apr 24, 2024

This is one of the regressions in dotnet/perf-autofiling-issues#30686, so it is potentially quite impactful on wasm.
cc @steveisok

lewing (Member) commented Apr 24, 2024

cc @EgorBo @JulieLeeMSFT

EgorBo (Member) commented Apr 24, 2024

> cc @EgorBo @JulieLeeMSFT

It was decided that it should be handled on the Mono side: #99059

lewing (Member) commented Apr 25, 2024

> cc @EgorBo @JulieLeeMSFT
>
> It was decided that it should be handled on the Mono side: #99059

Yes, this issue is about fixing it in mono. I pinged you because this is a ship-blocking regression for wasm and I wanted to make sure that was clear.

lewing (Member) commented Apr 25, 2024

It is probably worth understanding why the other mono-AOT-llvm runtimes were not impacted in the same way.

[image]

lewing (Member) commented Apr 25, 2024

Interpreter changes were added in #99115.

EgorBo (Member) commented Apr 25, 2024

> this is a ship-blocking regression

I think we can just land #99059 if so. cc @fanyang-mono

fanyang-mono (Member, Author) commented

It is unfortunate that this wasn't flagged as a ship-blocking issue for wasm at the time this regression was introduced. I could work on getting this in for Preview 5 instead of #99059.

lewing (Member) commented Apr 25, 2024

> It is unfortunate that this wasn't flagged as a ship-blocking issue for wasm at the time this regression was introduced. I could work on getting this in for Preview 5 instead of #99059.

We couldn't flag it then because we couldn't identify it: the wasm runs in dotnet/performance were not working. I was only able to identify it now that the performance team has backfilled the missing runs and I made time to dig into them again. But to be clear, this needs to be fixed before RTM, not Preview 5.

lewing (Member) commented Apr 25, 2024

And thank you to the perf team for doing the backfill; it made it possible to positively identify the cause. I'm still curious why it impacted wasm AOT so much more heavily than the rest of mono.

cc @LoopedBard3 @DrewScoggins @sblom

EgorBo (Member) commented Apr 25, 2024

I presume it was previously intrinsified into llvm.memset/llvm.memcpy calls. The thing is, those calls were not GC-interrupt friendly, so technically this performance regression actually improves GC latency.

lewing (Member) commented Apr 25, 2024

> I presume it was previously intrinsified into llvm.memset/llvm.memcpy calls. The thing is, those calls were not GC-interrupt friendly, so technically this performance regression actually improves GC latency.

GC latency in single-threaded wasm is not impacted at all by that.

fanyang-mono (Member, Author) commented

I am also curious why this regression has a bigger impact on wasm.

fanyang-mono (Member, Author) commented Apr 26, 2024

@lewing I dug into the data. This is the same microbenchmark chart, but on arm64 with Mono LLVM AOT, which doesn't show the same regression as wasm:
[screenshot]

@matouskozak Do you know if this chart contains accurate data?

matouskozak (Member) commented Apr 27, 2024

> ... on arm64 with Mono LLVM AOT, which doesn't show the same regression as wasm

I very much hope so. We are missing the data between February and March because Mono AOT-llvm was affected in the same way WASM was. Since we didn't observe a significant regression after we regained the measurements, we didn't ask for a backfill of the missing values. Do you think the measurements could be flawed?

I ran a quick local measurement using the benchmarks_local.py script for the range 79dd9ba...5ef47c8 and got 300/270 ns, so nowhere near the regression magnitude that WASM had.

fanyang-mono (Member, Author) commented

I looked into this issue a little bit more and confirmed the following:

  • Why did System.Collections.CopyTo<Int32>.ReadOnlySpan(Size: 2048) regress so much on wasm AOT?
    WASM AOT mode is an LLVM-AOT mode with interpreter fallback. Comparing the results between wasm AOT and the wasm interpreter, they match. So it seems that something that used to be AOT'ed is no longer AOT'ed after "Move memset/memcpy helpers to managed impl" (#98623) was merged. The WASM AOT chart also shows that, with the fix here, performance went back down to the level it was at before the regression.

WASM interpreter data: [screenshot]

WASM AOT data: [screenshot]

  • Why didn't System.Collections.CopyTo<Int32>.ReadOnlySpan(Size: 2048) regress on mono LLVM-AOT?
    This microbenchmark is written with generics, and normal AOT mode usually doesn't AOT generic methods; the relevant intrinsics are also LLVM-only. Additionally, the LLVM-AOT mode that we measure with microbenchmarks is LLVM-AOT with JIT fallback, and the numbers match the JIT on x64.

Mono LLVM-AOT: [screenshot]

Mono JIT: [screenshot]

  • Are we measuring Mono AOT-llvm correctly?
    Yes. On x64, Vector4.Add is only intrinsified with LLVM, and I saw that microbenchmark run much faster on Mono LLVM-AOT than on Mono JIT.

Mono LLVM-AOT: [screenshot]

Mono JIT: [screenshot]

github-actions bot locked and limited conversation to collaborators May 31, 2024