Rewrite GetIndexOfFirstNonAsciiByte #104503

Open: neon-sunset wants to merge 23 commits into main

neon-sunset (Contributor) commented Jul 6, 2024

Description

The purpose of this change is to simplify GetIndexOfFirstNonAsciiByte by servicing all targets with a single path, which is now possible, and to achieve optimal throughput on longer lengths.
Per "Validating gigabytes of Unicode strings per second… in C#?", it appears that we leave a lot of performance on the table on ARM64 targets.

Also closes #89924

Analysis

After addressing feedback, this PR now improves maximum non-ASCII byte search throughput by 3x for AVX2 (Zen 3) and AdvSimd (Neoverse N1), and 2x for AVX512 (Ice Lake). Short lengths improve by up to 25%. The worst-case regression appears only on AMD: under 5% for lengths of roughly 65-200 (under 1 ns of impact).

Benchmark

using System.Text.Unicode;
using BenchmarkDotNet.Attributes;

[SimpleJob]
public class AsciiBench
{
    [ParamsSource(nameof(Lengths))]
    public int Length;
    public IEnumerable<int> Lengths => [..Enumerable.Range(1, 8), 12, 16, 25, 50, 70, 90, 200, 512, 16384];

    byte[] ascii = [];

    [GlobalSetup]
    public void Setup()
    {
        ascii = new byte[Length];
        ascii.AsSpan().Fill((byte)'f');
    }

    [Benchmark]
    public bool ValidateAscii()
    {
        return Utf8.IsValid(ascii);
    }
}

Results

Ice Lake, Zen 3 and N1: #104503 (comment)

M1 Pro

BenchmarkDotNet v0.13.12, macOS 15.0 (24A5289g) [Darwin 24.0.0]
Apple M1 Pro, 1 CPU, 8 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24354.8
  [Host]     : .NET 9.0.0 (9.0.24.35215), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 9.0.0 (9.0.24.35215), Arm64 RyuJIT AdvSIMD
  Job-PJSXIE : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Job | Toolchain | Length | Mean | Error | StdDev |
|---|---|---|---:|---:|---:|---:|
| ValidateAscii | DefaultJob | Default | 1 | 2.336 ns | 0.0087 ns | 0.0073 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 1 | 2.345 ns | 0.0062 ns | 0.0052 ns |
| ValidateAscii | DefaultJob | Default | 2 | 2.478 ns | 0.0026 ns | 0.0025 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 2 | 2.355 ns | 0.0033 ns | 0.0030 ns |
| ValidateAscii | DefaultJob | Default | 3 | 2.634 ns | 0.0059 ns | 0.0050 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 3 | 2.507 ns | 0.0025 ns | 0.0022 ns |
| ValidateAscii | DefaultJob | Default | 4 | 2.397 ns | 0.0030 ns | 0.0026 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 4 | 2.347 ns | 0.0020 ns | 0.0019 ns |
| ValidateAscii | DefaultJob | Default | 5 | 2.537 ns | 0.0029 ns | 0.0024 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 5 | 2.462 ns | 0.0047 ns | 0.0042 ns |
| ValidateAscii | DefaultJob | Default | 6 | 2.638 ns | 0.0054 ns | 0.0050 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 6 | 2.517 ns | 0.0016 ns | 0.0014 ns |
| ValidateAscii | DefaultJob | Default | 7 | 2.645 ns | 0.0044 ns | 0.0042 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 7 | 2.666 ns | 0.0048 ns | 0.0045 ns |
| ValidateAscii | DefaultJob | Default | 8 | 2.476 ns | 0.0024 ns | 0.0022 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 8 | 2.435 ns | 0.0015 ns | 0.0014 ns |
| ValidateAscii | DefaultJob | Default | 12 | 2.641 ns | 0.0126 ns | 0.0105 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 12 | 2.432 ns | 0.0033 ns | 0.0030 ns |
| ValidateAscii | DefaultJob | Default | 16 | 2.720 ns | 0.0048 ns | 0.0045 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 16 | 2.435 ns | 0.0033 ns | 0.0029 ns |
| ValidateAscii | DefaultJob | Default | 25 | 3.307 ns | 0.0056 ns | 0.0047 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 25 | 3.358 ns | 0.0050 ns | 0.0047 ns |
| ValidateAscii | DefaultJob | Default | 50 | 4.541 ns | 0.0055 ns | 0.0046 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 50 | 3.618 ns | 0.0055 ns | 0.0052 ns |
| ValidateAscii | DefaultJob | Default | 70 | 5.022 ns | 0.0054 ns | 0.0050 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 70 | 3.993 ns | 0.0062 ns | 0.0055 ns |
| ValidateAscii | DefaultJob | Default | 90 | 5.486 ns | 0.0083 ns | 0.0077 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 90 | 4.674 ns | 0.0050 ns | 0.0047 ns |
| ValidateAscii | DefaultJob | Default | 200 | 8.652 ns | 0.0082 ns | 0.0073 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 200 | 5.941 ns | 0.0046 ns | 0.0041 ns |
| ValidateAscii | DefaultJob | Default | 512 | 17.469 ns | 0.0360 ns | 0.0319 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 512 | 8.589 ns | 0.0323 ns | 0.0270 ns |
| ValidateAscii | DefaultJob | Default | 16384 | 473.304 ns | 2.2263 ns | 1.9736 ns |
| ValidateAscii | Job-PJSXIE | CoreRun | 16384 | 169.737 ns | 0.2044 ns | 0.1912 ns |

- Deduplicate SIMD paths - there is no more need for that
- Remove at least one branch from short length path
- Use up to 128x4/256x2 SIMD unrolling
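
The three bullets above can be illustrated with a minimal sketch. It is not the PR's code: `AsciiSketch` and `IndexOfFirstNonAscii` are hypothetical names, and only the 128x4 unrolling shape is shown, using the cross-platform `Vector128` API so one path can serve x64 and Arm64 alike. The scalar tail doubles as the locator after an early exit, which keeps the example short at the cost of some speed.

```csharp
using System;
using System.Runtime.Intrinsics;

static class AsciiSketch
{
    // Hypothetical helper: returns the index of the first byte >= 0x80,
    // or data.Length when every byte is ASCII.
    public static int IndexOfFirstNonAscii(ReadOnlySpan<byte> data)
    {
        int i = 0;

        if (Vector128.IsHardwareAccelerated)
        {
            // 128x4 unrolling: load four 16-byte vectors, OR them together,
            // and test the high bit of every byte once per 64-byte block.
            for (; i <= data.Length - 64; i += 64)
            {
                Vector128<byte> v0 = Vector128.Create(data.Slice(i, 16));
                Vector128<byte> v1 = Vector128.Create(data.Slice(i + 16, 16));
                Vector128<byte> v2 = Vector128.Create(data.Slice(i + 32, 16));
                Vector128<byte> v3 = Vector128.Create(data.Slice(i + 48, 16));

                Vector128<byte> combined = (v0 | v1) | (v2 | v3);
                if (combined.ExtractMostSignificantBits() != 0)
                {
                    break; // a non-ASCII byte is somewhere in this block
                }
            }
        }

        // Scalar tail; also pinpoints the exact index after an early exit.
        for (; i < data.Length; i++)
        {
            if (data[i] >= 0x80)
            {
                return i;
            }
        }

        return data.Length;
    }
}
```

Calling `AsciiSketch.IndexOfFirstNonAscii(buffer)` on an all-ASCII buffer returns `buffer.Length`. A production version would likely avoid the per-slice bounds checks and locate the offending byte with vector operations instead of falling back to the scalar loop.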
dotnet-policy-service bot added the community-contribution label Jul 6, 2024

EgorBot commented Jul 6, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-IDDVGL : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-BQBANQ : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|
| ValidateAscii | Main | 1 | 6.245 ns | 0.0085 ns | 1.00 |
| ValidateAscii | PR | 1 | 5.958 ns | 0.0028 ns | 0.95 |
| ValidateAscii | Main | 2 | 6.396 ns | 0.0069 ns | 1.00 |
| ValidateAscii | PR | 2 | 6.244 ns | 0.0142 ns | 0.98 |
| ValidateAscii | Main | 3 | 6.533 ns | 0.0019 ns | 1.00 |
| ValidateAscii | PR | 3 | 6.281 ns | 0.0179 ns | 0.96 |
| ValidateAscii | Main | 4 | 6.358 ns | 0.0051 ns | 1.00 |
| ValidateAscii | PR | 4 | 6.027 ns | 0.0039 ns | 0.95 |
| ValidateAscii | Main | 5 | 6.569 ns | 0.0021 ns | 1.00 |
| ValidateAscii | PR | 5 | 6.220 ns | 0.0041 ns | 0.95 |
| ValidateAscii | Main | 6 | 6.813 ns | 0.0075 ns | 1.00 |
| ValidateAscii | PR | 6 | 6.431 ns | 0.0046 ns | 0.94 |
| ValidateAscii | Main | 7 | 6.922 ns | 0.0011 ns | 1.00 |
| ValidateAscii | PR | 7 | 6.472 ns | 0.0081 ns | 0.93 |
| ValidateAscii | Main | 8 | 6.287 ns | 0.0014 ns | 1.00 |
| ValidateAscii | PR | 8 | 6.245 ns | 0.0067 ns | 0.99 |
| ValidateAscii | Main | 9 | 6.496 ns | 0.0028 ns | 1.00 |
| ValidateAscii | PR | 9 | 6.664 ns | 0.0067 ns | 1.03 |
| ValidateAscii | Main | 10 | 6.689 ns | 0.0037 ns | 1.00 |
| ValidateAscii | PR | 10 | 6.682 ns | 0.0082 ns | 1.00 |
| ValidateAscii | Main | 11 | 6.938 ns | 0.0018 ns | 1.00 |
| ValidateAscii | PR | 11 | 6.653 ns | 0.0064 ns | 0.96 |
| ValidateAscii | Main | 12 | 6.657 ns | 0.0016 ns | 1.00 |
| ValidateAscii | PR | 12 | 6.570 ns | 0.0070 ns | 0.99 |
| ValidateAscii | Main | 13 | 6.865 ns | 0.0008 ns | 1.00 |
| ValidateAscii | PR | 13 | 6.649 ns | 0.0071 ns | 0.97 |
| ValidateAscii | Main | 14 | 7.190 ns | 0.0054 ns | 1.00 |
| ValidateAscii | PR | 14 | 6.649 ns | 0.0063 ns | 0.92 |
| ValidateAscii | Main | 15 | 7.478 ns | 0.0007 ns | 1.00 |
| ValidateAscii | PR | 15 | 6.647 ns | 0.0078 ns | 0.89 |
| ValidateAscii | Main | 16 | 7.172 ns | 0.0009 ns | 1.00 |
| ValidateAscii | PR | 16 | 6.567 ns | 0.0043 ns | 0.92 |
| ValidateAscii | Main | 24 | 9.155 ns | 0.0014 ns | 1.00 |
| ValidateAscii | PR | 24 | 9.153 ns | 0.0025 ns | 1.00 |
| ValidateAscii | Main | 48 | 9.968 ns | 0.0024 ns | 1.00 |
| ValidateAscii | PR | 48 | 10.684 ns | 0.0120 ns | 1.07 |
| ValidateAscii | Main | 64 | 10.775 ns | 0.0012 ns | 1.00 |
| ValidateAscii | PR | 64 | 10.723 ns | 0.0019 ns | 1.00 |
| ValidateAscii | Main | 72 | 12.227 ns | 0.0035 ns | 1.00 |
| ValidateAscii | PR | 72 | 13.220 ns | 0.0089 ns | 1.08 |
| ValidateAscii | Main | 96 | 12.649 ns | 0.0026 ns | 1.00 |
| ValidateAscii | PR | 96 | 13.110 ns | 0.0117 ns | 1.04 |
| ValidateAscii | Main | 256 | 23.221 ns | 0.0076 ns | 1.00 |
| ValidateAscii | PR | 256 | 16.474 ns | 0.0115 ns | 0.71 |
| ValidateAscii | Main | 512 | 39.824 ns | 0.0184 ns | 1.00 |
| ValidateAscii | PR | 512 | 21.312 ns | 0.0190 ns | 0.54 |
| ValidateAscii | Main | 2048 | 139.264 ns | 0.0350 ns | 1.00 |
| ValidateAscii | PR | 2048 | 56.632 ns | 0.0088 ns | 0.41 |
| ValidateAscii | Main | 16384 | 1,072.289 ns | 0.0625 ns | 1.00 |
| ValidateAscii | PR | 16384 | 373.391 ns | 0.1192 ns | 0.35 |

BDN_Artifacts.zip

EgorBot commented Jul 6, 2024

Benchmark results on Amd
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-HPNOVE : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-PZAFXR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|
| ValidateAscii | Main | 1 | 5.271 ns | 0.0009 ns | 1.00 |
| ValidateAscii | PR | 1 | 4.979 ns | 0.0012 ns | 0.94 |
| ValidateAscii | Main | 2 | 5.595 ns | 0.0016 ns | 1.00 |
| ValidateAscii | PR | 2 | 4.973 ns | 0.0006 ns | 0.89 |
| ValidateAscii | Main | 3 | 5.271 ns | 0.0018 ns | 1.00 |
| ValidateAscii | PR | 3 | 4.967 ns | 0.0018 ns | 0.94 |
| ValidateAscii | Main | 4 | 5.281 ns | 0.0012 ns | 1.00 |
| ValidateAscii | PR | 4 | 4.661 ns | 0.0014 ns | 0.88 |
| ValidateAscii | Main | 5 | 5.287 ns | 0.0018 ns | 1.00 |
| ValidateAscii | PR | 5 | 4.978 ns | 0.0018 ns | 0.94 |
| ValidateAscii | Main | 6 | 5.595 ns | 0.0018 ns | 1.00 |
| ValidateAscii | PR | 6 | 4.979 ns | 0.0030 ns | 0.89 |
| ValidateAscii | Main | 7 | 5.578 ns | 0.0023 ns | 1.00 |
| ValidateAscii | PR | 7 | 5.278 ns | 0.0017 ns | 0.95 |
| ValidateAscii | Main | 8 | 5.610 ns | 0.0012 ns | 1.00 |
| ValidateAscii | PR | 8 | 5.010 ns | 0.0040 ns | 0.89 |
| ValidateAscii | Main | 9 | 5.594 ns | 0.0016 ns | 1.00 |
| ValidateAscii | PR | 9 | 4.975 ns | 0.0021 ns | 0.89 |
| ValidateAscii | Main | 10 | 5.897 ns | 0.0022 ns | 1.00 |
| ValidateAscii | PR | 10 | 4.965 ns | 0.0009 ns | 0.84 |
| ValidateAscii | Main | 11 | 6.203 ns | 0.0010 ns | 1.00 |
| ValidateAscii | PR | 11 | 4.985 ns | 0.0013 ns | 0.80 |
| ValidateAscii | Main | 12 | 5.905 ns | 0.0015 ns | 1.00 |
| ValidateAscii | PR | 12 | 4.984 ns | 0.0015 ns | 0.84 |
| ValidateAscii | Main | 13 | 5.897 ns | 0.0021 ns | 1.00 |
| ValidateAscii | PR | 13 | 4.984 ns | 0.0021 ns | 0.85 |
| ValidateAscii | Main | 14 | 6.216 ns | 0.0016 ns | 1.00 |
| ValidateAscii | PR | 14 | 4.976 ns | 0.0007 ns | 0.80 |
| ValidateAscii | Main | 15 | 5.902 ns | 0.0013 ns | 1.00 |
| ValidateAscii | PR | 15 | 4.979 ns | 0.0018 ns | 0.84 |
| ValidateAscii | Main | 16 | 5.909 ns | 0.0019 ns | 1.00 |
| ValidateAscii | PR | 16 | 4.976 ns | 0.0015 ns | 0.84 |
| ValidateAscii | Main | 24 | 6.519 ns | 0.0011 ns | 1.00 |
| ValidateAscii | PR | 24 | 6.526 ns | 0.0017 ns | 1.00 |
| ValidateAscii | Main | 48 | 6.846 ns | 0.0018 ns | 1.00 |
| ValidateAscii | PR | 48 | 6.536 ns | 0.0038 ns | 0.95 |
| ValidateAscii | Main | 64 | 7.154 ns | 0.0015 ns | 1.00 |
| ValidateAscii | PR | 64 | 6.540 ns | 0.0032 ns | 0.91 |
| ValidateAscii | Main | 72 | 7.772 ns | 0.0024 ns | 1.00 |
| ValidateAscii | PR | 72 | 6.852 ns | 0.0023 ns | 0.88 |
| ValidateAscii | Main | 96 | 7.779 ns | 0.0033 ns | 1.00 |
| ValidateAscii | PR | 96 | 6.844 ns | 0.0025 ns | 0.88 |
| ValidateAscii | Main | 256 | 9.078 ns | 0.0095 ns | 1.00 |
| ValidateAscii | PR | 256 | 7.770 ns | 0.0030 ns | 0.86 |
| ValidateAscii | Main | 512 | 13.534 ns | 0.1369 ns | 1.00 |
| ValidateAscii | PR | 512 | 9.774 ns | 0.0177 ns | 0.72 |
| ValidateAscii | Main | 2048 | 41.238 ns | 0.0177 ns | 1.00 |
| ValidateAscii | PR | 2048 | 24.271 ns | 0.0104 ns | 0.59 |
| ValidateAscii | Main | 16384 | 333.180 ns | 2.4574 ns | 1.00 |
| ValidateAscii | PR | 16384 | 171.384 ns | 0.0595 ns | 0.51 |

BDN_Artifacts.zip

EgorBot commented Jul 9, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-REYHIP : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-CQPRHS : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|
| ValidateAscii | Main | 1 | 5.376 ns | 0.0056 ns | 1.00 |
| ValidateAscii | PR | 1 | 5.172 ns | 0.0052 ns | 0.96 |
| ValidateAscii | Main | 2 | 5.448 ns | 0.0033 ns | 1.00 |
| ValidateAscii | PR | 2 | 5.262 ns | 0.0304 ns | 0.97 |
| ValidateAscii | Main | 3 | 5.717 ns | 0.0056 ns | 1.00 |
| ValidateAscii | PR | 3 | 5.488 ns | 0.0038 ns | 0.96 |
| ValidateAscii | Main | 4 | 5.454 ns | 0.0040 ns | 1.00 |
| ValidateAscii | PR | 4 | 5.189 ns | 0.0016 ns | 0.95 |
| ValidateAscii | Main | 5 | 5.743 ns | 0.0037 ns | 1.00 |
| ValidateAscii | PR | 5 | 5.465 ns | 0.0035 ns | 0.95 |
| ValidateAscii | Main | 6 | 5.739 ns | 0.0014 ns | 1.00 |
| ValidateAscii | PR | 6 | 5.526 ns | 0.0039 ns | 0.96 |
| ValidateAscii | Main | 7 | 6.041 ns | 0.0056 ns | 1.00 |
| ValidateAscii | PR | 7 | 5.724 ns | 0.0031 ns | 0.95 |
| ValidateAscii | Main | 8 | 5.454 ns | 0.0020 ns | 1.00 |
| ValidateAscii | PR | 8 | 5.454 ns | 0.0047 ns | 1.00 |
| ValidateAscii | Main | 12 | 5.663 ns | 0.0037 ns | 1.00 |
| ValidateAscii | PR | 12 | 5.479 ns | 0.0061 ns | 0.97 |
| ValidateAscii | Main | 16 | 6.494 ns | 0.0075 ns | 1.00 |
| ValidateAscii | PR | 16 | 5.451 ns | 0.0065 ns | 0.84 |
| ValidateAscii | Main | 25 | 9.342 ns | 0.0029 ns | 1.00 |
| ValidateAscii | PR | 25 | 8.386 ns | 0.0007 ns | 0.90 |
| ValidateAscii | Main | 50 | 11.428 ns | 0.0063 ns | 1.00 |
| ValidateAscii | PR | 50 | 8.749 ns | 0.0014 ns | 0.77 |
| ValidateAscii | Main | 70 | 12.374 ns | 0.0039 ns | 1.00 |
| ValidateAscii | PR | 70 | 10.612 ns | 0.0019 ns | 0.86 |
| ValidateAscii | Main | 200 | 21.311 ns | 0.0017 ns | 1.00 |
| ValidateAscii | PR | 200 | 14.255 ns | 0.0055 ns | 0.67 |
| ValidateAscii | Main | 512 | 42.772 ns | 0.0038 ns | 1.00 |
| ValidateAscii | PR | 512 | 20.237 ns | 0.0030 ns | 0.47 |
| ValidateAscii | Main | 16384 | 1,065.931 ns | 0.1022 ns | 1.00 |
| ValidateAscii | PR | 16384 | 339.312 ns | 0.0216 ns | 0.32 |

BDN_Artifacts.zip

EgorBot commented Jul 9, 2024

Benchmark results on Amd
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-UWXNYT : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-XOGXAK : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|
| ValidateAscii | Main | 1 | 4.951 ns | 0.0003 ns | 1.00 |
| ValidateAscii | PR | 1 | 4.365 ns | 0.0004 ns | 0.88 |
| ValidateAscii | Main | 2 | 4.965 ns | 0.0008 ns | 1.00 |
| ValidateAscii | PR | 2 | 3.946 ns | 0.0010 ns | 0.79 |
| ValidateAscii | Main | 3 | 4.982 ns | 0.0011 ns | 1.00 |
| ValidateAscii | PR | 3 | 4.666 ns | 0.0008 ns | 0.94 |
| ValidateAscii | Main | 4 | 4.994 ns | 0.0005 ns | 1.00 |
| ValidateAscii | PR | 4 | 4.373 ns | 0.0009 ns | 0.88 |
| ValidateAscii | Main | 5 | 4.981 ns | 0.0012 ns | 1.00 |
| ValidateAscii | PR | 5 | 4.676 ns | 0.0106 ns | 0.94 |
| ValidateAscii | Main | 6 | 4.977 ns | 0.0010 ns | 1.00 |
| ValidateAscii | PR | 6 | 4.652 ns | 0.0031 ns | 0.93 |
| ValidateAscii | Main | 7 | 5.251 ns | 0.0012 ns | 1.00 |
| ValidateAscii | PR | 7 | 4.674 ns | 0.0005 ns | 0.89 |
| ValidateAscii | Main | 8 | 5.283 ns | 0.0011 ns | 1.00 |
| ValidateAscii | PR | 8 | 4.366 ns | 0.0006 ns | 0.83 |
| ValidateAscii | Main | 12 | 5.564 ns | 0.0005 ns | 1.00 |
| ValidateAscii | PR | 12 | 4.365 ns | 0.0005 ns | 0.78 |
| ValidateAscii | Main | 16 | 5.981 ns | 0.0008 ns | 1.00 |
| ValidateAscii | PR | 16 | 4.369 ns | 0.0014 ns | 0.73 |
| ValidateAscii | Main | 25 | 6.526 ns | 0.0014 ns | 1.00 |
| ValidateAscii | PR | 25 | 5.582 ns | 0.0007 ns | 0.86 |
| ValidateAscii | Main | 50 | 6.209 ns | 0.0015 ns | 1.00 |
| ValidateAscii | PR | 50 | 5.583 ns | 0.0008 ns | 0.90 |
| ValidateAscii | Main | 70 | 5.607 ns | 0.0014 ns | 1.00 |
| ValidateAscii | PR | 70 | 6.203 ns | 0.0007 ns | 1.11 |
| ValidateAscii | Main | 200 | 7.132 ns | 0.0010 ns | 1.00 |
| ValidateAscii | PR | 200 | 6.818 ns | 0.0006 ns | 0.96 |
| ValidateAscii | Main | 512 | 11.592 ns | 0.0144 ns | 1.00 |
| ValidateAscii | PR | 512 | 8.678 ns | 0.0021 ns | 0.75 |
| ValidateAscii | Main | 16384 | 324.580 ns | 0.3885 ns | 1.00 |
| ValidateAscii | PR | 16384 | 171.401 ns | 0.0528 ns | 0.53 |

BDN_Artifacts.zip

@dotnet dotnet deleted a comment from EgorBot Jul 9, 2024
@dotnet dotnet deleted a comment from EgorBot Jul 9, 2024

EgorBot commented Jul 9, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-IRPCYL : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-HIDRWE : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|
| ValidateAscii | Main | 1 | 4.712 ns | 0.0203 ns | 1.00 |
| ValidateAscii | PR | 1 | 3.828 ns | 0.0180 ns | 0.81 |
| ValidateAscii | Main | 2 | 4.492 ns | 0.0068 ns | 1.00 |
| ValidateAscii | PR | 2 | 3.904 ns | 0.0065 ns | 0.87 |
| ValidateAscii | Main | 3 | 4.923 ns | 0.0087 ns | 1.00 |
| ValidateAscii | PR | 3 | 4.172 ns | 0.0233 ns | 0.85 |
| ValidateAscii | Main | 4 | 4.313 ns | 0.0294 ns | 1.00 |
| ValidateAscii | PR | 4 | 3.945 ns | 0.0329 ns | 0.91 |
| ValidateAscii | Main | 5 | 4.739 ns | 0.0089 ns | 1.00 |
| ValidateAscii | PR | 5 | 4.049 ns | 0.0112 ns | 0.85 |
| ValidateAscii | Main | 6 | 4.494 ns | 0.0056 ns | 1.00 |
| ValidateAscii | PR | 6 | 4.263 ns | 0.0196 ns | 0.95 |
| ValidateAscii | Main | 7 | 5.355 ns | 0.0211 ns | 1.00 |
| ValidateAscii | PR | 7 | 4.102 ns | 0.0139 ns | 0.77 |
| ValidateAscii | Main | 8 | 4.714 ns | 0.0135 ns | 1.00 |
| ValidateAscii | PR | 8 | 3.894 ns | 0.0118 ns | 0.83 |
| ValidateAscii | Main | 12 | 4.848 ns | 0.0116 ns | 1.00 |
| ValidateAscii | PR | 12 | 3.875 ns | 0.0107 ns | 0.80 |
| ValidateAscii | Main | 16 | 5.186 ns | 0.0170 ns | 1.00 |
| ValidateAscii | PR | 16 | 3.889 ns | 0.0133 ns | 0.75 |
| ValidateAscii | Main | 25 | 5.818 ns | 0.0089 ns | 1.00 |
| ValidateAscii | PR | 25 | 5.105 ns | 0.0159 ns | 0.88 |
| ValidateAscii | Main | 50 | 6.527 ns | 0.0339 ns | 1.00 |
| ValidateAscii | PR | 50 | 5.260 ns | 0.0022 ns | 0.81 |
| ValidateAscii | Main | 70 | 6.859 ns | 0.0139 ns | 1.00 |
| ValidateAscii | PR | 70 | 6.169 ns | 0.0032 ns | 0.90 |
| ValidateAscii | Main | 200 | 9.229 ns | 0.0114 ns | 1.00 |
| ValidateAscii | PR | 200 | 6.453 ns | 0.0147 ns | 0.70 |
| ValidateAscii | Main | 512 | 9.452 ns | 0.0124 ns | 1.00 |
| ValidateAscii | PR | 512 | 10.454 ns | 0.0289 ns | 1.11 |
| ValidateAscii | Main | 16384 | 162.361 ns | 0.0240 ns | 1.00 |
| ValidateAscii | PR | 16384 | 163.184 ns | 0.0489 ns | 1.01 |

BDN_Artifacts.zip

neon-sunset (Contributor, Author) commented:

@EgorBot -amd -arm64

using System.Text.Unicode;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Let's try again
BenchmarkRunner.Run<AsciiBench>(args: args);

public class AsciiBench
{
    [ParamsSource(nameof(Lengths))]
    public int Length;
    public IEnumerable<int> Lengths => [..Enumerable.Range(1, 8), 12, 16, 25, 50, 70, 200, 512, 16384];

    byte[] ascii = [];

    [GlobalSetup]
    public void Setup()
    {
        ascii = new byte[Length];
        ascii.AsSpan().Fill((byte)'f');
    }

    [Benchmark]
    public bool ValidateAscii()
    {
        return Utf8.IsValid(ascii);
    }
}

EgorBot commented Jul 15, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-DGHARR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-CCKSVA : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|
| ValidateAscii | Main | 1 | 4.731 ns | 0.0199 ns | 1.00 |
| ValidateAscii | PR | 1 | 3.672 ns | 0.0061 ns | 0.78 |
| ValidateAscii | Main | 2 | 4.471 ns | 0.0039 ns | 1.00 |
| ValidateAscii | PR | 2 | 3.723 ns | 0.0122 ns | 0.83 |
| ValidateAscii | Main | 3 | 4.915 ns | 0.0093 ns | 1.00 |
| ValidateAscii | PR | 3 | 4.162 ns | 0.0148 ns | 0.85 |
| ValidateAscii | Main | 4 | 4.490 ns | 0.0198 ns | 1.00 |
| ValidateAscii | PR | 4 | 3.799 ns | 0.0158 ns | 0.85 |
| ValidateAscii | Main | 5 | 4.794 ns | 0.0179 ns | 1.00 |
| ValidateAscii | PR | 5 | 4.164 ns | 0.0142 ns | 0.87 |
| ValidateAscii | Main | 6 | 4.451 ns | 0.0022 ns | 1.00 |
| ValidateAscii | PR | 6 | 4.129 ns | 0.0325 ns | 0.93 |
| ValidateAscii | Main | 7 | 5.376 ns | 0.0222 ns | 1.00 |
| ValidateAscii | PR | 7 | 4.035 ns | 0.0199 ns | 0.75 |
| ValidateAscii | Main | 8 | 4.867 ns | 0.0126 ns | 1.00 |
| ValidateAscii | PR | 8 | 3.690 ns | 0.0047 ns | 0.76 |
| ValidateAscii | Main | 12 | 4.986 ns | 0.0232 ns | 1.00 |
| ValidateAscii | PR | 12 | 3.690 ns | 0.0055 ns | 0.74 |
| ValidateAscii | Main | 16 | 5.321 ns | 0.0194 ns | 1.00 |
| ValidateAscii | PR | 16 | 3.689 ns | 0.0044 ns | 0.69 |
| ValidateAscii | Main | 25 | 5.852 ns | 0.0103 ns | 1.00 |
| ValidateAscii | PR | 25 | 4.616 ns | 0.0193 ns | 0.79 |
| ValidateAscii | Main | 50 | 6.172 ns | 0.0012 ns | 1.00 |
| ValidateAscii | PR | 50 | 4.674 ns | 0.0303 ns | 0.76 |
| ValidateAscii | Main | 70 | 6.872 ns | 0.0179 ns | 1.00 |
| ValidateAscii | PR | 70 | 5.886 ns | 0.0735 ns | 0.86 |
| ValidateAscii | Main | 200 | 7.563 ns | 0.0244 ns | 1.00 |
| ValidateAscii | PR | 200 | 6.432 ns | 0.0128 ns | 0.85 |
| ValidateAscii | Main | 512 | 8.873 ns | 0.1425 ns | 1.00 |
| ValidateAscii | PR | 512 | 7.488 ns | 0.0179 ns | 0.84 |
| ValidateAscii | Main | 16384 | 162.573 ns | 0.0717 ns | 1.00 |
| ValidateAscii | PR | 16384 | 80.495 ns | 0.0317 ns | 0.50 |

BDN_Artifacts.zip

GrabYourPitchforks (Member) commented:

FYI the original implementation of this method waaaaay back when did indeed use byrefs instead of raw pointers, but IIRC @jkotas had concerns with that approach and much preferred the simplicity of using true pointers. I'd recommend asking him for his thoughts on this PR.

jkotas (Member) commented Aug 7, 2024

Yes, I still think that unmanaged pointers are less bug-prone than byref arithmetic. We have been using byref arithmetic all over the place just because we can - I have stopped commenting on it.

neon-sunset (Contributor, Author) commented Aug 7, 2024

My idea was to move the ASCII path over to byrefs first, so that the UTF-8 path could follow later (once Egor's PR adopting Lemire's UTF-8 validation is done, possibly adding an ARM64 path if that doesn't happen within its scope). Once done, this would have allowed improving call codegen around Utf8.IsValid and the GetIndexOfFirstInvalidUtf8Sequence it forwards to for external callers. Right now it is quite messy for something that is just Handle(span) -> nint and result < 0. Additionally, I hope it makes the path more GC-friendly under heavy allocation traffic thanks to reduced pinning.

However, if you think this PR should be changed back to pointer arithmetic, please let me know. It was an optional goal of the change, and I'm happy as long as the performance of this hot path improves. Thanks!
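
For readers less familiar with the two styles being weighed in this exchange, here is a minimal scalar sketch of each. It is an illustration only: `PointerVsByref` and both method names are hypothetical and do not come from the PR or the runtime. The pointer version pins the buffer with `fixed` and needs unsafe code enabled (for example `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file); the byref version uses `Unsafe.Add` on a managed reference and does not pin.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class PointerVsByref
{
    // Unmanaged-pointer style: the buffer stays pinned while the scan runs.
    public static unsafe int FirstNonAsciiPointer(ReadOnlySpan<byte> data)
    {
        fixed (byte* p = data)
        {
            for (int i = 0; i < data.Length; i++)
            {
                if (p[i] >= 0x80)
                {
                    return i;
                }
            }
        }

        return data.Length;
    }

    // Byref style: no pinning; the GC keeps tracking the managed reference.
    public static int FirstNonAsciiByref(ReadOnlySpan<byte> data)
    {
        ref byte start = ref MemoryMarshal.GetReference(data);

        for (nint i = 0; i < data.Length; i++)
        {
            // Unsafe.Add performs no bounds check, so the loop condition
            // is the only thing keeping the access inside the span.
            if (Unsafe.Add(ref start, i) >= 0x80)
            {
                return (int)i;
            }
        }

        return data.Length;
    }
}
```

Both methods return the same index for the same input. Neither style is bounds-checked here, so the trade-off in the discussion is about pinning and readability rather than about one being inherently safer.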

neon-sunset (Contributor, Author) commented:

CC @jeffhandley

As an area owner, PTAL when you have time. Thanks!

tarekgh (Member) commented Sep 23, 2024

@tannergooding will you continue reviewing this change?

Labels: area-System.Text.Encoding, community-contribution

Successfully merging this pull request may close these issues.

GetIndexOfFirstNonAsciiByte_Vector path in Ascii.Utility.cs is never exercised for AdvSimd/SSE4.1
6 participants