Vectorize ProbabilisticMap.IndexOfAny #80963

Merged
merged 9 commits into from
Mar 10, 2023

MihaZupan
Member

@MihaZupan MihaZupan commented Jan 21, 2023

AVX2

| Method | Toolchain | NewLineFrequency | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|---:|
| ReplaceLineEndings | main | 0 | 10000 | 11.481 us | 0.0569 us | 1.00 |
| ReplaceLineEndings | pr | 0 | 10000 | 1.086 us | 0.0026 us | 0.09 |
| ReplaceLineEndings | main | 0.05 | 10000 | 24.280 us | 0.0478 us | 1.00 |
| ReplaceLineEndings | pr | 0.05 | 10000 | 13.983 us | 0.0596 us | 0.58 |
| ReplaceLineEndings | main | 0.1 | 10000 | 35.616 us | 0.1959 us | 1.00 |
| ReplaceLineEndings | pr | 0.1 | 10000 | 27.322 us | 0.3056 us | 0.77 |
| ReplaceLineEndings | main | 0.2 | 10000 | 58.372 us | 1.0346 us | 1.00 |
| ReplaceLineEndings | pr | 0.2 | 10000 | 52.060 us | 0.2331 us | 0.89 |
| ReplaceLineEndings | main | 1 | 10000 | 160.760 us | 1.1764 us | 1.00 |
| ReplaceLineEndings | pr | 1 | 10000 | 201.541 us | 4.8189 us | 1.26 |

ARM64

| Method | Length | Mean | Error |
|---|---:|---:|---:|
| Old | 10000 | 13.122 µs | 0.0117 µs |
| New | 10000 | 3.701 µs | 0.0012 µs |

Benchmark source
using System;
using BenchmarkDotNet.Attributes;

public class ReplaceLineEndingsBenchmark
{
    private string _input;

    [Params(0.0, 0.05, 0.1, 0.2, 1.0)]
    public double NewLineFrequency;

    [Params(10_000)]
    public int Length;

    [GlobalSetup]
    public void Setup()
    {
        char[] input = new char[Length];

        var rng = new Random(42);

        for (int i = 0; i < input.Length; i++)
        {
            if (rng.NextDouble() < NewLineFrequency)
            {
                input[i] = '\n';
            }
            else
            {
                char c;
                do
                {
                    c = (char)rng.Next(0, 65536);
                }
                while ("\r\n\f\u0085\u2028\u2029".Contains(c));

                input[i] = c;
            }
        }

        _input = new string(input);
    }

    [Benchmark]
    public string ReplaceLineEndings() => _input.ReplaceLineEndings();
}
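
For context, the characters that `Setup` excludes are the ones `string.ReplaceLineEndings` recognizes as line breaks, so only the injected `'\n'` characters produce matches in the benchmark. A minimal sketch of that behavior follows; the `LineEndingDemo` helper is hypothetical and not part of the PR.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the PR.
static class LineEndingDemo
{
    // Replaces every recognized newline sequence (CR, LF, CRLF, FF, NEL, LS, PS)
    // with a single "\n".
    public static string Normalize(string text) => text.ReplaceLineEndings("\n");
}
```

For example, `LineEndingDemo.Normalize("a\r\nb\u2028c")` yields `"a\nb\nc"`: the CRLF pair collapses into one replacement rather than two.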

If we care, we could also do this for LastIndexOfAny.

#80297 could be interesting for ARM if we could do a 256-bit lookup instead of blending together smaller lookups.
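
To make the "256-bit lookup" concrete, here is a simplified scalar model of a 256-bit bloom-filter style map. It is a sketch under assumptions, not the runtime's implementation: the type name `TinyProbabilisticMap`, its members, and the exact bit layout are illustrative. The idea is that each value sets one bit for its low byte and one for its high byte, and a candidate that passes both bit tests still has to be confirmed against the real value list.

```csharp
using System;

// Hypothetical, simplified model for illustration; names and layout are assumptions.
sealed class TinyProbabilisticMap
{
    private readonly uint[] _bits = new uint[8]; // 8 x 32 bits = 256 bits
    private readonly string _values;

    public TinyProbabilisticMap(string values)
    {
        _values = values;
        foreach (char c in values)
        {
            SetBit((byte)c);        // low byte of the char
            SetBit((byte)(c >> 8)); // high byte of the char
        }
    }

    private void SetBit(byte b) => _bits[b & 7] |= 1u << (b >> 3);

    private bool IsBitSet(byte b) => (_bits[b & 7] & (1u << (b >> 3))) != 0;

    // Cheap filter: may report false positives, never false negatives.
    public bool MayContain(char c) => IsBitSet((byte)c) && IsBitSet((byte)(c >> 8));

    // Exact answer: the filter first, then a confirming search.
    public bool Contains(char c) => MayContain(c) && _values.Contains(c);

    public int IndexOfAny(ReadOnlySpan<char> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (Contains(span[i]))
            {
                return i;
            }
        }
        return -1;
    }
}
```

In this model, a map built from `"\r\n\f\u0085\u2028\u2029"` reports `MayContain(' ')` as true, because the byte `0x20` is set by the high byte of U+2028 and `0x00` by the high byte of `'\n'`. `Contains(' ')` is still false thanks to the confirming search. A vectorized version would perform the two bit tests for many characters at once.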

@MihaZupan MihaZupan added this to the 8.0.0 milestone Jan 21, 2023
@MihaZupan MihaZupan requested a review from stephentoub January 21, 2023 01:54
@MihaZupan MihaZupan self-assigned this Jan 21, 2023
@ghost

ghost commented Jan 21, 2023

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details
| Method | Toolchain | NewLineFrequency | Length | Mean | Error | Ratio |
|---|---|---:|---:|---:|---:|---:|
| ReplaceLineEndings | main | 0 | 10000 | 11,172.53 ns | 18.391 ns | 1.00 |
| ReplaceLineEndings | pr | 0 | 10000 | 2,420.99 ns | 5.713 ns | 0.22 |
| ReplaceLineEndings | main | 0.05 | 10000 | 24,471.23 ns | 167.827 ns | 1.00 |
| ReplaceLineEndings | pr | 0.05 | 10000 | 15,055.74 ns | 116.653 ns | 0.62 |
| ReplaceLineEndings | main | 0.1 | 10000 | 37,305.05 ns | 724.469 ns | 1.00 |
| ReplaceLineEndings | pr | 0.1 | 10000 | 27,831.78 ns | 862.877 ns | 0.75 |
| ReplaceLineEndings | main | 0.2 | 10000 | 57,099.06 ns | 305.187 ns | 1.00 |
| ReplaceLineEndings | pr | 0.2 | 10000 | 47,964.65 ns | 562.536 ns | 0.84 |
| ReplaceLineEndings | main | 1 | 10000 | 160,707.06 ns | 1,484.932 ns | 1.00 |
| ReplaceLineEndings | pr | 1 | 10000 | 175,589.93 ns | 3,588.352 ns | 1.09 |

If we care, we could also do this for LastIndexOfAny.

I don't know if it's possible to do something similar efficiently on ARM given that the bloom filter in this case is 256-bit.
We could experiment with different implementations that would vectorize well with Vector128.

Author: MihaZupan
Assignees: MihaZupan
Labels:

area-System.Memory

Milestone: 8.0.0

@MihaZupan MihaZupan marked this pull request as draft January 21, 2023 05:59
@MihaZupan MihaZupan changed the title from "Vectorize ProbabilisticMap.IndexOfAny on AVX2" to "Vectorize ProbabilisticMap.IndexOfAny" Jan 21, 2023
@MihaZupan MihaZupan marked this pull request as ready for review January 21, 2023 09:37
@MihaZupan
Member Author

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan
Member Author

MihaZupan commented Jan 26, 2023

Added Vector{128/256}.LoadUnsafe(ref char) and Vector128.ShuffleUnsafe as internal APIs for now and replaced a few existing uses.
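
The helpers named above are internal to the runtime, so the sketch below only approximates them with public `System.Runtime.Intrinsics` APIs (.NET 7 or later). It assumes a reinterpreting `ref char` to `ref ushort` load stands in for `LoadUnsafe(ref char)`, and it uses the public `Vector128.Shuffle` in place of the internal `ShuffleUnsafe`. The class and method names are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical illustration built on public APIs; not the PR's internal helpers.
static class LoadAndShuffleSketch
{
    // Loads the first 8 chars of the span as 8 ushort lanes.
    public static Vector128<ushort> LoadChars(ReadOnlySpan<char> span)
    {
        if (span.Length < Vector128<ushort>.Count)
        {
            throw new ArgumentException("Need at least 8 chars.", nameof(span));
        }

        ref char first = ref MemoryMarshal.GetReference(span);
        return Vector128.LoadUnsafe(ref Unsafe.As<char, ushort>(ref first));
    }

    // Uses each byte of 'indices' (0..15) to pick a byte from a 16-entry table.
    public static Vector128<byte> Lookup(Vector128<byte> table, Vector128<byte> indices)
        => Vector128.Shuffle(table, indices);
}
```

With a table whose entry `i` is `i * 10` and indices all equal to 3, `Lookup` returns a vector with every lane set to 30.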

@lewing
Member

lewing commented Feb 8, 2023

@radekdoulik we should add PackedSimd versions of the Unsafe* functions that wasm supports, to make it easier to handle all the callers. I'm not sure we have a great pattern yet for simplifying the *.IsHardwareAccelerated paths that works for all cases, but that is worth considering too.
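
As a rough illustration of the kind of `IsHardwareAccelerated` path discussed here, the sketch below shows a common shape: a vectorized loop guarded by `Vector128.IsHardwareAccelerated`, followed by a scalar tail that doubles as the fallback. It is a generic single-character search written for this note, not code from the PR, and the names are hypothetical.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical example of a guarded vector path with a scalar fallback.
static class AcceleratedPathSketch
{
    public static int IndexOf(ReadOnlySpan<char> span, char value)
    {
        int i = 0;

        if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<ushort>.Count)
        {
            ref ushort start = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
            Vector128<ushort> target = Vector128.Create((ushort)value);
            int lastVectorStart = span.Length - Vector128<ushort>.Count;

            // Compare 8 chars per iteration.
            for (; i <= lastVectorStart; i += Vector128<ushort>.Count)
            {
                Vector128<ushort> chunk = Vector128.LoadUnsafe(ref start, (nuint)i);
                uint mask = Vector128.Equals(chunk, target).ExtractMostSignificantBits();
                if (mask != 0)
                {
                    // One mask bit per lane; the lowest set bit is the first match.
                    return i + BitOperations.TrailingZeroCount(mask);
                }
            }
        }

        // Scalar tail, and the whole search when no acceleration is available.
        for (; i < span.Length; i++)
        {
            if (span[i] == value)
            {
                return i;
            }
        }

        return -1;
    }
}
```

The single guard keeps platform-specific checks out of the loop body; the cross-platform `Vector128` methods are then lowered to whatever instructions the target supports.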

@MihaZupan
Member Author

Any thoughts on this @dotnet/area-system-memory?

@tannergooding tannergooding self-requested a review February 22, 2023 21:21
Comment on lines 167 to 169
Vector128<byte> highNibble = Sse2.IsSupported
? Sse2.ShiftRightLogical(values.AsInt32(), VectorizedIndexShift).AsByte() & Vector128.Create((byte)15)
: AdvSimd.ShiftRightLogical(values, VectorizedIndexShift);
Member

These are different because shifting via byte isn't accelerated on xarch, right?

Member Author

Yes. We do the same thing in IndexOfAnyAsciiSearcher, where the result also feeds into a shuffle.

Member

👍, probably worth opening an issue so an efficient implementation can be done and we can avoid the separate paths here.
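
The two paths discussed in this thread can be compared with the cross-platform `Vector128` API. The sketch below assumes a shift of 4 (a true high nibble) purely for illustration; the PR uses its own `VectorizedIndexShift` constant, whose value is not shown in this conversation. Shifting 32-bit lanes drags bits from the neighboring byte into each byte, so the x86-style path has to mask them off, while a per-byte shift needs no mask.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical illustration; the shift amount is an assumption, not the PR's constant.
static class HighNibbleSketch
{
    private const int NibbleShift = 4;

    // x86-style path: shift 32-bit lanes, then clear the bits that leaked in
    // from the neighboring byte.
    public static Vector128<byte> ViaInt32ShiftAndMask(Vector128<byte> values) =>
        Vector128.ShiftRightLogical(values.AsInt32(), NibbleShift).AsByte()
            & Vector128.Create((byte)0x0F);

    // Arm64-style path: shift each byte lane directly.
    public static Vector128<byte> ViaByteShift(Vector128<byte> values) =>
        Vector128.ShiftRightLogical(values, NibbleShift);
}
```

For any input, both methods should return the same vector, for example `0xAB` in every lane becomes `0x0A` in every lane.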

@MihaZupan
Member Author

Failures are: #82575, #81123, #82611

@MihaZupan
Member Author

Anything else that should be changed here, or is this one good to merge @tannergooding?

@MihaZupan
Member Author

@tannergooding can this be merged? (I'm trying to avoid more merge conflicts as this was already rebased a fair number of times)

@tannergooding
Member

Another merge conflict popped up.

Would be good to see some more benchmark numbers to better display where the cutoff is for the index ratio.

Likely also needs a secondary review from someone like @stephentoub given the code it's touching.

@stephentoub
Member

Thanks!

@stephentoub stephentoub merged commit 7a0b0e1 into dotnet:main Mar 10, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 9, 2023