unicode/utf8: potential performance improvement of UTF8 validation #69915

romshark · 2024-10-16T22:07:02Z

Proposal Details

I've been experimenting with different kinds of optimization techniques in order to speed up utf8.Valid and utf8.ValidString and I'd like to share my findings that appear to be low-hanging fruit, at least on ARM64:

The change is minimal but yields performance improvements of up to 41% on Apple M1 Max for all-UTF8 inputs while not showing any significant negative side-effects. Though on AMD Ryzen 7 5700x the geomean is significantly lower at just -1.43%, with some benchmarks even showing significant performance degradation ⚠️.

The test setup with benchmarks can be found here: https://github.com/romshark/utf8valid

Relation to Other Proposals

It might be related to #47120 but I haven't had the opportunity to dig into the details of Russ' proposal. At first glance it looks like Russ tackled exactly that problem and my proposal may just be a duplicate.

Compared to #63347 this proposal doesn't require changes of significant complexity, it's a 9-line code change.

gabyhelp · 2024-10-16T22:07:16Z

Related Issues and Documentation

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

ianlancetaylor · 2024-10-17T00:38:06Z

I don't see any API change here, so taking this out of the proposal process.

randall77 · 2024-10-18T00:28:32Z

You've gotten performance improvements but I don't see any description of the changes you made to get those performance improvements (unless the changes are all in that screenshot?).

It would be nice to post a CL, together with some Benchmarks that demonstrate the improvements (if those benchmarks are not already present in the utf8 package).

adonovan · 2024-10-18T18:34:15Z

@randall77 See the linked repo, which contains the patched code and the benchmark results. The diff is hard to see (it's not even in the history of the repo), but it really is just this:

@@ -473,16 +473,22 @@ func Valid(p []byte) bool {
                        return false // Short or invalid.
                }
                accept := acceptRanges[x>>4]
+               // The change is here. Instead of reusing `size` constants 2, 3 and 4
+               // are used which leads to performance improvement of up to 41%.
                if c := p[i+1]; c < accept.lo || accept.hi < c {
                        return false
                } else if size == 2 {
+                       i += 2
                } else if c := p[i+2]; c < locb || hicb < c {
                        return false
                } else if size == 3 {
+                       i += 3
                } else if c := p[i+3]; c < locb || hicb < c {
                        return false
+               } else {
+                       i += 4
                }
-               i += size
+
        }
        return true
 }
@@ -519,16 +525,21 @@ func ValidString(s string) bool {
                        return false // Short or invalid.
                }
                accept := acceptRanges[x>>4]
+               // The change is here. Instead of reusing `size` constants 2, 3 and 4
+               // are used which leads to performance improvement of up to 41%.
                if c := s[i+1]; c < accept.lo || accept.hi < c {
                        return false
                } else if size == 2 {
+                       i += 2
                } else if c := s[i+2]; c < locb || hicb < c {
                        return false
                } else if size == 3 {
+                       i += 3
                } else if c := s[i+3]; c < locb || hicb < c {
                        return false
+               } else {
+                       i += 4
                }
-               i += size
        }
        return true
 }

ldemailly · 2024-10-19T18:12:30Z

I guess the speedup comes from https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/unicode/utf8/utf8.go;l=508

		size := int(x & 7)

and the compiler can't know that 5,6,7 are never supposed to happen (vs hardcoding += 4)
(thanks to how first[] is constructed)

nor 1 because of

		if si < RuneSelf {

ldemailly · 2024-10-19T18:17:58Z

I think you should make a CL but also add comments to explain why it works / why +4 is the only possible case in the else

also I wonder if you could benchmark that vs a switch (which would be more readable and equivalent I think)

AlexanderYastrebov · 2024-10-20T11:04:42Z

It might be related to #47120 but I haven't had the opportunity to dig into the details of Russ' proposal. At first glance it looks like Russ tackled exactly that problem and my proposal may just be a duplicate.

DFA implementation from #47120 looks much simpler than existing one.
It would be interesting to see how this change benchmark results compare to DFA results.

randall77 · 2024-10-21T20:27:48Z

I'm not sure I understand how that change makes things any faster, but if there are benchmarks that demonstrate it I'm ok with it.

An aside, feel free to ignore...

size is already needed in a register for the conditional, so it isn't clear to me why += const is any faster than += register. Both are identical speedwise on all chips I've seen. Maybe because we have to merge from each case back to a single += instruction before jumping to the top of the loop, instead of jumping to the top of the loop directly? It might be worth trying += size in each if block instead of after, just to see what happens. Not that the final code should do that, but just to tease out why there is a speed improvement.

romshark added the Proposal label Oct 16, 2024

gopherbot added this to the Proposal milestone Oct 16, 2024

romshark changed the title ~~proposal: Improve performance of utf8.Valid and utf8.ValidString on ARM64~~ observation: Potential performance improvement of UTF8 validation Oct 16, 2024

ianlancetaylor changed the title ~~observation: Potential performance improvement of UTF8 validation~~ unicode/utf8: potential performance improvement of UTF8 validation Oct 17, 2024

ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Oct 17, 2024

ianlancetaylor modified the milestones: Proposal, Backlog Oct 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unicode/utf8: potential performance improvement of UTF8 validation #69915

unicode/utf8: potential performance improvement of UTF8 validation #69915

romshark commented Oct 16, 2024

gabyhelp commented Oct 16, 2024

ianlancetaylor commented Oct 17, 2024

randall77 commented Oct 18, 2024

adonovan commented Oct 18, 2024 •

edited

Loading

ldemailly commented Oct 19, 2024 •

edited

Loading

ldemailly commented Oct 19, 2024

AlexanderYastrebov commented Oct 20, 2024 •

edited

Loading

randall77 commented Oct 21, 2024

unicode/utf8: potential performance improvement of UTF8 validation #69915

unicode/utf8: potential performance improvement of UTF8 validation #69915

Comments

romshark commented Oct 16, 2024

Proposal Details

Relation to Other Proposals

gabyhelp commented Oct 16, 2024

ianlancetaylor commented Oct 17, 2024

randall77 commented Oct 18, 2024

adonovan commented Oct 18, 2024 • edited Loading

ldemailly commented Oct 19, 2024 • edited Loading

ldemailly commented Oct 19, 2024

AlexanderYastrebov commented Oct 20, 2024 • edited Loading

randall77 commented Oct 21, 2024

adonovan commented Oct 18, 2024 •

edited

Loading

ldemailly commented Oct 19, 2024 •

edited

Loading

AlexanderYastrebov commented Oct 20, 2024 •

edited

Loading