cmd/compile: improve inline memrun decision in generated alg eq routines #38494

josharian · 2020-04-16T21:56:01Z

cmd/compile/internal/gc/alg.go func geneq contains this code:

// Find maximal length run of memory-only fields.
size, next := memrun(t, i)
if s := fields[i:next]; len(s) <= 2 {
	// Two or fewer fields: use plain field equality.
	for _, f := range s {
		and(eqfield(np, nq, f.Sym))
	}
} else {
	// More than two fields: use memequal.
	and(eqmem(np, nq, f.Sym, size))
}

The idea here is to inline small, simple memory comparisons instead of calling runtime.memequal.

However, using the number of fields isn't a great heuristic. If one of those fields is (say) [64]uint64, inlining it turns into a runtime call anyway. If one of those fields is (say) struct { x, y, z, w uint64 }, we end up inlining four comparisons instead of the intended one. The heuristic fails in the other direction too: If there are eight fields, but each has type byte, we'd be better off doing a single uint64 comparison than calling runtime.memequal.

A better plan would be to calculate how much work we actually have to do and then do it optimally, either with the minimal number of inlined comparisons or with a single runtime call.

Roughly, if we need to compare n bytes, figure out how many comparisons are required to accomplish that, and then use that as a threshold. (We already do something like that in walk.go's walkCompareString when comparing a non-constant string to a constant string.) One complication is alignment. On architectures with alignment requirements, comparing 8 bytes could require (say) a 2 byte comparison + a 4 byte comparison + a 2 byte comparison. The other thing that requires care is pointers: Given struct { f float32; p *int; u uint32 }, we don't want to pull half of p into one register and the other half of p into another register.

Ideally, we would also use this work calculation in our decision about whether to inline equality comparisons in the first place or call a generated routine (walk.go func walkcompare); we currently count fields there, too.

I believe that this would be challenging but possible for someone with relatively little compiler experience, so marking as help wanted (and spelling things out in more detail than usual).

The text was updated successfully, but these errors were encountered:

renthraysk · 2020-04-17T15:07:27Z

For small n (<16), the minimum number of comparisons is the number of bits set in n.

josharian · 2020-04-17T15:35:40Z

@renthraysk I don't follow. What is n? Are you taking into account that some instructions can compare multiple bytes at once? And that there are alignment considerations?

renthraysk · 2020-04-18T10:17:23Z

n is the number of bytes to compare. (Or size returned by memrun)
The number of comparisons (on 64 bit with no alignment requirements arch) would be

cost := n / 8 + bits.OnesCount(uint(n&7))

n / 8 is the number of full wide comparisons (eg x86 CMPQ)
Each of the lower 3 bits of n refers to CMPL, CMPW, CMPB respectively.

With alignment restrictions, means have to compare between [0, 7] bytes, w, first to gain alignment.

n -= w
cost := bits.OnesCount(w) + n / 8 + bits.OnesCount(uint(n&7))

Or am I missing something? (My familiarity with the go compiler internals is low)

josharian · 2020-04-18T14:29:39Z

You can do a bit better than that on unaligned arches using overlapping comparisons. E.g. you can do a 15 byte comparison by comparing bytes 0–7 and then bytes 8–15, instead of 0–7, 8–11, 12–14, 14–15. And even on unaligned arches, you need to treat pointers as having alignment requirements.

gopherbot · 2020-04-26T15:32:43Z

Change https://golang.org/cl/230203 mentions this issue: cmd/compile: make runtime calls last in eq algs

josharian · 2020-04-26T15:43:51Z

Related (possibly a prerequisite?): #38674.

type T struct { f float64 a [64]uint64 g float64 } Prior to this change, the generated equality algorithm for T was: func eqT(p, q *T) bool { return p.f == q.f && runtime.memequal(p.a, q.a, 512) && p.g == q.g } In handwritten code, we would normally put the cheapest checks first. This change takes a step in that direction. We now generate: func eqT(p, q *T) bool { return p.f == q.f && p.g == q.g && runtime.memequal(p.a, q.a, 512) } For most types, this also generates considerably shorter code. Examples: runtime .eq."".mstats 406 -> 391 (-3.69%) .eq.""._func 114 -> 101 (-11.40%) .eq."".itab 115 -> 102 (-11.30%) .eq."".scase 125 -> 116 (-7.20%) .eq."".traceStack 119 -> 102 (-14.29%) .eq."".gcControllerState 169 -> 161 (-4.73%) .eq."".sweepdata 121 -> 112 (-7.44%) However, for types in which we make unwise choices about inlining memory-only comparisons (#38494), this generates longer code. Example: cmd/internal/obj .eq."".objWriter 211 -> 214 (+1.42%) .eq."".Addr 185 -> 187 (+1.08%) Fortunately, such cases are not common. Change-Id: I47a27da93c1f88ec71fa350c192f36b29548a217 Reviewed-on: https://go-review.googlesource.com/c/go/+/230203 Run-TryBot: Josh Bleecher Snyder <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Brad Fitzpatrick <[email protected]>

type T struct { f float64 a [64]uint64 g float64 } Prior to this change, the generated equality algorithm for T was: func eqT(p, q *T) bool { return p.f == q.f && runtime.memequal(p.a, q.a, 512) && p.g == q.g } In handwritten code, we would normally put the cheapest checks first. This change takes a step in that direction. We now generate: func eqT(p, q *T) bool { return p.f == q.f && p.g == q.g && runtime.memequal(p.a, q.a, 512) } For most types, this also generates considerably shorter code. Examples: runtime .eq."".mstats 406 -> 391 (-3.69%) .eq.""._func 114 -> 101 (-11.40%) .eq."".itab 115 -> 102 (-11.30%) .eq."".scase 125 -> 116 (-7.20%) .eq."".traceStack 119 -> 102 (-14.29%) .eq."".gcControllerState 169 -> 161 (-4.73%) .eq."".sweepdata 121 -> 112 (-7.44%) However, for types in which we make unwise choices about inlining memory-only comparisons (golang#38494), this generates longer code. Example: cmd/internal/obj .eq."".objWriter 211 -> 214 (+1.42%) .eq."".Addr 185 -> 187 (+1.08%) Fortunately, such cases are not common. Change-Id: I47a27da93c1f88ec71fa350c192f36b29548a217 Reviewed-on: https://go-review.googlesource.com/c/go/+/230203 Run-TryBot: Josh Bleecher Snyder <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Brad Fitzpatrick <[email protected]>

gopherbot · 2022-06-10T18:43:06Z

Change https://go.dev/cl/411674 mentions this issue: cmd/compile: new inline heuristic for struct compares

nicktrav · 2022-09-28T21:32:25Z

We've noticed that in go1.19 there can be a sizable performance regression due to the heuristics introduced in 3c29aca (more details here). These regressions are rectified by 6605686, which resolved this issue.

I didn't see 6605686 on release-branch.go1.19. Is that something that could be backported? Happy to open a new issue for that, if that's preferable. cc: @derekparker

randall77 · 2022-09-28T22:00:29Z

We don't generally backport performance fixes. See https://github.com/golang/go/wiki/MinorReleases

derekparker · 2022-09-28T22:30:12Z

@nicktrav apologies for the temporary performance regression as the optimization work got split across multiple releases. Once 1.19.2 is released it should contain that latest commit which rounds out the work.

EDIT: Sorry I was clearly mistaken, no backport means it won't hit the 1.19 release at all and will wait for 1.20. I was hoping to get both CLs in for the 1.19 release but didn't make the freeze.

josharian added Performance help wanted labels Apr 16, 2020

josharian added this to the Unplanned milestone Apr 16, 2020

andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 17, 2020

josharian mentioned this issue Apr 26, 2020

cmd/compile: unify and improve struct/array equality comparisons #38674

Closed

derekparker mentioned this issue Jun 10, 2022

cmd/compile: new inline heuristic for struct compares #53326

Closed

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022

mknyszek added this to Go Compiler / Runtime Jul 13, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022

gopherbot closed this as completed in 6605686 Sep 2, 2022

golang locked and limited conversation to collaborators Sep 29, 2023

gopherbot added the FrozenDueToAge label Sep 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: improve inline memrun decision in generated alg eq routines #38494

cmd/compile: improve inline memrun decision in generated alg eq routines #38494

josharian commented Apr 16, 2020

renthraysk commented Apr 17, 2020

josharian commented Apr 17, 2020

renthraysk commented Apr 18, 2020 •

edited

Loading

josharian commented Apr 18, 2020

gopherbot commented Apr 26, 2020

josharian commented Apr 26, 2020

gopherbot commented Jun 10, 2022

nicktrav commented Sep 28, 2022

randall77 commented Sep 28, 2022

derekparker commented Sep 28, 2022 •

edited

Loading

cmd/compile: improve inline memrun decision in generated alg eq routines #38494

cmd/compile: improve inline memrun decision in generated alg eq routines #38494

Comments

josharian commented Apr 16, 2020

renthraysk commented Apr 17, 2020

josharian commented Apr 17, 2020

renthraysk commented Apr 18, 2020 • edited Loading

josharian commented Apr 18, 2020

gopherbot commented Apr 26, 2020

josharian commented Apr 26, 2020

gopherbot commented Jun 10, 2022

nicktrav commented Sep 28, 2022

randall77 commented Sep 28, 2022

derekparker commented Sep 28, 2022 • edited Loading

renthraysk commented Apr 18, 2020 •

edited

Loading

derekparker commented Sep 28, 2022 •

edited

Loading