Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: recognize map-clearing range idiom #20138

Closed
josharian opened this issue Apr 26, 2017 · 13 comments
Closed

cmd/compile: recognize map-clearing range idiom #20138

josharian opened this issue Apr 26, 2017 · 13 comments

Comments

@josharian
Copy link
Contributor

Sometimes, maps get emptied for re-use. Idiom:

for k := range m {
  delete(m, k)
}

This is pretty inefficient, though; it involves juggling iterators and making one delete call per map element. This could be recognized by cmd/compile and converted into a (newly added) runtime mapclear call, which could just walk and clear the buckets directly.

We could also decide to have it free all extra overflow buckets that have been allocated for it, which would ameliorate some of the problem in #20135. (The main bucket array and pre-allocated overflow buckets would remain, but that's desirable; emptying a map and re-filling is an intentional way to reuse those.) This would make the implementation of mapclear very efficient, mostly just a typedmemclr of hmap.buckets.

cc @randall77 @martisch

@josharian josharian added this to the Go1.10 milestone Apr 26, 2017
@bradfitz
Copy link
Contributor

Another way to do that is:

m = map[K]V{}

How common is this?

I think more common is:

for k, v := range m { // m == work to do
  doSomeWork(k, v)
  delete(m, k)
}

... but that probably wouldn't match.

@josharian
Copy link
Contributor Author

m = map[K]V{}

This is different. In the range code the map is emptied, in this, it is replaced with a new, empty map. Consider

x := m
// range clear or assign new empty map
x[0] = "a"
println(m[0])

These will yield different results based on the range clear vs assignment.

And in practice, they have very different performance characteristics, which is why the range loop gets used instead of just abandoning the old map.

How common is this?

Flipping through the cmd subdir, I see:

  • cmd/gofmt/rewrite.go lines 70-20
  • cmd/compile/internal/ssa/regalloc.go lines 1766-1768
  • cmd/go/internal/load/pkg.go lines 280-282
  • cmd/go/internal/load/pkg.go lines 1670-1672

I think more common is:

If the range delete idiom were fast, I suspect that this would get rewritten into two loops in performance-critical code, one to do the work and one to empty the map.

@martisch
Copy link
Contributor

I had a use case for this too and would like to work on it as discussed with josharian.
Might even still make it in for 1.9.

The route of clearing all existing buckets and removing extra overflow buckets sounds good.

just supporting:

for k := range m {
  delete(m, k)
}

for now is consistent with slice clearing and provides an easy idiom to fast clearing of maps and reusing the memory.

for k, v := range m { // m == work to do
  doSomeWork(k, v)
  delete(m, k)
}

can just be written as

for k, v := range m {
  doSomeWork(k, v)
}

for k, v := range m {
  delete(m, k)
}

if performance is critical. For anything additional in the loop it is currently to complex to detemine in the frontend that nothing else changes the map.

i think the side effects of the code above are less surprising than if the result (map backing memory)
of the below code is dependent on the size of m.

m = map[K]V{}

And it seems inconsistent with:

s := make([]int, 0, 10)
s = []int{}

not resulting in cap(s) == 10 currently.

@martisch martisch self-assigned this Apr 26, 2017
@randall77
Copy link
Contributor

Two tricky cases:

func g(m map[int]int) int {
	var k int
	for k = range m {
		delete(m, k)
	}
	return k
}

We need to avoid the optimization here because k is used after the loop.
(Or have the mapclear call return a key? But that doesn't work for len(m)==0.)
This is easier for ranging over slices because we just assign k=len(s)-1 and if not used it gets deadcode eliminated.

func f(m map[float64]int) {
	for k := range m {
		delete(m, k)
	}
}

If there are NaNs as keys, then their entries should survive the loop.

@odeke-em
Copy link
Member

To add to @randall77's mentioned tricky cases in #20138 (comment), it might seem a little obvious, but there is another one in which doWork is not a pure function in regards to m, or actually does something with m

doWork := func(k, v) {
   if m[k+1] % 2 == 0 {
      return
   }
   // do the work or even modifies m
}

for k, v := range m {
   doWork(k, v)
   delete(m, k)
}

in this case we just can't convert it to

for k, v := range m {
  doWork(k, v)
}
for k := range m {
  delete(m, k)
}
// or
runtime_mapclear(m)

@martisch
Copy link
Contributor

@odeke-em indeed. As a start it suffices to recognize the simple case:

for k, v := range m {
  delete(m, k)
}

as this is consistent with the detection capability of the slice/array clearing idiom.

Later we can think about making all the clearing idioms more sophisticated if warranted.

@davecheney
Copy link
Contributor

davecheney commented Jul 30, 2017 via email

@bradfitz
Copy link
Contributor

@davecheney, that would require it can prove that no references to the old m map value exist in a live state anywhere. I wouldn't hold my breath waiting for such things. (e.g. #2205)

Whereas just recognizing the for-delete loop doesn't require any new analysis.

@davecheney
Copy link
Contributor

davecheney commented Jul 31, 2017 via email

@randall77
Copy link
Contributor

m := map[int]int{1:2,3:4}
c <- m
m = map[int]int{}

You can't clear m at the third line. It would modify the original map that we sent over the channel.

@mvdan
Copy link
Member

mvdan commented Dec 16, 2017

Could something similar be done for map copies? That is, recognising:

m2 := make(map[K]V, len(m1)) // or no capacity
for k, v := range m1 {
    m2[k] = v
}

A quick search yields a dozen matches in the tree (ignore the one with Header):

$ gogrep '~ $m2 = $_; $*_; for $k, $v := range $(m1 is(map)) { $m2[$k] = $v }' std cmd
archive/tar/common.go:680:4: h.Xattrs = make(map[string]string); for k, v := range sys.Xattrs { h.Xattrs[k] = v; }
archive/tar/common.go:692:4: h.PAXRecords = make(map[string]string); for k, v := range sys.PAXRecords { h.PAXRecords[k] = v; }
cmd/fix/typecheck.go:192:7: cfg1.Type = make(map[string]*Type); for k, v := range cfg.Type { cfg1.Type[k] = v; }
cmd/vendor/github.com/google/pprof/driver/driver.go:276:2: isrcs := MappingSources{}; for m, s := range srcs { isrcs[m] = s; }
encoding/gob/type.go:272:2: builtinIdToType = make(map[typeId]gobType); for k, v := range idToType { builtinIdToType[k] = v; }
encoding/gob/type.go:754:2: newm := make(map[reflect.Type]*typeInfo); m, _ := typeInfoMap.Load().(map[reflect.Type]*typeInfo); for k, v := range m { newm[k] = v; }
encoding/json/encode.go:1285:2: newM := make(map[reflect.Type][]field, len(m)+1); for k, v := range m { newM[k] = v; }
net/http/server.go:3127:3: dst := w.Header(); for k, vv := range tw.h { dst[k] = vv; }
net/http/transport.go:512:2: newMap := make(map[string]RoundTripper); for k, v := range oldMap { newMap[k] = v; }
net/internal/socktest/switch.go:47:2: tab := make(Sockets, len(sw.sotab)); for i, s := range sw.sotab { tab[i] = s; }
runtime/pprof/label.go:40:2: childLabels := make(labelMap); parentLabels := labelValue(ctx); for k, v := range parentLabels { childLabels[k] = v; }
sync/atomic/value_test.go:189:3: m2 := make(Map); for k, v := range m1 { m2[k] = v; }
sync/map_reference_test.go:146:2: dirty := make(map[interface{}]interface{}, len(clean)+1); for k, v := range clean { dirty[k] = v; }

For the sake of completeness, here's the command one can use to find instances of the map-clearing range idiom:

$ gogrep 'for $k := range $m { delete($m, $k) }' std cmd
cmd/compile/internal/gc/ssa.go:373:2: for n := range s.fwdVars { delete(s.fwdVars, n); }
cmd/compile/internal/ssa/regalloc.go:1799:2: for k := range e.contents { delete(e.contents, k); }
cmd/go/internal/load/pkg.go:306:2: for name := range packageCache { delete(packageCache, name); }
cmd/go/internal/load/pkg.go:1275:2: for name := range cmdCache { delete(cmdCache, name); }
cmd/gofmt/rewrite.go:70:3: for k := range m { delete(m, k); }

@martisch
Copy link
Contributor

Still coding up the solution to this one for 1.11.

Map copies could be done but it would need to be considered how this will interact when the copied map is changed during the copy so the new copied map does not end up corrupted. Also needs to honor the "If a map entry that has not yet been reached is removed during iteration, the corresponding iteration value will not be produced." semantics.

Best to open a new issue to discuss the details.

Instead of doing all this pattern matching which also is made harder by usually being applied after the order pass we might want to consider having a generic clear and shallow copy function working on maps (or all types) for go 2. Those functions could also handle e.g. the NaN key case differently (clear the entry) to allow for a simpler and faster implementation.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/110055 mentions this issue: cmd/compile: optimize map-clearing range idiom

josharian referenced this issue in google/starlark-go Dec 14, 2018
This complements the dynamic checks, which is already implemented
in setArgs and in UnpackeArgs.

Updates bazelbuild/starlark#21

Change-Id: If18c9b1f228149b67eca9637050e76e94bd9ff85
@golang golang locked and limited conversation to collaborators May 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants