-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/gc: optimized map[string] lookup from []byte key #3512
Comments
In general this might introduce races. If you pass in a []byte to the key lookup, there is no guarantee that something is not editing the bytes during the lookup. But it could probably be done so that it would just fail the lookup instead of causing corruption. Labels changed: added priority-later, removed priority-triage. Owner changed to @rsc. Status changed to Thinking. |
Failing the lookup in that case would be fine & expected. In fact, I think the naive implementation would get that for free: if the memory is scanned once for the hashcode and once for the equality, it seems like it just might not match if the []byte changed. I don't see how it could do something worse. The []byte couldn't be freed by GC and reclaimed to the OS in the meantime because the faked non-escaping stringheader would still pin it. |
Answering Dmitry's question from https://golang.org/issue/5160?c=4 here: Consider: m := make(map[string]T) var b []byte = slice of some buffer go someFunc(b) t, ok := m[string(b)] I would like: 1) string(b) to not make a copy. It should instead make a non-escaping fake String header to pass to the map lookup function, using the same byte data & length as the []byte b. 2) the map lookup to be guaranteed to not crash, even if b is concurrently mutated. 3) t to be valid, iff ok == true. I don't think that would violate any of the current Go semantics, since any ordering between mutations from "go someFunc" and string(b) are already not specified by the memory model. Correct? |
This bug has nothing to do with interning except that if I were to implement interning myself in pure Go, without the help of the runtime (which is issue #5160), then I'd want to lookup interned strings from []byte from the network, without making copies. That is, I'd have: var internTable map[string]string // identity map. from key to same value. But I'd want to access it like: var b []byte = slice of bytes from the network in a bytes.Buffer _ = internTable[string(b)] ... reusing memory from another similar request (e.g. caching HTTP Request MIME headers) I'm only talking about indexing m[string(b)] in this bug, for Go->runtime maps. For Go->Go, that is issue #2205 (cmd/gc: read-only escape analysis and avoiding string -> []byte copies) |
Please, ignore everything about interning. That was just one example of a user of this. I think you're making this sound more complicated than it is. And I think one of us is using the wrong vocabulary. (I don't want to "intern []byte" because I can't intern a mutable thing) I just want m[string(byteSlice)] to avoid a copy and to instead create a fake String header temporary. It never escapes the hashmap functions, so it's safe. And no language semantics change which weren't already racy and undefined. Example: m := map[string]int{ "X": 1, "Y": 2, } mutate := func(b []byte) { b[0] = 'Y' } key := []byte{'X'} go mutate(key) value, ok := m[string(key)] What is value & ok? 1, true? 2, true? 0, false? It's already undefined. |
Patch at https://golang.org/cl/8090046 Owner changed to ---. |
This can be seen as a compiler optimization, a specialized form of escape analysis. Given a []byte b, a map lookup of string(b) can use the bytes directly without making a copy. In general, we can do this optimization for any function call where the compiler knows the complete code of the function (the function does not make any calls to unknown functions) and where the string does not escape the function (this is not something we normally check for) and where the function does not execute any synchronization operations (also something we do not normally check for, but given the lack of function calls this means no channel operations). Map lookup is a special case of a function call where these attributes are known. We could also view it as a language extension, but I don't think that is a good idea. It would be similar to the way copy/append accept string as well as []byte, but we could only permit it for a map lookup, not for a map insertion. I agree that this optimization doesn't introduce any races that were not already present, but we still need to consider what should happen for a racy program. Right now a racy program will yield an unpredictable string, but the code to convert from []byte to string is quite simple and will not crash or introduce memory corruption. We would need to preserve that property. Looking at the strhash and strequal functions, I think we would be OK. So I don't see any strong objections to this optimization, though of course somebody would have to actually do the work. Labels changed: removed priority-later. Status changed to Accepted. |
Thanks for the reply. > So I don't see any strong objections to this optimization, Russ has previously argued that performance optimizations are de-facto language changes that people come to rely on and must then be implemented widely. I argue App Engine's lack of unsafe + gccgo vs gc already introduce three big worlds of differing performance characteristics. > though of course somebody would have to actually do the work. I did it earlier. See comment #10 above. I barely close to no feedback on it, though. What I'd like is a decision that we can actually do this. I very much don't want a language change. In my CL above, I just looked for m[string(b)] and made a fake stringheader from b. |
In September, I wrote back to a mail thread with you (Brad) and Rob (on 9/23/2013; subject "m[string(byteSlice))] CL"): --- I have thought a bit more about this. I believe that we could do it more generally, but that we should not do it just for this special case. Specifically, if we special case x = m[string(b)], then s := string(b) x = m[s] y = m2[s] will actually be slower than x = m[string(b)] y = m2[string(b)] and that bothers me quite a bit. However, it should be possible to do something a little like escape analysis to see that between the definition of s and its uses, there are no synchronization points and no assignments that might legitimately allow s to contain different bytes from b, at which point the optimization would apply. If we did something like that, it would make things faster while still keeping performance a bit more predictable. I don't know when this would happen. Maybe for Go 1.3. --- You asked why it bothered me and I wrote: --- because pulling repeated computed subexpressions into a variable should be faster than not. i don't know of any case today where that's not true. --- I don't think that's "barely close to no feedback". My position is still what it was in September. I apologize for not seeing this issue until today. I did not see this issue due to over-aggressive mail filtering, which I have fixed[*]. I went looking for the issue because Ian asked me about it off-issue. (So no, this thing was not on.) I will look into how hard it is to do the analysis I wrote about in September, but I can't make any promises. [*] Because I was auto-CC'ed on every bug, I filtered them all away; I've removed my auto-CC and am now on golang-bugs, which I filter away, but if I'm explicitly CC'ed on a bug it will land in my inbox. |
This issue was closed by revision f5f5a8b. Status changed to Fixed. |
This issue was updated by revision 8072f46. LGTM=rsc R=rsc, dave CC=golang-codereviews, iant https://golang.org/cl/84230043 |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The text was updated successfully, but these errors were encountered: