encoding/json: Unmarshal behavior changed to match its documented behavior about not reusing slice elements #39427
Comments
As far as I understand, this is working as intended, and the fix for issue #21092 is correct. The docs read:
That is, the two elements in the slice value don't matter, because the length is reset to zero, and thus they are entirely ignored. You also bring up a potential fix, along with:
Note that your element type is just Regardless of whether one would consider your initial code correct or not, I think it was never guaranteed to work given the docs. It seemed to rely on an unintended implementation detail which, as far as I can tell, did the opposite of what the docs say. If you think my understanding of the situation is incorrect, please explain why in detail. |
Thanks for your reply @mvdan. According to the documentation, this is indeed undefined behavior. So the correct way is to use:
func decode(raw string, args ...interface{}) {
	dec := json.NewDecoder(strings.NewReader(raw))
	// Consume the opening '[' of the JSON array.
	dec.Token()
	var i int
	for dec.More() {
		// Stop once every target has been filled.
		if i == len(args) {
			return
		}
		// Decode the next array element into the i-th target.
		err := dec.Decode(args[i])
		if err != nil {
			panic(err.Error())
		}
		i++
	}
} |
I should note that it's not undefined behavior. The docs document that your original code should not work. The fact that it used to work was a bug, which we're fixing in 1.15. I'm not sure what the correct way to do what you want to do would be; you never told us what you're trying to do, you only provided the solution. Assuming that you want to decode a list of JSON values into specific and different Go types, multiple ideas come to mind:
|
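A minimal sketch of one way to decode a JSON array into values of different Go types without relying on slice element reuse. It assumes that this is the goal; the helper name `decodeInto` is hypothetical, and this is not necessarily one of the ideas proposed in the thread. It first unmarshals the array into a `[]json.RawMessage`, then unmarshals each raw element into its own target.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeInto is a hypothetical helper: it splits a JSON array into raw
// elements, then unmarshals element i into targets[i].
func decodeInto(raw []byte, targets ...interface{}) error {
	var elems []json.RawMessage
	if err := json.Unmarshal(raw, &elems); err != nil {
		return err
	}
	if len(elems) != len(targets) {
		return fmt.Errorf("got %d elements, want %d", len(elems), len(targets))
	}
	for i, e := range elems {
		if err := json.Unmarshal(e, targets[i]); err != nil {
			return fmt.Errorf("element %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	var n int
	var s string
	if err := decodeInto([]byte(`[1, "two"]`), &n, &s); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, s) // prints: 1 two
}
```

Because each target is passed to `json.Unmarshal` directly, this does not depend on how Unmarshal treats existing slice elements.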
I am curious about how many people this will break. This broke some of our code (and showed up in tests) where we relied on unmarshalling merging values as opposed to completely zeroing and overwriting. Looking back at #21092, @OneOfOne indicated that they also relied on the merging behavior. @robmccoll indicated that this would not break @OneOfOne's usage because only new values appended would be zeroed and existing ones would not, but that's not true. I agree that the docs do indicate that we were relying on broken behavior, and I'm in the process of changing my code. I think that it's worth considering how much other code this change may break. If it's more than expected, i.e. people have come to rely on the broken behavior, it may be worth changing the documentation instead. |
We have done this kind of thing in the past where the documentation was ambiguous. One good example is #39149. However, this case is different from that one in two very important ways:
Of course, we still have time. beta1 came out two weeks ago, and rc1 should be out in a couple of weeks. If more people bring up this issue, and especially if they provide proof that it will break many reasonable Go programs, we can reconsider. |
Perhaps it would make sense to test this on some large projects like Kubernetes to do a bit more probing for breakage? |
If large projects want to test 1.15beta1 or the upcoming 1.15rc1, that would of course be very helpful :) It has to be their initiative, though. |
Copying comment from #21092: This change has produced a small but significant number of test failures in Google's codebase. A reduced example of one case which was broken is:
type T struct {
	Index    int
	Elements []json.RawMessage
}
var message T
tmp := []interface{}{&message.Index, &message.Elements}
err = json.Unmarshal(raw[0], &tmp)
if err != nil {
	return message, err
}
return message, nil
https://play.golang.org/p/iNBD_-mhWTI
I am concerned that there may be more errors in code not covered by tests. (Failing test count: 6, or ~500, depending on whether you count one cluster of related tests as one test or not.) |
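The reduced example above is a fragment. A self-contained sketch of the same pattern might look like the following; the function name `parse` and the input document are assumptions made for illustration and are not taken from the original code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// T mirrors the type from the reduced example.
type T struct {
	Index    int
	Elements []json.RawMessage
}

// parse (hypothetical name) decodes a two-element JSON array into the
// fields of a T by pre-filling a []interface{} with pointers to them.
func parse(raw []byte) (T, error) {
	var message T
	tmp := []interface{}{&message.Index, &message.Elements}
	if err := json.Unmarshal(raw, &tmp); err != nil {
		return message, err
	}
	return message, nil
}

func main() {
	// The input below is an assumption chosen for illustration.
	m, err := parse([]byte(`[7, ["a", {"b": 1}]]`))
	fmt.Println(m.Index, len(m.Elements), err)
}
```

When Unmarshal decodes into the pre-filled pointers, as reported for Go 1.14, this prints `7 2 <nil>`. Under the 1.15 beta behavior described in this thread, the pre-filled pointers would be discarded and `message` would presumably stay at its zero value.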
Here's another relevant minified case I've come across. This might not be exactly the same issue, but it's related to the same fix CL. It is an example of code that was directly relying on the opposite-of-what-documentation-said behavior in Go 1.14, and so it breaks in 1.15 where #21092 is fixed:
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

type combo struct {
	One  one
	Many []one
}

type one struct {
	A string
	B string
}

func main() {
	b := []byte(`{
		"One": {"A":"hello"},
		"Many":[{"B":"there"}]
	}`)
	var v combo
	if err := json.Unmarshal(b, &v); err != nil {
		log.Fatalln(err)
	}
	// Initialize many with one.
	for i := range v.Many {
		v.Many[i] = v.One
	}
	// Unmarshal again, which in 1.14 would reuse slice elements
	// when decoding (counter to encoding/json documentation).
	if err := json.Unmarshal(b, &v); err != nil {
		log.Fatalln(err)
	}
	fmt.Println(len(v.Many), v.Many[0].A, v.Many[0].B)
	// Output (Go 1.14): 1 hello there
	// Output (Go 1.15): 1 there
}
As far as I see, the original issue with the mismatch in documentation and behavior was reported in #21092. That was in July 2017. The general preference of package owners at the time was to fix it. @rsc said back then:
The relevant CL notes:
If it's been in place for years, another option is to update documentation to match the (unintended) behavior. We should decide what is better to do at this point. I've seen us update the documentation more often than change behavior in such cases, but I'll let people more familiar with |
For context of our usage, we unmarshal the same data into two types, where the second type is initialized from the first. If the data was for the first type, then the second unmarshal would be a no-op. If the data is for the second type, then the first unmarshal would be a no-op. The change for 1.15 now means that the second operation is a zeroing op if the data was for the first type. Again, for us this is a minor fix, but I'm just glad we had tests for it. |
@neild can you share how that number compares to other past changes in the json library? "small but significant" doesn't really tell me if I should worry. I assume @dsnet will have an opinion here, as we had a similar discussion in #39149.
The docs have always been pretty clear as far as I can tell, so I still default to leaving the change as-is. I'm happy to hear counter-arguments, though, but please try to back them up with evidence. Another option is a compromise. In the original code, we reused existing element values within a slice's capacity. In the new code, we never reuse any element values in a slice, at all (as documented). A compromise could be to only reuse the existing element values up to a slice's length, ignoring those between its length and its capacity. This is what the original bug report was about:
However, I still prefer the current behavior over this compromise. But the compromise is better than reverting the entire change, at least. |
The reason this is subtle is that "don't reuse element data between len and cap when appending" sounds great, except that the rule "unmarshal into slice cuts len to 0 first" means that all unmarshalling is appending. The pattern that's now been identified both inside and outside Google is to set up a []interface{} containing multiple pointer targets and then unmarshal a json array into it. This is analogous to setting up a struct containing multiple pointer targets and unmarshaling a json object into it. In Go 1.14 these two operations worked the same. In the current Go 1.15 beta they don't. Consider this program:
Same thing is going on in both the array and object case, just a different container around them. On the playground with Go 1.14 - https://play.golang.org/p/BjJRIo4cSBP - both behave the same: they try to unmarshal "hi" into a *int and report an error. In the Go 1.15 beta, the struct case still reports an error but the array case now replaces the pre-filled *int with a string. This is a silly short example; a more important one is when you have a JSON array of different object types and prefill an []interface{} with pre-allocations of the right structs for the decoding. In that case, you'd get success in Go 1.14 but failure in the Go 1.15 beta, because the important pre-filled type information is thrown away. I don't believe it makes sense to break this example by introducing such a big difference between unmarshal into struct and unmarshal into slice. For Go 1.16, we might try doing what #21092 claimed to be about, namely ignoring pre-filled items beyond the len of the slice passed to unmarshal, while still respecting pre-filled items between 0 and the original len. For Go 1.15, at this late stage, it looks to me like the best path forward is to revert the breaking change. (It's too late to try some other new semantics.) |
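A minimal sketch of the comparison described in this comment, under stated assumptions: it is not the program from the playground link, and the helper names `intoSlice` and `intoStruct` are hypothetical. Both helpers pre-fill a container with a `*int` and then try to unmarshal the string "hi" into it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// intoSlice unmarshals a JSON array into a []interface{} that is
// pre-filled with a *int.
func intoSlice(data string) ([]interface{}, error) {
	var x int
	arr := []interface{}{&x}
	err := json.Unmarshal([]byte(data), &arr)
	return arr, err
}

// intoStruct unmarshals a JSON object into a struct whose interface{}
// field is pre-filled with a *int.
func intoStruct(data string) (interface{}, error) {
	var x int
	obj := struct{ X interface{} }{&x}
	err := json.Unmarshal([]byte(data), &obj)
	return obj.X, err
}

func main() {
	_, errSlice := intoSlice(`["hi"]`)
	_, errStruct := intoStruct(`{"X": "hi"}`)
	fmt.Println("slice: ", errSlice)
	fmt.Println("struct:", errStruct)
}
```

When pre-filled elements are reused, both calls report an error because a string cannot be stored in an int. With the 1.15 beta behavior described above, the slice case would instead succeed and replace the `*int` with a string.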
More than some, less than others? As I said, there are only 6 failures, which isn't a lot in the grand scheme of things. However, I'm concerned both that the first case I looked at (#39427 (comment)) isn't obviously doing anything unreasonable, and that there may be more places that aren't caught by test coverage. |
@mvdan @dsnet Are you in agreement with the plan suggested for 1.15 in #39427 (comment)? |
I'm not in agreement; the pattern that Russ mentioned goes directly against the existing documentation, so I'm not sure how it's considered a breaking change to fix the code instead of rewriting the documentation. Still, I appear to be alone in my understanding of the Go1 compatibility guarantee, and we've run out of time, so it's time to revert. |
Change https://golang.org/cl/240657 mentions this issue: |
What version of Go are you using (go version)?
I can confirm the change was introduced by 11b2853
Does this issue reproduce with the latest release?
On master
What did you do?
What did you expect to see?
In Go 1.14 the output is
1 2
What did you see instead?
The output is
0 0
It seems like the new JSON decoder handles the element type incorrectly.
can output
1 2