encoding/json: Unmarshal behavior changed to match its documented behavior about not reusing slice elements #39427

bobotu · 2020-06-05T20:20:43Z

What version of Go are you using (`go version`)?

I can confirm the changes is introduced by 11b2853

$ go version

go version devel +325540922a Fri Jun 5 17:27:32 2020 +0000 linux/amd64

Does this issue reproduce with the latest release?

On master

What did you do?

package main

import "encoding/json"

type A struct {
	A int
}

type B struct {
	B int
}

func main() {
	var a A
	var b B

	decode(`[{"A": 1}, {"B": 2}]`, &a, &b)

	println(a.A, b.B)
}

func decode(raw string, args ...interface{}) {
	json.Unmarshal([]byte(raw), &args)
}

What did you expect to see?

In Go 1.14 the output is 1 2

What did you see instead?

The output is 0 0

It seems like the new JSON decoder handles the element type incorrectly.

diff --git a/src/encoding/json/decode.go b/src/encoding/json/decode.go
index 5f34af44ea..b13df504e0 100644
--- a/src/encoding/json/decode.go
+++ b/src/encoding/json/decode.go
@@ -575,7 +575,15 @@ func (d *decodeState) array(v reflect.Value) error {
                        if i < initialSliceCap {
                                // Reusing an element from the slice's original
                                // backing array; zero it before decoding.
-                               into.Set(reflect.Zero(v.Type().Elem()))
+                               tp := into.Type()
+                               if tp.Kind() == reflect.Interface {
+                                       tp = into.Elem().Type()
+                               }
+                               if tp.Kind() != reflect.Ptr {
+                                       into.Set(reflect.Zero(tp))
+                               }
                        }

can output 1 2.

The text was updated successfully, but these errors were encountered:

dmitshur · 2020-06-05T21:56:03Z

Thanks for reporting @bobotu.

The commit you mentioned was to fix issue #21092.

/cc @mvdan @trotha01 @ianlancetaylor @bradfitz

mvdan · 2020-06-06T21:27:49Z

As far as I understand, this is working as intended, and the fix for issue #21092 is correct. The docs read:

To unmarshal a JSON array into a slice, Unmarshal resets the slice length to zero and then appends each element to the slice.

That is, the two elements in the slice value don't matter, because the length is reset to zero, and thus they are entirely ignored.

You also bring up a potential fix, along with:

It seems like the new JSON decoder handles the element type incorrectly.

Note that your element type is just interface{}, so the decoder applies the default rule which results in a map. If you add fmt.Printf("%#v\n", args) right after the unmarshal, you'll see []interface {}{map[string]interface {}{"A":1}, map[string]interface {}{"B":2}}.

Regardless of whether one would consider your initial code correct or not, I think it was never guaranteed to work given the docs. It seemed to rely on an unintended implementation detail which, as far as I can tell, did the opposite of what the docs say. If you think my understanding of the situation is incorrect, please explain why in detail.

bobotu · 2020-06-07T03:08:31Z

Thanks for your reply @mvdan.

According to the documentation, this is indeed an undefined behavior.

So the correct way is to use josn.Decoder?

func decode(raw string, args ...interface{}) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.Token()
	var i int
	for dec.More() {
		if i == len(args) {
			return
		}
		err := dec.Decode(args[i])
		if err != nil {
			panic(err.Error())
		}
		i++
	}
}

mvdan · 2020-06-07T21:59:39Z

According to the documentation, this is indeed an undefined behavior.

I should note that it's not undefined behavior. The docs document that your original code should not work. The fact that it used to work was a bug, which we're fixing in 1.15.

I'm not sure what the correct way to do what you want to do would be; you never told us what you're trying to do, you only provided the solution. Assuming that you want to decode a list of JSON values into specific and different Go types, multiple ideas come to mind:

decoding into []json.RawMessage, so you can then decode each into the appropriate type in a second pass
decoding one element at a time, like your latest sample above
decoding into a superset of the two types, like []AB

dmitshur · 2020-06-08T15:46:13Z

Thanks for the report @bobotu and the analysis @mvdan.

twmb · 2020-06-20T01:48:10Z

I am curious at how many people this will break. This broke some of our code (and showed up in tests) where we relied on unmarshalling merging values as opposed to completely zeroing and overwriting. Looking back at #21092, @OneOfOne indicated that they also relied on the merging behavior. @robmccoll indicated that this would not break @OneOfOne's usage because only new values appended would be zeroed and existing ones would not, but that's not true.

I agree that the docs do indicate that we were relying on broken behavior, and I'm in the process of changing my code.

I think that it's worth considering how much other code this change may break. If it's more than expected, i.e. people have come to rely on the broken behavior, it may be worth changing the documentation instead.

mvdan · 2020-06-20T21:57:38Z

I think that it's worth considering how much other code this change may break. If it's more than expected, i.e. people have come to rely on the broken behavior, it may be worth changing the documentation instead.

We have done this kind of thing in the past where the documentation was ambiguous. One good example is #39149.

However, this case is different from that one in two very important ways:

Clarity of the docs. In this case, the docs were very clear, so we are only breaking programs which were expecting exactly the opposite.
Number of programs affected. So far, two people have raised the issue here, but we don't have a good idea of how many programs would break once the final release is out. That other issue broke 50 targets/tests at Google, which is quite a lot compared to previous behavior changes and regressions in the json package, so that is a good metric. As far as I can tell, noone at Google has raised an alarm about this change, nor has anyone else been bothered enough by the change to open an issue.

Of course, we still have time. beta1 came out two weeks ago, and rc1 should be out in a couple of weeks. If more people bring up this issue, and especially if they provide proof that it will break many reasonable Go programs, we can reconsider.

invidian · 2020-06-25T20:49:48Z

Perhaps it would make sense to test this on some large projects like Kubernetes to do a bit more probing for brakage?

mvdan · 2020-06-25T20:51:30Z

If large projects want to test 1.15beta1 or the upcoming 1.15rc1, that would of course be very helpful :) It has to be their initiative, though.

neild · 2020-06-25T22:13:49Z

Copying comment from #21092:

This change has produced a small but significant number of test failures in Google's codebase. A reduced example of one case which was broken is:

type T struct {
  Index    int
  Elements []json.RawMessage
}

var message T
tmp := []interface{}{&message.Index, &message.Elements}
err = json.Unmarshal(raw[0], &tmp)
if err != nil {
  return message, err
}
return message, nil

https://play.golang.org/p/iNBD_-mhWTI

I am concerned that there may be more errors in code not covered by tests. (Failing test count: 6, or ~500, depending on whether you count one cluster of related tests as one test or not.)

dmitshur · 2020-06-25T23:30:04Z

Here's another relevant minified case I've come across. This might not be exactly the same issue, but it's related to the same fix CL. It is an example of code that was directly relying on the opposite-of-what-documentation-said behavior in Go 1.14, and so it breaks in 1.15 where #21092 is fixed:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

type combo struct {
	One  one
	Many []one
}

type one struct {
	A string
	B string
}

func main() {
	b := []byte(`{
		"One":  {"A":"hello"},
		"Many":[{"B":"there"}]
	}`)

	var v combo
	if err := json.Unmarshal(b, &v); err != nil {
		log.Fatalln(err)
	}

	// Initialize many with one.
	for i := range v.Many {
		v.Many[i] = v.One
	}

	// Unmarshal again, which in 1.14 would reuse slice elements
	// when decoding (counter to encoding/json documentation).
	if err := json.Unmarshal(b, &v); err != nil {
		log.Fatalln(err)
	}

	fmt.Println(len(v.Many), v.Many[0].A, v.Many[0].B)

	// Output (Go 1.14): 1 hello there
	// Output (Go 1.15): 1  there
}

As far as I see, the original issue with the mismatch in documentation and behavior was reported in #21092. That was in July 2017. The general preference of package owners at the time was to fix it. @rsc said back then:

I agree that when appending to a slice, unmarshal should start with fresh zeroed slice elements and not the old slice elements that happen to sit between len and cap. Too late for Go 1.10 though.

The relevant CL notes:

The previous behavior directly contradicted the docs that have been in place for years

If it's been in place for years, another option is to update documentation to match the (unintended) behavior.

We should decide what is better to do at this point. I've seen us update the documentation more often than change behavior in such cases, but I'll let people more familiar with encoding/json and these subtle trade-offs weigh in. Marking as NeedsDecision release-blocker for 1.15.

/cc @mvdan @rsc @neild @ianlancetaylor

twmb · 2020-06-25T23:50:32Z

For context of our usage, we unmarshal the same data into two types, where the second type is initialized from the first. If the data was for the first type, then the second unmarshal would be a no-op. If the data is for the second type, then the first unmarshal would be a no-op. The change for 1.15 now means that the second operation is a zeroing op if the data was for the first type.

Again, for us this is a minor fix, but I'm just glad we had tests for it.

mvdan · 2020-06-26T15:34:56Z

This change has produced a small but significant number of test failures in Google's codebase.

@neild can you share how that number compares to other past changes in the json library? "small but significant" doesn't really tell me if I should worry. I assume @dsnet will have an opinion here, as we had a similar discussion in #39149.

We should decide what is better to do at this point.

The docs have always been pretty clear as far as I can tell, so I still default to leaving the change as-is. I'm happy to hear counter-arguments, though, but please try to back them up with evidence.

Another option is a compromise. In the original code, we reused existing element values within a slice's capacity. In the new code, we never reuse any element values in a slice, at all (as documented). A compromise could be to only reuse the existing element values up to a slice's length, ignoring those between its length and its capacity. This is what the original bug report was about:

unmarshal into slice reuses element data between len and cap

However, I still prefer the current behavior over this compromise. But the compromise is better than reverting the entire change, at least.

rsc · 2020-06-26T15:43:54Z

The reason this is subtle is that "don't reuse element data between len and cap when appending" sounds great, except that the rule "unmarshal into slice cuts len to 0 first" means that all unmarshalling is appending. The pattern that's now been identified both inside and outside Google is to set up a []interface{} containing multiple pointer targets and then unmarshal a json array into it. This is analogous to setting up a struct containing multiple pointer targets and unmarshaling a json object into it. In Go 1.14 these two operations worked the same. In the current Go 1.15 beta they don't.

Consider this program:

package main

import (
	"encoding/json"
	"fmt"
)

func tojs(x interface{}) string {
	js, err := json.Marshal(x)
	if err != nil {
		panic(err)
	}
	return string(js)
}

func main() {
	array := []interface{}{new(int)}
	err := json.Unmarshal([]byte(`["hi"]`), &array)
	fmt.Printf("%s, %v\n", tojs(array), err)

	object := struct{ X interface{} }{new(int)}
	err = json.Unmarshal([]byte(`{"x":"hi"}`), &object)
	fmt.Printf("%s, %v\n", tojs(object), err)
}

Same thing is going on in both the array and object case, just a different container around them.

On the playground with Go 1.14 - https://play.golang.org/p/BjJRIo4cSBP - both behave the same: they try to unmarshal "hi" into a *int and report an error. In the Go 1.15 beta, the struct case still reports an error but the array case now replaces the pre-filled *int with a string. This is a silly short example; a more important one is when you have a JSON array of different object types and prefill an []interface{} with pre-allocations of the right structs for the decoding. In that case, you'd get success in Go 1.14 but failure in the Go 1.15 beta, because the important pre-filled type information is thrown away.

I don't believe it makes sense to break this example by introducing such a big difference between unmarshal into struct and unmarshal into slice.

For Go 1.16, we might try doing what #21092 claimed to be about, namely ignoring pre-filled items beyond the len of the slice passed to unmarshal, while still respecting pre-filled items between 0 and the original len.

For Go 1.15, at this late stage, it looks to me like the best path forward is to revert the breaking change. (It's too late to try some other new semantics.)

neild · 2020-06-26T18:44:53Z

@neild can you share how that number compares to other past changes in the json library?

More than some, less than others? As I said, there are only 6 failures, which isn't a lot in the grand scheme of things. However, I'm concerned both that the first case I looked at (#39427 (comment)) isn't obviously doing anything unreasonable, and that there may be more places that aren't caught by test coverage.

dmitshur · 2020-06-30T16:03:56Z

@mvdan @dsnet Are you in agreement with the plan suggested for 1.15 in #39427 (comment)?

mvdan · 2020-07-01T11:28:41Z

I'm not in agreement; the pattern that Russ mentioned goes directly against the existing documentation, so I'm not sure how it's considered a breaking change to fix the code instead of rewriting the documentation.

Still, I appear to be alone in my understanding of the Go1 compatibility guarantee, and we've run out of time, so it's time to revert.

gopherbot · 2020-07-01T11:31:42Z

Change https://golang.org/cl/240657 mentions this issue: Revert "encoding/json: don't reuse slice elements when decoding"

golang/go#39427

dmitshur changed the title ~~encoding/json on master changed the behavior when decoding into []interface{}~~ encoding/json: behavior changed when decoding into []interface{} Jun 5, 2020

dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 5, 2020

dmitshur added this to the Go1.15 milestone Jun 5, 2020

bobotu mentioned this issue Jun 8, 2020

DecodeArgs relies on a wrong behavior of json.Unmarshal pingcap/parser#889

Closed

bobotu closed this as completed Jun 8, 2020

neild reopened this Jun 25, 2020

dmitshur added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. release-blocker and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 25, 2020

dmitshur changed the title ~~encoding/json: behavior changed when decoding into []interface{}~~ encoding/json: Unmarshal behavior changed to match its documented behavior about not reusing slice elements Jun 25, 2020

rsc mentioned this issue Jun 26, 2020

encoding/json: unmarshal into slice reuses element data between len and cap #21092

Open

dmitshur removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jul 1, 2020

dmitshur added the NeedsFix The path to resolution is known, but the work has not been done. label Jul 1, 2020

gopherbot closed this as completed in 5de90d3 Jul 2, 2020

btwiuse pushed a commit to btwiuse/k0s that referenced this issue Jul 18, 2020

jsoniter: go1.15beta1 encoding/json.Unmarshal bug workaround

89f6849

golang/go#39427

magical mentioned this issue May 13, 2021

Did json.Unmarshal change behaviour between 1.13 and 1.15? #46146

Closed

golang locked and limited conversation to collaborators Jul 2, 2021

gopherbot added the FrozenDueToAge label Jul 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

encoding/json: Unmarshal behavior changed to match its documented behavior about not reusing slice elements #39427

encoding/json: Unmarshal behavior changed to match its documented behavior about not reusing slice elements #39427

bobotu commented Jun 5, 2020 •

edited

Loading

dmitshur commented Jun 5, 2020

mvdan commented Jun 6, 2020 •

edited

Loading

bobotu commented Jun 7, 2020 •

edited

Loading

mvdan commented Jun 7, 2020 •

edited

Loading

dmitshur commented Jun 8, 2020

twmb commented Jun 20, 2020 •

edited

Loading

mvdan commented Jun 20, 2020

invidian commented Jun 25, 2020

mvdan commented Jun 25, 2020

neild commented Jun 25, 2020

dmitshur commented Jun 25, 2020 •

edited

Loading

twmb commented Jun 25, 2020

mvdan commented Jun 26, 2020

rsc commented Jun 26, 2020

neild commented Jun 26, 2020

dmitshur commented Jun 30, 2020

mvdan commented Jul 1, 2020

gopherbot commented Jul 1, 2020

encoding/json: Unmarshal behavior changed to match its documented behavior about not reusing slice elements #39427

encoding/json: Unmarshal behavior changed to match its documented behavior about not reusing slice elements #39427

Comments

bobotu commented Jun 5, 2020 • edited Loading

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What did you do?

What did you expect to see?

What did you see instead?

dmitshur commented Jun 5, 2020

mvdan commented Jun 6, 2020 • edited Loading

bobotu commented Jun 7, 2020 • edited Loading

mvdan commented Jun 7, 2020 • edited Loading

dmitshur commented Jun 8, 2020

twmb commented Jun 20, 2020 • edited Loading

mvdan commented Jun 20, 2020

invidian commented Jun 25, 2020

mvdan commented Jun 25, 2020

neild commented Jun 25, 2020

dmitshur commented Jun 25, 2020 • edited Loading

twmb commented Jun 25, 2020

mvdan commented Jun 26, 2020

rsc commented Jun 26, 2020

neild commented Jun 26, 2020

dmitshur commented Jun 30, 2020

mvdan commented Jul 1, 2020

gopherbot commented Jul 1, 2020

bobotu commented Jun 5, 2020 •

edited

Loading

What version of Go are you using (`go version`)?

mvdan commented Jun 6, 2020 •

edited

Loading

bobotu commented Jun 7, 2020 •

edited

Loading

mvdan commented Jun 7, 2020 •

edited

Loading

twmb commented Jun 20, 2020 •

edited

Loading

dmitshur commented Jun 25, 2020 •

edited

Loading