Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc-gateway: txn marshaling broken #7889

Closed
heyitsanthony opened this issue May 5, 2017 · 11 comments
Closed

grpc-gateway: txn marshaling broken #7889

heyitsanthony opened this issue May 5, 2017 · 11 comments
Labels
Milestone

Comments

@heyitsanthony
Copy link
Contributor

I try a txn with If(CreateRev("foo")=0) { Put("foo", "bar") }, generated with the gateway's marshaler from the pb struct:

'{"compare":[{"result":0,"target":1,"key":"Zm9v","TargetUnion":null}],"success":[{"Request":{"RequestPut":{"key":"Zm9v","value":"YmFy"}}}]}'

The %+v output for the pb structure gives:

{Compare:[target:CREATE key:"foo" ] Success:[request_put:<key:"foo" value:"bar" > ] Failure:[]}

However, the gateway decodes it into the pb structure:

{Compare:[target:CREATE key:"foo" ] Success:[] Failure:[]}

Then crashes out before returning from client.Txn:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x45d7712]

goroutine 161 [running]:
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb.(*ResponseOp).Size(0x0, 0x1)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb/rpc.pb.go:6801 +0x22
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb.(*TxnResponse).Size(0xc4201034a0, 0xc42079f8f8)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb/rpc.pb.go:6916 +0x74
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb.(*TxnResponse).Marshal(0xc4201034a0, 0x500b340, 0xc4201034a0, 0x68a5378, 0xc4201034a0, 0x5012201)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb/rpc.pb.go:4385 +0x2f
github.com/coreos/etcd/cmd/vendor/github.com/gogo/protobuf/proto.Marshal(0x500b340, 0xc4201034a0, 0xc4201034a0, 0x500b340, 0xc4201034a0, 0x50122a0, 0xc420221440)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/gogo/protobuf/proto/encode.go:233 +0x34b
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc.(*codec).Marshal(0x5054910, 0x4a6ff60, 0xc4201034a0, 0x4a78920, 0xc4206eeb40, 0x4a6ff60, 0xc4201034a0, 0x0)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc/codec.go:22 +0x60
github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc.encode(0x5009700, 0x5054910, 0x4a6ff60, 0xc4201034a0, 0x0, 0x0, 0x0, 0x0, 0x45be857, 0xc420196400, ...)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc/rpc_util.go:278 +0x2f9
github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc.(*Server).sendResponse(0xc420148000, 0x5012420, 0xc420250a80, 0xc4201b8480, 0x4a6ff60, 0xc4201034a0, 0x0, 0x0, 0xc42020c4c8, 0x0, ...)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc/server.go:601 +0xb5
github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc420148000, 0x5012420, 0xc420250a80, 0xc4201b8480, 0xc42074aa80, 0x504e008, 0x0, 0x0, 0x0)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc/server.go:763 +0xedd
github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc.(*Server).handleStream(0xc420148000, 0x5012420, 0xc420250a80, 0xc4201b8480, 0x0)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc/server.go:932 +0x1339
github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc420011a00, 0xc420148000, 0x5012420, 0xc420250a80, 0xc4201b8480)
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc/server.go:497 +0xa9
created by github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
	/Users/anthony/go/src/github.com/coreos/etcd/cmd/vendor/google.golang.org/grpc/server.go:498 +0xa1
@heyitsanthony heyitsanthony added this to the v3.3.0 milestone May 9, 2017
@mangoslicer
Copy link
Contributor

Hi guys, if no one has started to investigate this bug, I would like to work on it. I read through the stack trace in order to get a general idea of the data flow. Correct me if I'm wrong, but I believe that the unmarshaling of transaction requests begins on line 724 of cmd/vendor/google.golang.org/grpc/server.go:
if err := s.opts.codec.Unmarshal(req, v); err != nil { return err }
and the actual processing of the request start on line 738 of the same server.go file:
reply, appErr := md.Handler(srv.server, stream.Context(), df, s.opts.unaryInt)

If that is the case, then I think a good start to the investigation would be stepping through those areas of the code with a debugger. I actually haven't been able to determine a good way of reproducing this marshaling defect. Could you please provide steps for reproducing the defect. I have been googling for a while, but I wasn't able to find documentation explaining how to initiate a transactions to etcd.

@heyitsanthony
Copy link
Contributor Author

@mangoslicer

This is how to repro:

curl -L http://localhost:2379/v3alpha/kv/txn \
        -X POST \
        -d '{"compare":[{"result":0,"target":1,"key":"Zm9v","TargetUnion":null}],"success":[{"Request":{"RequestPut":{"key":"Zm9v","value":"YmFy"}}}]}'

@mangoslicer
Copy link
Contributor

@heyitsanthony
Thanks for posting the reproduction info.

It looks like the marshaling code is fine. I recreated : {Compare:[target:CREATE key:"foo" ] Success:[request_put:<key:"foo" value:"bar" > ] Failure:[]} using pb structures and then marshaled and unmarshalled the structure. The result of unmarshalling was the original statement pb structure with a non-empty Success field.

The problem actually appears to be related to an incomplete reading of the transport stream. So since we're not reading the entire stream (in our specific case we're only reading 11 bytes when we should really be reading 23 bytes), the bytes related to the Success field are missing by the time unmarshaller gets the data. From what I can tell, the recvMsg function from rpc_util.go is determining the length of bytes to read from the stream based off of the header of the gRPC message. This is just a small update on this issue, I'm actually still in the process of investigating why the header does not contain the proper length, I just thought people might appreciate an update.

@mangoslicer
Copy link
Contributor

@heyitsanthony
I was wrong in my last post. After investigating further, I found that the problem is actually with the JSON being passed to the gateway.

'{"compare":[{"result":0,"target":1,"key":"Zm9v","TargetUnion":null}],"success":[{"Request":{"RequestPut":{"key":"Zm9v","value":"YmFy"}}}]}'

There are actually two problems with this JSON string. The first issue is that "Request" and "RequestPut" are capitalized instead of in camel case or with underscores between the lower cased words (e.g. "request_put"). We need to adhere to this format of JSON, because library that we're using to unmarshal JSON into protobufs only searches for those specific JSON formats in the provided JSON. Here's the code segment that enforces camel casing and underscoring:

       // Handle nested messages.
       if targetType.Kind() == reflect.Struct {
	         var jsonFields map[string]json.RawMessage
	         if err := json.Unmarshal(inputValue, &jsonFields); err != nil {
		            return err
	         }
	consumeField := func(prop *proto.Properties) (json.RawMessage, bool) {
		// Be liberal in what names we accept; both orig_name and camelName are okay.
		fieldNames := acceptedJSONFieldNames(prop)

		vOrig, okOrig := jsonFields[fieldNames.orig]
		vCamel, okCamel := jsonFields[fieldNames.camel]
		if !okOrig && !okCamel {
			return nil, false
		}
		// If, for some reason, both are present in the data, favour the camelName.
		var raw json.RawMessage
		if okOrig {
			raw = vOrig
			delete(jsonFields, fieldNames.orig)
		}
		if okCamel {
			raw = vCamel
			delete(jsonFields, fieldNames.camel)
		}
		return raw, true
	}

~From etcd/cmd/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go

The second issue is that the "Request" field should not be included in the JSON. The success part of the JSON should actually be:
"success":[{"requestPut":{"key":"Zm9v","value":"YmFy"}}]

Since in RequestOp, "Request" is name of a field which is defined with a "oneof" property, I think that "Request" may have snuck into the JSON due to improper handling of the "oneof" property. If you want to verify that "Request" should not be included in the JSON, take a look at the pieces of code that I found in the protobuf library.

The following is the definition of a message with a oneof property:

       message MsgWithOneof {
         oneof union {
           string title = 1;
           int64 salary = 2;
           string Country = 3;
           string home_address = 4;
         }
       }

~From https://github.com/golang/protobuf/blob/master/jsonpb/jsonpb_test_proto/test_objects.proto

Here is a test case using the MsgWithOneof message:

       var unmarshalingTests = []struct {
           desc        string
           unmarshaler Unmarshaler
           json        string
           pb          proto.Message
       }{
                  "oneof", Unmarshaler{}, `{"salary":31000}`, &pb.MsgWithOneof{Union:&pb.MsgWithOneof_Salary{31000}}}
       }

~From https://github.com/golang/protobuf/blob/master/jsonpb/jsonpb_test.go

The expected JSON (i.e. {"salary":31000}) in the test case does not have the "Union" field defined, so I doubt we should include "Request" in our JSON string.

My question now is what generated the following JSON that was given to me to reproduce this issue?

@heyitsanthony
Copy link
Contributor Author

@mangoslicer

My question now is what generated the following JSON that was given to me to reproduce this issue?

It was generated using grpc-gateway's runtime.Marshaler from a TxnRequest.

@mangoslicer
Copy link
Contributor

mangoslicer commented May 31, 2017

@heyitsanthony

It was generated using grpc-gateway's runtime.Marshaler from a TxnRequest.

I figured as much. Before I go any further, I would appreciate it if you could tell me whether I have the correct high-level overview of this particular code. So according to api_grpc_gateway.md, the purpose of the grpc-gateway is to accept RESTful calls containing JSON data. Based off of this JSON support feature, I'm guessing that there must be an etcd client which takes protocol buffers and converts them into JSON, which is then sent to the grpc_gateway.

In the initial issue report, you state the following:

I try a txn with If(CreateRev("foo")=0) { Put("foo", "bar") }, generated with the gateway's marshaler from the pb struct:

'{"compare":[{"result":0,"target":1,"key":"Zm9v","TargetUnion":null}],"success":[{"Request":{"RequestPut":{"key":"Zm9v","value":"YmFy"}}}]}'

So if I am understanding you correctly, then you're saying that some protocol buffer was fed into the grpc_gateway runtime.Marshaler to produce the JSON at the top of this paragraph.

I'm confused about how "If(CreateRev("foo")=0) { Put("foo", "bar") }" is transformed into the JSON string. I'm sure I would be less confused, if I could find the code that deals with this transformation. As a start, it would be great if you could explain the origin of "If(CreateRev("foo")=0) { Put("foo", "bar") }". Is it pseudo code? Is it a command inputed into some executable client program? It would also be helpful if you could let me know where you got the string version of the JSON in the first place. The filename or function where you put the print statement to get that JSON would be perfect. With those two bits of information, I'll probably be in good shape to implement a fix to this issue.

@heyitsanthony
Copy link
Contributor Author

@mangoslicer

in request_KV_Txn_0 of the generated code I added:

        txn := &etcdserverpb.TxnRequest{
                Compare: []*etcdserverpb.Compare{
                        &etcdserverpb.Compare{
                                Key: []byte("foo"),
                                Result: etcdserverpb.Compare_EQUAL,
                                Target: etcdserverpb.Compare_CREATE,
                                TargetUnion: &etcdserverpb.Compare_CreateRevision{0},
                        },
                },
                Success: []*etcdserverpb.RequestOp{
                        {
                                Request: &etcdserverpb.RequestOp_RequestPut{
                                        RequestPut: &etcdserverpb.PutRequest{
                                                Key: []byte("foo"),
                                                Value: []byte("bar"),
                                        },
                                },
                        },
                },
        }
        fmt.Printf("start: %+v\n", txn)
        b, _ := marshaler.Marshal(txn)
        fmt.Println("json: ", string(b))
        txn = &etcdserverpb.TxnRequest{}
        marshaler.Unmarshal(b, txn)
        fmt.Printf("unjson: %+v\n", txn)

which gives:

start: compare:<target:CREATE key:"foo" create_revision:0 > success:<request_put:<key:"foo" value:"bar" > >
json:  {"compare":[{"target":"CREATE","key":"Zm9v","create_revision":"0"}],"success":[{"request_put":{"key":"Zm9v","value":"YmFy"}}]}
unjson: compare:<target:CREATE key:"foo" create_revision:0 > success:<request_put:<key:"foo" value:"bar" > >

which is from JsonPB with origName = true (cf. https://github.com/grpc-ecosystem/grpc-gateway/blob/master/runtime/marshaler_registry.go#L16)

with origName = false, I get:

{"compare":[{"target":"CREATE","key":"Zm9v","createRevision":"0"}],"success":[{"requestPut":{"key":"Zm9v","value":"YmFy"}}]}

which parses OK and gives the resposne:

{"header":{"cluster_id":"14841639068965178418","member_id":"10276657743932975437","revision":"2","raft_term":"2"},"succeeded":true,"responses":[{"response_put":{"header":{"revision":"2"}}}]}

I'll add some tests for this and close out this issue since marshaling is fine, just that the gateway seemingly can't unmarshal what it marshals. The crashing is still a problem, though.

@mangoslicer
Copy link
Contributor

mangoslicer commented May 31, 2017

@heyitsanthony

I'll add some tests for this and close out this issue since marshaling is fine, just that the gateway seemingly can't unmarshal what it marshals. The crashing is still a problem, though.

Thanks for the detailed explanation, but the question remains as to where the JSON that you gave me to reproduce the issue came from. None of the JSON you printed out in the previous post matches the following JSON.

'{"compare":[{"result":0,"target":1,"key":"Zm9v","TargetUnion":null}],"success":[{"Request":{"RequestPut":{"key":"Zm9v","value":"YmFy"}}}]}'

This JSON contains has the extra "Request" field defined and when this extra piece of JSON messes up the unmarshaling of the success protocol buffer structure, then the program crashes due to the nil success structure. So I don't understand why we would close this issue without figuring out what generated that JSON and fixing it.

@heyitsanthony
Copy link
Contributor Author

heyitsanthony commented May 31, 2017

@mangoslicer
Copy link
Contributor

mangoslicer commented May 31, 2017

@heyitsanthony

Alright that solves that mystery. It also looks like the JSONBuiltin is throwing a marshalling error as well as producing that faulty JSON. So are any of the etcd clients which talk to the grpc_gateway using the JSONBuiltin marshaler? If so, we probably want to switch over to a different marshaler.

@hexfusion
Copy link
Contributor

hexfusion commented Jun 16, 2017

What is is interesting is using 3.2.0 I will pass this through the grpc-gateway and it seems to randomly fail.

{"compare":[{"key":"Zm9v","target":"CREATE","create_revision":"0"}],"success":[{"RequestPut":{"value":"YmFy","key":"Zm9v"}}]}

vs your example

{"compare":[{"target":"CREATE","key":"Zm9v","createRevision":"0"}],"success":[{"requestPut":{"key":"Zm9v","value":"YmFy"}}]}

https://travis-ci.org/hexfusion/perl-net-etcd/builds/243610166

I guess the fact that it fails is one thing but what is your theory on the randomness. In the travis tests it is checking against different versions of Perl. But if you refreshed the tests it will fail in a different order. I don't see this is master that I can reproduce. But I did find the randomness odd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants