clientv3: initial kv order caching #8092

mangoslicer · 2017-06-14T10:39:23Z

heyitsanthony · 2017-06-20T20:15:14Z

etcdserver/api/v3rpc/rpctypes/error.go

@@ -29,6 +29,7 @@ var (
 	ErrGRPCDuplicateKey  = grpc.Errorf(codes.InvalidArgument, "etcdserver: duplicate key given in txn request")
 	ErrGRPCCompacted     = grpc.Errorf(codes.OutOfRange, "etcdserver: mvcc: required revision has been compacted")
 	ErrGRPCFutureRev     = grpc.Errorf(codes.OutOfRange, "etcdserver: mvcc: required revision is a future revision")
+	ErrGRPCPastRev       = grpc.Errorf(codes.FailedPrecondition, "etcdserver: required response revision to be higher than previous revisions")


please remove these errors, this file is only meant for errors sent out by the etcd server over grpc (and nothing in this patch uses the errors anyway)

heyitsanthony · 2017-06-20T20:16:19Z

clientv3/ordering/kv.go

+
+type kvOrderingCache struct {
+	clientv3.KV
+	switchEndpoint SwitchEndpointClosure


The endpoint switching shouldn't be baked into the type name; it's a policy decision which can be left to the function. Try type OrderViolationFunc func(op clientv3.Op, resp clientv3.OpResponse, prevRev int64)

Alright that makes sense. Regardless of the how to switch endpoints and the retry rate is implemented in the OrderViolationFunc type, the Get method should still keep looping through and calling the kv.orderViolation method, right? So I was thinking of using:

r, err := kv.KV.Do(ctx, clientv3.OpGet(key, opts...)) currRev := r.Get().Header.Revision for err != nil && prevRev > currRev { kv.orderViolation(op, resp, prevRev) r, err = kv.KV.Do(ctx, clientv3.OpGet(key, opts...)) }

Does the above look right?

idea is OK, but needs to fetch the prev rev and update it:

op := clientv3.OpGet(key, opts...) lastRev := kv.prevRev() // safely get prev rev w/ atomic or rwmutex for { r, err := kv.KV.Do(ctx, op) if err != nil { return nil, err } resp := r.Get() if resp.Header.Revision >= lastRev { kv.update(r.Header.Revision) // safely update prev rev return resp, nil } kv.orderViolation(op, r, lastRev) }

heyitsanthony · 2017-06-20T20:17:25Z

clientv3/ordering/kv.go

+}
+
+func (kv *kvOrderingCache) Get(ctx context.Context, key string, opts ...clientv3.OpOption) (*clientv3.GetResponse, error) {
+	if len(key) == 0 {


I don't think this is necessary; the namespace code does a check here to avoid prefixing an empty key and having the request succeed when it shouldn't

heyitsanthony · 2017-06-20T20:20:51Z

clientv3/ordering/kv.go

+	if len(key) == 0 {
+		return nil, rpctypes.ErrEmptyKey
+	}
+	r, err := kv.KV.Do(ctx, clientv3.OpGet(key, opts...))


get the previous revision before submitting the RPC; otherwise, this breaks with concurrent accesses. Example:req1 has prevRev=1, req2 has prevRev=1, req2 returns and sets prevRev=2, req1 returns and uses prevRev=2 when it should be comparing with prevRev=1

req1 has prevRev=1, req2 has prevRev=1, req2 returns and sets prevRev=2, req1 returns and uses prevRev=2 when it should be comparing with prevRev=1

In your example, is the same client is sending out req1 and req2 in some sort of asynchronous fashion (maybe through seperate goroutines)?

If so, maybe I can just store a local variable set to the kv.prevRev at the time of sending req1 and then checking the returned rev of res1 against that local variable, so that the comparision is not affected by the changes to the prevRev that req2 might cause.

Yeah, the requests could come in from separate goroutines (e.g., proxy serves each RPC request in a separate go-routine but uses only one client). A local variable would handle it.

heyitsanthony · 2017-06-20T20:21:18Z

clientv3/ordering/kv.go

+	res := r.Get()
+	if kv.prevRev > res.Header.Revision {
+		kv.switchEndpoint()
+		kv.Get(ctx, key, opts...)


this ignores the get response

heyitsanthony · 2017-06-20T20:21:50Z

clientv3/ordering/kv.go

+		kv.switchEndpoint()
+		kv.Get(ctx, key, opts...)
+	} else {
+		kv.prevRev = res.Header.Revision


kv.prevRev either needs to be protected with a mutex or with atomics; otherwise there will be data races

heyitsanthony · 2017-06-20T20:23:12Z

clientv3/ordering/kv.go

+		return nil, err
+	}
+	res := r.Get()
+	if kv.prevRev > res.Header.Revision {


There needs to be some kind of looping here in case the reconnect goes to another outdated server. For example, a 5 node cluster can have two out-dated members, so switching from one to the other could still give an out of order result.

heyitsanthony · 2017-06-20T20:23:51Z

clientv3/ordering/kv.go

+		_, ok := ((*resOp).Response).(*pb.ResponseOp_ResponseRange)
+		if ok && txn.kv.prevRev > resp.Header.Revision {
+			txn.kv.switchEndpoint()
+			txn.Commit()


what happens to the server response?

heyitsanthony · 2017-06-20T20:26:41Z

clientv3/ordering/kv.go

+	pb "github.com/coreos/etcd/etcdserver/etcdserverpb"
+)
+
+type kvOrderingCache struct {


just kvOrdering? this doesn't behave like a cache

heyitsanthony · 2017-06-26T18:32:50Z

clientv3/ordering/kv.go

+}
+
+func (kv *kvOrdering) setPrevRev(currRev int64) {
+	atomic.StoreInt64(&kv.prevRev, currRev)


Probably want to prevent this from going backwards if there are concurrent calls to setPrevRev:

for { pr := kv.getPrevRev() if currentRev <= pr { return } if atomic.CompareAndSwapInt64(&kv.prevRev, pr, currRev) { return } }

Hadn't considered that. Good point.

heyitsanthony · 2017-06-26T18:37:26Z

clientv3/ordering/util.go

+
+type KvOrderViolationFunc func(op clientv3.Op, resp clientv3.OpResponse, prevRev int64)
+
+type TxnOrderViolationFunc func(resp *clientv3.TxnResponse, prevRev int64)


This shouldn't be necessary necessary as of 9cb12de; txns can be wrapped as OpRequest/OpResponse

I don't think I can wrap the TxnResponse because OpResponse does not export any of it's fields. clientv3.OpResponse{txn: resp} won't work.

Build a OpTxn on Commit, then issue the RPC with Do to get an OpResponse?

Since I couldn't get the Txn's cmps, thenOps, and elseOps required to construct the OpTxn through the Txn interface, I created a wrapper implementation around the If, Then, and Else to store the cmps, thenOps, and elseOps. Is this approach the cleanest or is there another more intuitive way?

the wrapper approach is OK

heyitsanthony · 2017-06-26T18:38:31Z

clientv3/ordering/kv.go

+			return nil, err
+		}
+		var ordered bool = true
+		for _, resOp := range resp.Responses {


only need to check resp.Header.Revision; no need to iterate through the responses

Just for reference, how is the resp.Header.Revision determined? Are the individual responses involved in the process of determining the resp.Header.Revision?

The header revision is the etcd store revision at the time of servicing the RPC. Since txns are atomic, there only needs to be one header revision (e.g., two puts in a txn will commit under the same revision)

heyitsanthony

A few last nits; looks almost ready to merge. Also, please squash into a single patch titled ordering: add client ordering wrapper so the commit history is tidy. Thanks!

heyitsanthony · 2017-07-07T21:27:12Z

clientv3/ordering/util.go

+
+type OrderViolationFunc func(op clientv3.Op, resp clientv3.OpResponse, prevRev int64)
+
+func switchEndpointClosure(c clientv3.Client) OrderViolationFunc {


func NewOrderViolationSwitchEndpoint(c clientv3.Client) so other packages can use this?

heyitsanthony · 2017-07-07T21:28:11Z

clientv3/ordering/kv_test.go

+		for _, r := range res.Responses {
+			_, ok := ((*r).Response).(*pb.ResponseOp_ResponseRange)
+			if rev := res.Header.Revision; ok && rev < tt.prevRev {
+				t.Errorf("Response revision: %d was less than the previous revision: %d\n", rev, tt.prevRev)


idiomatic go would be closer to

t.Errorf("#%d.%d: expected revision %d, got %d", i, j, tt.prevRev, rev)

heyitsanthony · 2017-07-07T21:29:00Z

clientv3/ordering/kv_test.go

+		txn := &txnOrdering{mTxn, kv}
+		res, err := txn.Commit()
+		if err != nil {
+			t.Errorf("For response: %+v, received the following error: %v\n", tt.response, err)


idiomatic go would be closer to

t.Errorf("#%d: expected response %+v, got error %+v", i, tt.response, err)

heyitsanthony · 2017-07-07T21:29:44Z

clientv3/ordering/kv_test.go

+}
+
+func TestTxnOrdering(t *testing.T) {
+	for _, tt := range txnTests {


Probably use the index for the error messages:

for i, tt := range txnTests {

mangoslicer · 2017-07-10T03:39:02Z

@heyitsanthony

In order to do any testing, I would have to mock the KV interface's Do method, which is difficult at the moment because its return type of OpResponse is composed of private fields. So in order to test, I would need to either make the OpResponse fields public or create setter methods for that struct's fields. Are these options feasible?

heyitsanthony · 2017-07-10T19:20:57Z

@mangoslicer maybe have functions like func (resp *PutResponse) ToOpResponse() OpResponse { ... } instead of resorting to public fields / setters?

mangoslicer · 2017-07-11T04:40:52Z

@heyitsanthony
Can you take a look at the last few changes that I made to the tests?

Also, what happens if the old member, which has the lastest rev for a key, sets the prevRev and then goes down? Assuming that no other member in the cluster has a rev higher than the prevRev, would there be am infinite loop of calling NewOrderViolationSwitchEndpointClosure?

heyitsanthony · 2017-07-12T19:47:39Z

Also, what happens if the old member, which has the lastest rev for a key, sets the prevRev and then goes down?

So long as the cluster has quorum, at least one member will have the latest revision.

Assuming that no other member in the cluster has a rev higher than the prevRev, would there be am infinite loop of calling NewOrderViolationSwitchEndpointClosure?

If the cluster doesn't have quorum, the request could loop forever. One way around this is to have OrderViolationFunction return an error so the user can provide a policy to terminate the looping.

It might be good to have a test like the ones in clientv3/integration that checks this works against a cluster instead of only mocks. Easiest way to do it is create a cluster with 3 nodes. Create a client pointing to member 0. Put a key, get the key through clus.Client(1) so member 1 is known to have it, stop member 2, put the key again, get with the ordering client. Then, stop members 0 and 1, restart member 2. Member 2 will have an outdated key Switch the ordering client's endpoints to point to member 2. Try to get the key and confirm the ordering client detects the order violation with a test violation function.

heyitsanthony · 2017-07-12T19:26:41Z

clientv3/kv.go

+func (resp *TxnResponse) ToOpResponse() OpResponse {
+	return OpResponse{txn: resp}
+}
+


include PutResponse and DelResponse for completeness?

heyitsanthony · 2017-07-12T19:31:18Z

clientv3/ordering/kv.go

+	txn.mu.Lock()
+	txn.cmps = cs
+	txn.mu.Unlock()
+	return txn.Txn.If(cs...)


this should be

txn.mu.Lock() defer txn.mu.Unlock() txn.cmps = cs txn.Txn = txn.Txn.If(cs...) return txn

otherwise Commit will use txn.Txn.Commit instead of txnOrdering.Commit

heyitsanthony · 2017-07-12T19:31:48Z

clientv3/ordering/kv.go

+	txn.mu.Lock()
+	txn.thenOps = ops
+	txn.mu.Unlock()
+	return txn.Txn.Then(ops...)


txn.mu.Lock() defer txn.mu.Unlock() txn.thenOps = ops txn.Txn = txn.Txn.Then(ops...) return txn

heyitsanthony · 2017-07-12T19:32:47Z

clientv3/ordering/kv.go

+	txn.mu.Lock()
+	txn.elseOps = ops
+	txn.mu.Unlock()
+	return txn.Txn.Else(ops...)


txn.mu.Lock() defer txn.mu.Unlock() txn.elseOps = ops txn.Txn = txn.Txn.Else(ops...) return txn

heyitsanthony · 2017-07-12T19:33:24Z

clientv3/ordering/kv.go

+	// middle of the Commit operation.
+	prevRev := txn.getPrevRev()
+	for {
+		opTxn := clientv3.OpTxn(txn.cmps, txn.thenOps, txn.elseOps)


move this before the for {} since it never changes in the loop?

mangoslicer · 2017-07-17T15:11:40Z

@heyitsanthony

After I kill the 0th and 1st member, how is the 2nd member supposed to become the leader? It seems that after I try to kill the first two members, the third member keeps requesting votes but it doesn't receive votes from the stopped nodes, so the term keeps increasing without a leader. As a result, the Range request doesn't complete and the unit test times out. I tried using clientv3.WithSerializable as an option in the Range call, but that didn't prevent the leader election from hanging. Do you have any idea?

One way around this is to have OrderViolationFunction return an error so the user can provide a policy to terminate the looping.

Since the signature of the Get method can not be changed from Get(ctx context.Context, key string, opts ...clientv3.OpOption) (*clientv3.GetResponse, error), it might be hard for the user to provide a policy when the error can't be returned in the Get call. Should the OrderViolationFunction just take a callback that handles any errors?

heyitsanthony · 2017-07-17T18:07:44Z

@mangoslicer the point of the test isn't to get a new leader, it's to get a member that's behind and check that the ordering kv detects the outdated revision. The OrderViolationFunc supplied by the test would return an error and that would be returned by the client so there's no need to elect a new leader to make forward progress.

mangoslicer · 2017-07-19T14:15:02Z

@heyitsanthony

I added unit tests for the ordering wrapper and for the endpoint-switching code. There are also two changes that I made to the endpoint-switching code. One of them is that the endpoint-switching code return an error if it switches through all of the endpoints and is not able to find a higher rev. I thought that this change would prevent the infinite loop that would occur if the the member that set the prevRev crashes before replicating the write log to members with lower revs. The other change I made was a one second time out added after setting the endpoint of the client to a specific member, because while unit-testing, I found that without some sort of delay, the SetEndpoints call was not correctly connecting to the specified endpoint. I'm not sure if this one second delay is acceptable in production code, but I thought that it was the easiest way to resolve the issue of not actually connecting to the specified input.

heyitsanthony · 2017-07-19T16:55:44Z

@mangoslicer OK this is almost ready to merge, but CI is failing.

Segfault on semaphore (https://semaphoreci.com/coreos/etcd/branches/pull-request-8092/builds/10).

--- FAIL: TestDetectKvOrderViolation (1.23s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x804934b]

goroutine 83 [running]:
testing.tRunner.func1(0x18f6e800)
	/usr/local/golang/1.8.3/go/src/testing/testing.go:622 +0x226
panic(0x87bc580, 0x8c71830)
	/usr/local/golang/1.8.3/go/src/runtime/panic.go:489 +0x22a
sync/atomic.LoadUint64(0x1b78c26c, 0x0, 0x0)
	/usr/local/golang/1.8.3/go/src/sync/atomic/asm_386.s:159 +0xb
github.com/coreos/etcd/clientv3/ordering.(*kvOrdering).getPrevRev(0x1b78c260, 0x0, 0x0)
	github.com/coreos/etcd/clientv3/ordering/_test/_obj_test/kv.go:38 +0x30
github.com/coreos/etcd/clientv3/ordering.(*kvOrdering).Get(0x1b78c260, 0x8c3f000, 0x18f0aa5c, 0x8880338, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/coreos/etcd/clientv3/ordering/_test/_obj_test/kv.go:61 +0x3c
github.com/coreos/etcd/clientv3/ordering.TestDetectKvOrderViolation(0x18f6e800)
	/home/runner/workspace/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/clientv3/ordering/kv_test.go:75 +0x76e
testing.tRunner(0x18f6e800, 0x88b7f2c)
	/usr/local/golang/1.8.3/go/src/testing/testing.go:657 +0x7e
created by testing.(*T).Run
	/usr/local/golang/1.8.3/go/src/testing/testing.go:697 +0x242
FAIL	github.com/coreos/etcd/clientv3/ordering	1.260s

Commit format checking is failing on travis. The fix would be squashing the patches into a single commit titled clientv3/ordering: kv order caching

Thanks!

heyitsanthony

lgtm thanks

xiang90 · 2017-07-25T05:28:37Z

@mangoslicer Thanks!

mangoslicer mentioned this pull request Jun 19, 2017

clientv3: kv ordering wrapper for serialized requests #7623

Closed

heyitsanthony suggested changes Jun 20, 2017

View reviewed changes

heyitsanthony suggested changes Jun 26, 2017

View reviewed changes

heyitsanthony suggested changes Jul 7, 2017

View reviewed changes

heyitsanthony suggested changes Jul 12, 2017

View reviewed changes

clientv3/ordering: kv order caching

887df72

mangoslicer force-pushed the kv-ordering-wrapper branch from 6f0b938 to 887df72 Compare July 20, 2017 02:04

heyitsanthony approved these changes Jul 24, 2017

View reviewed changes

xiang90 merged commit d2654f8 into etcd-io:master Jul 25, 2017

gyuho added the kind/release-note label Jul 25, 2017

gyuho changed the title ~~Added initial kv order caching functionality~~ clientv3: initial kv order caching Aug 8, 2017


		type KvOrderViolationFunc func(op clientv3.Op, resp clientv3.OpResponse, prevRev int64)

		type TxnOrderViolationFunc func(resp *clientv3.TxnResponse, prevRev int64)


		type OrderViolationFunc func(op clientv3.Op, resp clientv3.OpResponse, prevRev int64)

		func switchEndpointClosure(c clientv3.Client) OrderViolationFunc {

clientv3: initial kv order caching #8092

clientv3: initial kv order caching #8092

Conversation

mangoslicer commented Jun 14, 2017 • edited Loading

heyitsanthony Jun 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mangoslicer Jun 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mangoslicer Jun 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

heyitsanthony left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mangoslicer commented Jul 10, 2017

heyitsanthony commented Jul 10, 2017

mangoslicer commented Jul 11, 2017 • edited Loading

heyitsanthony commented Jul 12, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mangoslicer commented Jul 17, 2017

heyitsanthony commented Jul 17, 2017

mangoslicer commented Jul 19, 2017

heyitsanthony commented Jul 19, 2017

heyitsanthony left a comment

Choose a reason for hiding this comment

xiang90 commented Jul 25, 2017

mangoslicer commented Jun 14, 2017 •

edited

Loading

heyitsanthony Jun 20, 2017 •

edited

Loading

mangoslicer Jun 23, 2017 •

edited

Loading

mangoslicer Jun 23, 2017 •

edited

Loading

mangoslicer commented Jul 11, 2017 •

edited

Loading