Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare not equal operator is broken in etcd #10566

Open
purpleidea opened this issue Mar 20, 2019 · 40 comments
Open

Compare not equal operator is broken in etcd #10566

purpleidea opened this issue Mar 20, 2019 · 40 comments
Labels

Comments

@purpleidea
Copy link
Contributor

I couldn't believe it myself, and after a long day of hacking, I finally assumed it was something in etcd's code that was broken instead of mine. Here's a simple reproducer and logs.

What are we using?

$ etcd --version
etcd Version: 3.3.12
Git SHA: d57e8b8d9
Go Version: go1.10.8
Go OS/Arch: linux/amd64

Run etcd in an empty dir:

etcd
2019-03-20 14:49:36.168882 I | etcdmain: etcd Version: 3.3.12
2019-03-20 14:49:36.168944 I | etcdmain: Git SHA: d57e8b8d9
2019-03-20 14:49:36.168955 I | etcdmain: Go Version: go1.10.8
2019-03-20 14:49:36.168963 I | etcdmain: Go OS/Arch: linux/amd64
2019-03-20 14:49:36.168971 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2019-03-20 14:49:36.168982 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2019-03-20 14:49:36.169433 I | embed: listening for peers on http://localhost:2380
2019-03-20 14:49:36.169547 I | embed: listening for client requests on localhost:2379
2019-03-20 14:49:36.169939 I | etcdserver: name = default
2019-03-20 14:49:36.169953 I | etcdserver: data dir = default.etcd
2019-03-20 14:49:36.169963 I | etcdserver: member dir = default.etcd/member
2019-03-20 14:49:36.169968 I | etcdserver: heartbeat = 100ms
2019-03-20 14:49:36.169972 I | etcdserver: election = 1000ms
2019-03-20 14:49:36.169979 I | etcdserver: snapshot count = 100000
2019-03-20 14:49:36.169993 I | etcdserver: advertise client URLs = http://localhost:2379
2019-03-20 14:49:36.170001 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2019-03-20 14:49:36.170014 I | etcdserver: initial cluster = default=http://localhost:2380
2019-03-20 14:49:36.179266 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2019-03-20 14:49:36.179303 I | raft: 8e9e05c52164694d became follower at term 0
2019-03-20 14:49:36.179319 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2019-03-20 14:49:36.179328 I | raft: 8e9e05c52164694d became follower at term 1
2019-03-20 14:49:36.180643 W | auth: simple token is not cryptographically signed
2019-03-20 14:49:36.181095 I | etcdserver: starting server... [version: 3.3.12, cluster version: to_be_decided]
2019-03-20 14:49:36.181467 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
2019-03-20 14:49:36.182013 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2019-03-20 14:49:37.179884 I | raft: 8e9e05c52164694d is starting a new election at term 1
2019-03-20 14:49:37.179965 I | raft: 8e9e05c52164694d became candidate at term 2
2019-03-20 14:49:37.180035 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
2019-03-20 14:49:37.180098 I | raft: 8e9e05c52164694d became leader at term 2
2019-03-20 14:49:37.180139 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2019-03-20 14:49:37.180583 I | etcdserver: setting up the initial cluster version to 3.3
2019-03-20 14:49:37.180998 N | etcdserver/membership: set the initial cluster version to 3.3
2019-03-20 14:49:37.181059 I | embed: ready to serve client requests
2019-03-20 14:49:37.181239 I | etcdserver/api: enabled capabilities for version 3.3
2019-03-20 14:49:37.181303 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster cdf818194e3a8c32
2019-03-20 14:49:37.181539 E | etcdmain: forgot to set Type=notify in systemd service file?
2019-03-20 14:49:37.182692 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!

Here's a golang example to trigger the issue:

package main

import (
	"context"
	"log"

	etcd "github.com/coreos/etcd/clientv3"
)

func main() {
	log.Printf("hello")
	cfg := etcd.Config{
		Endpoints: []string{"http://127.0.0.1:2379"},
		// 0 disables auto-sync. By default auto-sync is disabled.
		AutoSyncInterval: 0,
	}
	client, err := etcd.New(cfg) // connect!
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("connected")

	ifs := []etcd.Cmp{}
	ops := []etcd.Op{}
	//els := []etcd.Op{}

	key := "/_hello/foo"
	data := "world"

	// works
	//ifs = append(ifs, etcd.Compare(etcd.Value(key), "=", data)) // desired state
	//els = append(ops, etcd.OpPut(key, data)) // TODO: add a TTL? (etcd.WithLease)

	ifs = append(ifs, etcd.Compare(etcd.Value(key), "!=", data)) // desired state
	ops = append(ops, etcd.OpPut(key, data))                     // TODO: add a TTL? (etcd.WithLease)

	log.Printf("txn1...")
	//resp1, err := client.KV.Txn(context.TODO()).If(ifs...).Then([]etcd.Op{}...).Else(els...).Commit()
	resp1, err := client.KV.Txn(context.TODO()).If(ifs...).Then(ops...).Else([]etcd.Op{}...).Commit()
	log.Printf("result1: %+v || %+v", resp1, err) // succeeded is set to true if the compare evaluated to true

	log.Printf("txn2...")
	//resp2, err := client.KV.Txn(context.TODO()).If(ifs...).Then([]etcd.Op{}...).Else(els...).Commit()
	resp2, err := client.KV.Txn(context.TODO()).If(ifs...).Then(ops...).Else([]etcd.Op{}...).Commit()
	log.Printf("result2: %+v || %+v", resp2, err) // succeeded is set to true if the compare evaluated to true

	// test with: ETCDCTL_API=3 etcdctl get / --prefix
}

When the golang code is run, the following appears in the etcd server logs:

proto: no coders for int
proto: no encoder for ValueSize int [GetProperties]
proto: no coders for int
proto: no encoder for ValueSize int [GetProperties]

And running etcdctl produces no output:

$ ETCDCTL_API=3 etcdctl get / --prefix

If you modify the above golang example to use = instead of !=, and switch the then for the else case, then it works correctly as expected.

Thoughts welcome!

Thanks

@kragniz
Copy link
Contributor

kragniz commented Mar 20, 2019

With your reproducer I get the following:

kgz@sakura ~/g/etcd-10566> go run main.go
2019/03/20 20:21:39 hello
2019/03/20 20:21:39 connected
2019/03/20 20:21:39 txn1...
2019/03/20 20:21:39 result1: &{Header:cluster_id:14841639068965178418 member_id:10276657743932975437 revision:1 raft_term:3  Succeeded:false Responses:[]} || <nil>
2019/03/20 20:21:39 txn2...
2019/03/20 20:21:39 result2: &{Header:cluster_id:14841639068965178418 member_id:10276657743932975437 revision:1 raft_term:3  Succeeded:false Responses:[]} || <nil>
proto: no coders for int
proto: no encoder for ValueSize int [GetProperties]
proto: no coders for int
proto: no encoder for ValueSize int [GetProperties]

This is with etcd

kgz@sakura ~> etcd --version
etcd Version: 3.3.12
Git SHA: efb3add
Go Version: go1.12.1
Go OS/Arch: linux/amd64

@purpleidea
Copy link
Contributor Author

@kragniz Much appreciated, AFAICT, bug is confirmed.

If there was NO bug, you would expect to see: Succeeded:true in result1 and Succeeded:false in the result1.

@xiang90
Copy link
Contributor

xiang90 commented Mar 20, 2019

@purpleidea Can you spend sometime to help us dig into this bug?

@purpleidea
Copy link
Contributor Author

@xiang90 I'm currently unemployed and trying to turn https://github.com/purpleidea/mgmt/ into something that can survive, so tbh, no :( I was hoping my clear bug report and easy reproducer would help! Also willing to write a test and send a PR.

I'm fairly certain that any consumer of etcd (eg: kube) will have major issues if they hit this bug. Maybe they can have a look?

@mfycheng
Copy link

From briefly reading some of the transaction code, it looks like comparisons against values always fail if the key doesn't exist[1]? I can't seem to find any documentation highlighting this behaviour, though. It appears that != works as expected if the key already exists.

[1] https://github.com/etcd-io/etcd/blob/release-3.3/etcdserver/apply.go#L443-L447

@purpleidea
Copy link
Contributor Author

@mfycheng No idea, however reversing the operation (shown in my example by switching the comments off) makes things work. So if anything this is a mistake.

I was fairly sure this used to work as I expected... I didn't bisect it though.

@mfycheng
Copy link

If I understand the reverse operation in your example correctly, it appears that it's doing (paraphrasing):

If(etcd.Compare(etcd.Value(key), "=", data)).Else(etcd.OpPut(key, data)).Commit()

Which would always result in the the Else condition being hit, because calling = on a key that doesn't exist will always return false for a value compare, just like how != aways returns false for a value compare.

@purpleidea
Copy link
Contributor Author

because calling = on a key that doesn't exist will always return false for a value compare

I think it's a bug, because we only see the error messages proto: no coders for int when this happens.

In addition, since this basically causes the else case to happen, it is equivalent to the code saying that they are equal, which is obviously wrong. If you really wanted this to be invalid, then resp should contain an error saying "key not exist" or similar.

@yuqitao
Copy link

yuqitao commented Mar 28, 2019

@purpleidea @xiang90
As @mfycheng said, it always return false regardless of equal or not equal when the key doesn't exist and the Compare Target is Compare_VALUE .

if len(rr.KVs) == 0 {

// Always fail if comparing a value on a key/keys that doesn't exist;
// nil == empty string in grpc; no way to represent missing value

But nil == empty string in grpc confouses me. it means, etcd.Compare(etcd.Value(key), "=", value) , the value is empty string, is ambiguous, if the key exists or the value of the key is empty ?
Anyway, not equal return false make us uncomfortable. Why not treat the case when the key doesn't
exist as empty string?
What's your opinion ?

As for proto: no coders for int, please look this issue:
grpc-ecosystem/grpc-gateway#259

There is a custom proto Golang file in etcd.

ValueSize int `protobuf:"bytes,7,opt,name=value_size,proto3"`

These structs seem only be used in log for format print of the request.
The similar issue seems that int field can't appear in the struct of proto go file, cause the length of int is uncertain, 32 or 64?
https://github.com/golang/protobuf/blob/4bd1920723d7b7c925de087aa32e2187708897f7/proto/properties.go#L349.

@xiang90
Copy link
Contributor

xiang90 commented Mar 28, 2019

@yuqitao

Can you create a PR and help to fix the stringer thing? Thanks!

@purpleidea
Copy link
Contributor Author

I added this design issue to work around the possibility of needing to do an IF_EXISTS AND IF_VALUE_IS sort of thing. Perhaps someone could have a look and comment on the viability.

#10571

Thanks!

@yylt
Copy link

yylt commented May 25, 2019

so what we expect is result decided by bytes.Compare and NOT_EXISTS mean "" ?

@jingyih
Copy link
Contributor

jingyih commented May 30, 2019

I agree it is confusing (and probably incorrect) that comparing value always return false if the key does not exist. The behavior should be better defined.

At the meantime, I propose to mitigate the issue by adding an additional check when using value compare, i.e., always check if the key exists before comparing its value.
clientv3.Compare(clientv3.ModRevision("key"), ">", 0)

@jingyih
Copy link
Contributor

jingyih commented May 30, 2019

I do not think the current behavior is logically correct, because both of the following comparisons will be evaluated as false. But I am not sure what the "correct" behavior should be. Regarding the idea of returning error when this happens, logically this is probably the right thing to do, but I think it is a pretty hard break on user-facing API.

  • clientv3.Compare(clientv3.Value(non-exist-key), "=", any-value)
  • clientv3.Compare(clientv3.Value(non-exist-key), "!=", any-value)

@purpleidea
Copy link
Contributor Author

@jingyih

Regarding the idea of returning error when this happens, logically this is probably the right thing to do, but I think it is a pretty hard break on user-facing API.

Yeah, I agree. I think it is something we should definitely consider for 3.4, and in particular, I do not think it should be a worry to "break" this behaviour because anyone who depended on it before most definitely had either incorrect or buggy code. So in the worst case, someone's code will "break" and they'll notice that they previously had a bug hidden there.

resp, err := client.KV.Txn(context.TODO()).If(ifs .... )

In the above statement, anytime we do something illogical, the err should be set.

@csuzhangxc
Copy link

Is this issue still active? and is there any plan to fix it?

@purpleidea
Copy link
Contributor Author

Is this issue still active? and is there any plan to fix it?

@csuzhangxc patches welcome I think.

@stale
Copy link

stale bot commented Apr 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 19, 2020
@purpleidea
Copy link
Contributor Author

hi bot

@stale stale bot removed the stale label Apr 20, 2020
@ItalyPaleAle
Copy link

ItalyPaleAle commented May 4, 2020

I ran into the same issue and it took me a while to figure it out.

I was able to use a workaround:

func (s *store) setIfDifferent(key string, value string) (*clientv3.TxnResponse, error) {
	ctx, cancel := s.GetContext() // Get a context
	txn := s.client.Txn(ctx)
	resp, err := txn.If(
		clientv3.Compare(clientv3.ModRevision(key), "=", 0),
	).Then(
		clientv3.OpPut(key, value),
	).Commit()
	cancel()
	if err != nil {
		return nil, err
	}
	if resp.Succeeded {
		return resp, nil
	}
	ctx, cancel = s.GetContext() // Get a context
	txn = s.client.Txn(ctx)
	resp, err = txn.If(
		clientv3.Compare(clientv3.Value(key), "!=", value),
	).Then(
		clientv3.OpPut(key, value),
	).Commit()
	cancel()
	if err != nil {
		return nil, err
	}

	return resp, nil
}

Important: Note that this works for my app because the key I am trying to set to a desired state is never deleted. If the key you want to set can be deleted, there's the potential of a race condition causing unexpected behaviors

In general, may I recommend updating the documentation for etcd to clearly explain that comparisons are always false if the key doesn't exist? That would have saved me a couple of hours of debugging.

@purpleidea
Copy link
Contributor Author

@ItalyPaleAle, I think if TXN is to be used seriously, someone should just patch the operator... I'm too buried with other stuff, but would be happy to review your patch to the best of my abilities.

@ItalyPaleAle
Copy link

@purpleidea I totally agree with you. I shared my code in case it can help others, but I clearly wrote it's a workaround that, while it works for me, it might be dangerous for others. In fact, it does not perform the write in a single transaction.

I read the code and it looks like a patch won't be very simple. At least, I wouldn't know how to start fixing it.

Alternatively, something like #10571 could work as well, allowing us to use OR conditions too.

@stale
Copy link

stale bot commented Aug 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 2, 2020
@ItalyPaleAle
Copy link

Not resolved

@stale
Copy link

stale bot commented Jan 29, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 29, 2021
@ItalyPaleAle
Copy link

Hi bot

@stale stale bot removed the stale label Jan 29, 2021
@stale
Copy link

stale bot commented Apr 30, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 30, 2021
@purpleidea
Copy link
Contributor Author

notstale

@stale stale bot removed the stale label Apr 30, 2021
@stale
Copy link

stale bot commented Jul 29, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 29, 2021
@ItalyPaleAle
Copy link

👋 bot!

@stale stale bot removed the stale label Jul 29, 2021
@stale
Copy link

stale bot commented Oct 27, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 27, 2021
@ItalyPaleAle
Copy link

Hi bot 👋

@stale stale bot removed the stale label Oct 27, 2021
@stale
Copy link

stale bot commented Jan 29, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 29, 2022
@ItalyPaleAle
Copy link

Hi bot 👋

@stale stale bot removed the stale label Jan 29, 2022
@stale
Copy link

stale bot commented Apr 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 30, 2022
@calmitchell617
Copy link

domo arigato mr roboto

@stale stale bot removed the stale label Apr 30, 2022
@stale
Copy link

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 31, 2022
@stale stale bot closed this as completed Oct 16, 2022
@ahrtr ahrtr reopened this May 15, 2024
@ahrtr ahrtr removed the stale label May 15, 2024
@ahrtr
Copy link
Member

ahrtr commented May 15, 2024

@ArkaSaha30

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.