Manipulating Keys and Manipulating Users Concurrently Causes Segmentation Violation in Client #14764

Closed
Tracked by #14750
Magnitus- opened this issue Nov 15, 2022 · 3 comments · Fixed by #14770

Comments


Magnitus- commented Nov 15, 2022

What happened?

I created a bunch of keys concurrently with goroutines. In a separate process, I created a bunch of users concurrently with goroutines.

The process generating the keys aborted with this:

{"level":"warn","ts":"2022-11-15T08:33:12.382-0500","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000215340/127.0.0.1:32379","attempt":0,"error":"rpc error: code = InvalidArgument desc = etcdserver: revision of auth store is old"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x95a390]

goroutine 29 [running]:
go.etcd.io/etcd/client/v3.(*Client).unaryClientInterceptor.func1({0xb7ee68?, 0xc000122000?}, {0xaa0935, 0x14}, {0xa79540, 0xc0002ff1a0}, {0xa6f500, 0xc000376e70}, 0xc000266700, 0xadf000, ...)
	/home/eric/go/pkg/mod/go.etcd.io/etcd/client/[email protected]/retry_interceptor.go:81 +0x7f0
google.golang.org/grpc.(*ClientConn).Invoke(0xc000078a98?, {0xb7ee68?, 0xc000122000?}, {0xaa0935?, 0x8?}, {0xa79540?, 0xc0002ff1a0?}, {0xa6f500?, 0xc000376e70?}, {0xf69120, ...})
	/home/eric/go/pkg/mod/google.golang.org/[email protected]/call.go:35 +0x21f
go.etcd.io/etcd/api/v3/etcdserverpb.(*kVClient).Put(0xc000139230, {0xb7ee68, 0xc000122000}, 0x40f107?, {0xf69120, 0x3, 0x3})
	/home/eric/go/pkg/mod/go.etcd.io/etcd/api/[email protected]/etcdserverpb/rpc.pb.go:6469 +0xc9
go.etcd.io/etcd/client/v3.(*retryKVClient).Put(0xc000123bf0?, {0xb7ee68?, 0xc000122000?}, 0xc000078b38?, {0xf69120?, 0x8?, 0x0?})
	/home/eric/go/pkg/mod/go.etcd.io/etcd/client/[email protected]/retry.go:109 +0x30
go.etcd.io/etcd/client/v3.(*kv).Do(_, {_, _}, {0x2, {0xc000123be0, 0xa, 0x10}, {0x0, 0x0, 0x0}, ...})
	/home/eric/go/pkg/mod/go.etcd.io/etcd/client/[email protected]/kv.go:156 +0x2f8
go.etcd.io/etcd/client/v3.(*kv).Put(0x1?, {0xb7ee68, 0xc000122000}, {0xc000123bd0?, 0x0?}, {0xa961c5?, 0x0?}, {0x0, 0x0, 0x0})
	/home/eric/go/pkg/mod/go.etcd.io/etcd/client/[email protected]/kv.go:114 +0xd3
main.churnKeys.func1()
	/home/eric/Projects/test/etcd-bug/main.go:61 +0x15c
created by main.churnKeys
	/home/eric/Projects/test/etcd-bug/main.go:57 +0x53

What did you expect to happen?

Ideally, I expected the client manipulating the keys (authenticated as the root user, which is not affected by the user manipulations) to handle the user changes gracefully.

However, I would have settled for getting an error back from the *clientv3.Client.Put call instead of a panic.
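
For illustration, here is a minimal sketch of the calling pattern I expected to keep working. It reuses the imports of the reproduction program below; the putWithRetry helper and its attempts parameter are hypothetical, not part of the actual program:

// putWithRetry is a hypothetical helper showing the expectation: cli.Put
// should surface transient auth errors (e.g. "etcdserver: revision of auth
// store is old") through its error return value so the caller can log and
// retry, instead of the client panicking internally.
func putWithRetry(cli *clientv3.Client, key, val string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		_, err = cli.Put(context.Background(), key, val)
		if err == nil {
			return nil
		}
		fmt.Println("put failed, retrying:", err)
	}
	return err
}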

How can we reproduce it (as minimally and precisely as possible)?

  1. Run etcd on a local Kubernetes cluster with the following Terraform scripts: https://github.com/Ferlab-Ste-Justine/terraform-provider-etcd/tree/main/test-environment/server

Command to launch etcd (assumes you have a localhost Kubernetes cluster not running in a VM):

terraform apply
  2. In a separate directory, paste the following code into main.go:
package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"io/ioutil"
	"os"
	"sync"
	"sync/atomic"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func connect() (*clientv3.Client, error) {
	tlsConf := &tls.Config{}

	//User Cert
	certData, err := tls.LoadX509KeyPair("./certs/root.pem", "./certs/root.key")
	if err != nil {
		return nil, fmt.Errorf("Failed to load user credentials: %s", err.Error())
	}
	tlsConf.Certificates = []tls.Certificate{certData}

	//Ca Cert
	caCertContent, err := ioutil.ReadFile("./certs/ca.pem")
	if err != nil {
		return nil, fmt.Errorf("Failed to read root certificate file: %s", err.Error())
	}
	roots := x509.NewCertPool()
	ok := roots.AppendCertsFromPEM(caCertContent)
	if !ok {
		return nil, errors.New("Failed to parse root certificate authority")
	}
	tlsConf.RootCAs = roots

	//Connection
	cli, connErr := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:32379"},
		TLS:         tlsConf,
	})
	if connErr != nil {
		return nil, fmt.Errorf("Failed to connect to etcd servers: %s", connErr.Error())
	}

	return cli, nil
}

func churnKeys(cli *clientv3.Client, cycles int64, parallelism int) {
	var wg sync.WaitGroup
	counter := int64(0)

	for i := 0; i < parallelism; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for atomic.LoadInt64(&counter) < cycles {
				idx := atomic.AddInt64(&counter, 1)
				_, putErr := cli.Put(context.Background(), fmt.Sprintf("test-key-%d", idx), "test")
				if putErr != nil {
					fmt.Println(putErr)
					continue
				}

				_, delErr := cli.Delete(context.Background(), fmt.Sprintf("test-key-%d", idx))
				if delErr != nil {
					fmt.Println(delErr)
					continue
				}
			}
		}()
	}

	wg.Wait()
}

func churnUsers(cli *clientv3.Client, cycles int64, parallelism int) {
	var wg sync.WaitGroup
	counter := int64(0)

	for i := 0; i < parallelism; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for atomic.LoadInt64(&counter) < cycles {
				idx := atomic.AddInt64(&counter, 1)

				_, addErr := cli.UserAdd(context.Background(), fmt.Sprintf("test-user-%d", idx), "test")
				if addErr != nil {
					fmt.Println(addErr)
					continue
				}

				_, delErr := cli.UserDelete(context.Background(), fmt.Sprintf("test-user-%d", idx))
				if delErr != nil {
					fmt.Println(delErr)
					continue
				}
			}
		}()
	}

	wg.Wait()
}

func main() {
	cli, connErr := connect()
	if connErr != nil {
		fmt.Println(connErr)
		os.Exit(1)
	}

	//churnKeys(cli, 1000, 10)
	//churnUsers(cli, 1000, 10)
}
  3. Copy the generated "certs" directory from the Terraform scripts next to the main.go file and run:
go mod init etcd-bug
go mod tidy
  4. Uncomment the churnKeys(cli, 1000, 10) line and run:
go build .
mv etcd-bug etcd-bug-keys
  5. Comment out the churnKeys(cli, 1000, 10) line, uncomment the churnUsers(cli, 1000, 10) line, and run (a single-binary alternative that avoids the rebuild is sketched after step 6):
go build .
mv etcd-bug etcd-bug-users
  6. In two separate shells, run etcd-bug-users and, before it finishes, run etcd-bug-keys in the other shell. The etcd-bug-keys process will panic with the error above.
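
For convenience, steps 4 and 5 can also be replaced by a single binary that selects the workload from a command-line argument. This is only a sketch reusing connect, churnKeys and churnUsers from the program above; I used the two-binary procedure described in the steps for this report:

// Hypothetical alternative main: pick the workload at runtime instead of
// commenting/uncommenting and rebuilding twice.
// Usage: ./etcd-bug keys    (in one shell)
//        ./etcd-bug users   (in the other shell)
func main() {
	cli, connErr := connect()
	if connErr != nil {
		fmt.Println(connErr)
		os.Exit(1)
	}

	if len(os.Args) < 2 {
		fmt.Println("usage: etcd-bug keys|users")
		os.Exit(1)
	}

	switch os.Args[1] {
	case "keys":
		churnKeys(cli, 1000, 10)
	case "users":
		churnUsers(cli, 1000, 10)
	default:
		fmt.Println("usage: etcd-bug keys|users")
		os.Exit(1)
	}
}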

Anything else we need to know?

Interestingly, when the server was running 3.4, the client only produced a user-facing error without the panic and segmentation violation. The panic started occurring once I switched the etcd server to 3.5.

Etcd version (please run commands below)

# etcd --version
etcd Version: 3.5.5
Git SHA: 19002cfc6
Go Version: go1.16.15
Go OS/Arch: linux/amd64

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

ahrtr (Member) commented Nov 16, 2022

Thanks @Magnitus- for reporting this issue. I reproduced it locally, and confirmed that it's a real issue. Will fix it in 3.5.6.

ahrtr added commits to ahrtr/etcd that referenced this issue Nov 16, 2022
serathius mentioned this issue Nov 16, 2022
Magnitus- (Author) commented

Thanks :)

ahrtr (Member) commented Nov 16, 2022

etcd 3.4 is also affected, so I need to backport the fix to 3.4 as well.
