Scylla Manager, under certain condition, is unable to use only SSL port (9142) to restore data #4079

Open
ppalczak opened this issue Oct 24, 2024 · 4 comments · May be fixed by #4114
@ppalczak

scylla-manager and scylla-manager-agent : 3.2.8
Scylla version: 2024.1.9

ISSUE:

While testing data restore over SSL (port 9142), we were unable to proceed because of the errors below.

All cluster nodes are configured to use SSL only: native_transport_port_ssl is set to 9142 and native_transport_port (9042) is commented out in scylla.yaml on every node.

scylla.yaml config on each Scylla node:

#native_transport_port: 9042 
native_transport_port_ssl: 9142

client_encryption_options:
    enabled: true
    certificate: /etc/ssl/db.crt
    keyfile: /etc/ssl/db.key
    truststore: /etc/ssl/cadb.pem

The Scylla node does not listen on port 9042:

[root@node3 ~]# netstat -ant | grep 9042
[root@node3 ~]# netstat -ant | grep 9142
tcp        0      0 192.168.254.85:19142    0.0.0.0:*               LISTEN
tcp        0      0 192.168.254.85:9142     0.0.0.0:*               LISTEN

sctool cluster add : works fine
sctool backup : works fine
sctool restore ... --restore-schema/--restore-tables : fails with the same error in both modes

# sctool restore -c test-cluster-ssl --snapshot-tag sm_20241024064123UTC --restore-schema  --location s3:scylla-backup
Error: create restore target, units and views: create worker: get CQL cluster session: gocql: unable to create session: unable to discover protocol version: dial tcp :0->192.168.254.48:9042: connect: connection refused
Trace ID: AkZV8BhoTPG1zy-wHT5qIg (grep in scylla-manager logs)

Providing --ssl-user-cert-file and --ssl-user-key-file does not seem to make a difference.
Once we re-enable the non-SSL port on the Scylla nodes (scylla.yaml), we hit the following error instead.

root@scyllamanager:~# sctool restore -c test-cluster-ssl --snapshot-tag sm_20241024064123UTC --restore-schema  --location s3:scylla-backup
Error: create restore target, units and views: create worker: get CQL cluster session: gocql: unable to create session: unable to discover protocol version: tls: first record does not look like a TLS handshake
Trace ID: H9d3CeRGQb-gWXY_LHx8dA (grep in scylla-manager logs)

As a next step, we set --force-tls-disabled=true. With that, the restore no longer complains about TLS.

root@scyllamanager:~# sctool cluster update -c test-cluster-ssl --force-tls-disabled=true
# sctool restore -c test-cluster-ssl --snapshot-tag sm_20241024064123UTC  --restore-schema --location s3:scylla-backup
restore/4175e435-c52b-4aa1-a95e-927e4218c854
@karol-kokoszka
Collaborator

@pidiaquez

@karol-kokoszka From the error and the repro, it seems clear to me that this is not a username/password issue.
The error comes from the transport layer: SM fails to determine whether it must use TLS or clear text.

unable to discover protocol version: tls: first record does not look like a TLS handshake

The username and password are sent over the socket only once the connection is established.
Here the session is never established, so no credentials can have been sent yet.

Also, as the original report shows, the exact same command succeeds when "--force-tls-disabled=true" is used.
This confirms that the username and password were configured correctly and are unrelated to the failure.
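
To illustrate the transport-layer point, here is a minimal sketch using only the Go standard library. It does not use SM or gocql; the throwaway localhost listener and the helper names (`dialTLSAgainstPlaintext`, `isNotTLS`) are assumptions made for this example. A TLS client dials a server that answers in clear text, which is expected to fail during the handshake with the same class of error as in the report, before any credentials could be exchanged.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"io"
	"net"
)

// dialTLSAgainstPlaintext starts a throwaway clear-text TCP server on
// localhost, attempts a TLS handshake against it, and returns the
// handshake error (nil only if the handshake unexpectedly succeeds).
func dialTLSAgainstPlaintext() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		// Consume the ClientHello, then answer in clear text,
		// the way a non-TLS listener would.
		buf := make([]byte, 16*1024)
		_, _ = conn.Read(buf)
		_, _ = conn.Write([]byte("plaintext reply, not a TLS record\n"))
		// Drain until the client hangs up so the reply is not lost to a reset.
		_, _ = io.Copy(io.Discard, conn)
	}()

	// InsecureSkipVerify is used only because this local demo has no certificate at all.
	conn, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return err
	}
	return conn.Close()
}

// isNotTLS reports whether err is a TLS record header error, i.e. the
// peer's first bytes did not look like a TLS record.
func isNotTLS(err error) bool {
	var rhe tls.RecordHeaderError
	return errors.As(err, &rhe)
}

func main() {
	err := dialTLSAgainstPlaintext()
	fmt.Println("handshake error:", err)
	fmt.Println("peer is not speaking TLS:", isNotTLS(err))
}
```

The failure happens inside `tls.Dial`, so authentication is never reached, which matches the reasoning above.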

@maesta2

maesta2 commented Oct 30, 2024

@ppalczak Are there CQL credentials added to the cluster here? https://manager.docs.scylladb.com/stable/sctool/cluster.html#p-password https://manager.docs.scylladb.com/stable/sctool/cluster.html#u-username

Yes, they were added when the cluster was added to SM.

@Michal-Leszczynski self-assigned this Oct 30, 2024
@Michal-Leszczynski
Collaborator

This is a bug in the SM implementation.
Even though we choose the CQL port correctly in extendClusterConfigWithTLS:

	cqlPort := ni.CQLPort()
	if ni.ClientEncryptionEnabled && !cluster.ForceTLSDisabled {
		if !cluster.ForceNonSSLSessionPort {
			cqlPort = ni.CQLSSLPort()
		}

We pass hosts that already carry the non-SSL port directly to session creation:

	// Fill hosts if they weren't specified by the options
	if len(cfg.Hosts) == 0 {
		sessionHosts, err := GetRPCAddresses(ctx, client, client.Config().Hosts) // <- here we get hosts extended with non-SSL ports
		if err != nil {
			s.logger.Info(ctx, "Gets session", "err", err)
			if errors.Is(err, ErrNoRPCAddressesFound) {
				return session, err
			}
		}
		cfg.Hosts = sessionHosts // <- here we set hosts in session cfg
	}

	ni, err := client.AnyNodeInfo(ctx)
	if err != nil {
		return session, errors.Wrap(err, "fetch node info")
	}
	if err := s.extendClusterConfigWithAuthentication(clusterID, ni, cfg); err != nil {
		return session, err
	}
	if err := s.extendClusterConfigWithTLS(ctx, clusterID, ni, cfg); err != nil {
		return session, err
	}

	return gocqlx.WrapSession(cfg.CreateSession())

// GetRPCAddresses accepts client and hosts parameters that are used later on to query client.NodeInfo endpoint
// returning RPC addresses for given hosts.
// RPC addresses are the ones that scylla uses to accept CQL connections.
func GetRPCAddresses(ctx context.Context, client *scyllaclient.Client, hosts []string) ([]string, error) {
	var sessionHosts []string
	var combinedError error
	for _, h := range hosts {
		ni, err := client.NodeInfo(ctx, h)
		if err != nil {
			combinedError = multierr.Append(combinedError, err)
			continue
		}
		sessionHosts = append(sessionHosts, ni.CQLAddr(h)) // <- here we always take non-SSL CQL addr
	}

	if len(sessionHosts) == 0 {
		combinedError = multierr.Append(ErrNoRPCAddressesFound, combinedError)
	}

	return sessionHosts, combinedError
}

This means that the default port, although set to the SSL CQL port, is overridden by the non-SSL CQL port attached to each host.
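
One possible direction, shown only as a sketch and not as the change in the linked pull request: choose the port per host with the same condition that extendClusterConfigWithTLS uses, so the address and the TLS setting cannot disagree. The types `nodeInfo` and `clusterFlags` and the function `sessionAddr` are hypothetical stand-ins invented for this example; they only mirror the field names visible in the snippets above.

```go
package main

import (
	"fmt"
	"net"
)

// nodeInfo is a hypothetical stand-in for the per-node info SM fetches.
type nodeInfo struct {
	CQLPort                 string // native_transport_port
	CQLSSLPort              string // native_transport_port_ssl
	ClientEncryptionEnabled bool
}

// clusterFlags is a hypothetical stand-in for the cluster-level overrides.
type clusterFlags struct {
	ForceTLSDisabled       bool
	ForceNonSSLSessionPort bool
}

// sessionAddr returns host:port for a CQL session, picking the SSL port
// under the same condition that enables TLS for the session.
func sessionAddr(host string, ni nodeInfo, c clusterFlags) string {
	port := ni.CQLPort
	if ni.ClientEncryptionEnabled && !c.ForceTLSDisabled && !c.ForceNonSSLSessionPort {
		port = ni.CQLSSLPort
	}
	return net.JoinHostPort(host, port)
}

func main() {
	ni := nodeInfo{CQLPort: "9042", CQLSSLPort: "9142", ClientEncryptionEnabled: true}

	// SSL-only cluster from the report: the SSL port is selected.
	fmt.Println(sessionAddr("192.168.254.48", ni, clusterFlags{})) // 192.168.254.48:9142

	// With --force-tls-disabled=true the non-SSL port is kept.
	fmt.Println(sessionAddr("192.168.254.48", ni, clusterFlags{ForceTLSDisabled: true})) // 192.168.254.48:9042
}
```

Keeping the port decision in one place avoids the situation described above, where one code path selects the SSL port and another silently replaces it.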

@Michal-Leszczynski added this to the 3.4.1 milestone Oct 30, 2024
@Michal-Leszczynski added the bug (Something isn't working) label Oct 30, 2024
@VAveryanov8 linked a pull request Nov 15, 2024 that will close this issue
6 participants