
"ERROR: ssh: subsystem request failed" with openssh client to node with clustername in node_name #19524

Open
programmerq opened this issue Dec 20, 2022 · 3 comments
Labels
bug, OpenSSH (For customers using Teleport and OpenSSH), sales-onboarding (Issues related to prospects)

Comments

@programmerq
Contributor

Expected behavior:

I'm not entirely sure that this is truly a bug, since one must always include the .clustername portion when connecting via OpenSSH compatibility with the out-of-the-box ssh config file. The failure is much harder to figure out, though, because the node name already matches the Host block in the generated config file.

It would be nice if tsh proxy ssh were a little smarter about these situations and either advised the user or appended the cluster name when needed. The generated ProxyCommand already passes the --cluster argument, yet tsh proxy ssh blindly forwards the user-provided name to the proxy without validating it or warning.

At the very least, it could behave like tsh ssh and report which node it failed to reach:

$ tsh ssh d
ERROR: failed connecting to node d. Teleport proxy failed to connect to "node" agent "d:3022" over direct dial:

  dial tcp: lookup d on 10.100.0.10:53: no such host

This usually means that the agent is offline or has disconnected. Check the
agent logs and, if the issue persists, try restarting it or re-registering it
with the cluster.
$ ssh -F ssh_config d.example.com
ERROR: failed connecting to node d. Teleport proxy failed to connect to "node" agent "d:3022" over direct dial:

  dial tcp: lookup d on 10.100.0.10:53: no such host

This usually means that the agent is offline or has disconnected. Check the
agent logs and, if the issue persists, try restarting it or re-registering it
with the cluster.
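
To make the suggestion above concrete, here is a minimal sketch of how a client could pick a node name before dialing. It is not Teleport's code: resolveNodeName and the known set of node names are hypothetical, and the assumption that the trailing .clustername is trimmed first is only inferred from the proxy error in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveNodeName is a hypothetical helper, not part of tsh.
// known is an assumed set of node names registered in the cluster.
// It tries the name with the ".<cluster>" suffix trimmed first, then
// falls back to the name exactly as the user typed it.
func resolveNodeName(host, cluster string, known map[string]bool) (string, error) {
	stripped := strings.TrimSuffix(host, "."+cluster)
	switch {
	case known[stripped]:
		return stripped, nil
	case known[host]:
		return host, nil
	default:
		return "", fmt.Errorf("node %q not found in cluster %q (also tried %q)",
			host, cluster, stripped)
	}
}

func main() {
	// Mirrors the report: the node is registered as "d.example.com".
	known := map[string]bool{"d.example.com": true}
	name, err := resolveNodeName("d.example.com", "example.com", known)
	fmt.Println(name, err)
}
```

With a fallback like this, the node registered as d.example.com would be reachable without repeating the cluster name, and an unknown name would produce an error that lists what was tried.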

Current behavior:

When using the ssh config generated by tsh config to connect to a node whose node_name (as shown in tsh ls) already includes .clustername, the ssh client exits with the error "ERROR: ssh: subsystem request failed".

$ tsh ls
Node Name                 Address        Labels
------------------------- -------------- ------------------------------------------------------
d.example.com             ⟵ Tunnel       hostname=d.example.com
$ tsh ssh d.example.com whoami
jeff
$ tsh config > ssh_config 
$ ssh -F ssh_config d.example.com whoami
ERROR: ssh: subsystem request failed
$ "/usr/local/bin/tsh" proxy ssh --cluster=example.com --proxy=teleport.example.com d.example.com:3022
ERROR: ssh: subsystem request failed

When this error occurs, the following appears on the proxy logs:

User Message: Teleport proxy failed to connect to "node" agent "d:3022" over direct dial:

  dial tcp: lookup d on 10.100.0.10:53: no such host

This usually means that the agent is offline or has disconnected. Check the
agent logs and, if the issue persists, try restarting it or re-registering it
with the cluster.] regular/sshserver.go:1979

To work around this issue, the end user can supply the .example.com portion twice:

$ ssh d.example.com.example.com whoami
jeff

This is counterintuitive.
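
The proxy log above shows a dial to "d:3022" even though the client asked for d.example.com, which suggests that one trailing .clustername is removed before routing. The sketch below illustrates that assumed behavior and why repeating the suffix works. stripClusterSuffix is a hypothetical name, not a Teleport function.

```go
package main

import (
	"fmt"
	"strings"
)

// stripClusterSuffix is a hypothetical helper that models the behavior
// inferred from the logs: it removes one trailing ".<cluster>" from host.
func stripClusterSuffix(host, cluster string) string {
	return strings.TrimSuffix(host, "."+cluster)
}

func main() {
	cluster := "example.com"
	hosts := []string{
		"d.example.com",             // what the user types
		"d.example.com.example.com", // workaround with the suffix repeated
	}
	for _, host := range hosts {
		fmt.Printf("%s -> %s\n", host, stripClusterSuffix(host, cluster))
	}
}
```

Under this assumption the first name resolves to d, which does not exist, while the second resolves to d.example.com, the name the node is actually registered under.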

Bug details:

Teleport version - Teleport Enterprise v11.1.2 git:v11.1.2-0-g2494343f5 go1.19.2

Recreation steps -

Debug logs -

programmerq added the bug and OpenSSH labels Dec 20, 2022
@webvictim
Contributor

webvictim commented Apr 6, 2023

Just ran into this bug myself. Adding the cluster name twice works, but it is ugly and hard for end users to understand. We should fix this or provide a better workaround.

webvictim added the sales-onboarding label Apr 6, 2023
@webvictim
Contributor

This issue appears to have got slightly worse in 13.0.0-alpha.1, as you now need to repeat the cluster domain name THREE times in the command for this to work, rather than twice. I'll try to get a repro when I have time and post more details.

@AlexPads

Hello, after upgrading to v14.1.0 and enabling case_insensitive_routing: true in the proxy teleport.yaml file, the default ansible config / setup from the guide https://goteleport.com/docs/server-access/guides/ansible/ is working as expected.
