All queries type fail with same error message #303

Open
cengpxf opened this issue Nov 24, 2024 · 15 comments
Labels: possible-bug (Something isn't working)

Comments

@cengpxf

cengpxf commented Nov 24, 2024

Deployment Type

Docker

Version

v2.0.4

Steps to Reproduce

Any type of query against any router says "Something went Wrong".
Using Debian 12.8 with the latest python3, Node.js, etc. in Docker. This is an initial setup of hyperglass from scratch.
Setting up devices and getting authenticated was successful. Logging in via the CLI and running commands as the hyperglass looking glass user works, but queries via the site do not. It looks like a timeout issue based on the logs.
I haven't created a config.yaml (not sure where that file should go yet, /etc/hyperglass?), so that file isn't included.

Expected Behavior

BGP route, Ping or Traceroute should show some result. In theory, the same as when logged in via the CLI.

Observed Behavior

I found this post, I was able to apply it, but it does not change or fix the issue.
#274

Configuration

No response

Devices

devices:
  - name: Customer Router 1
    address: <ip>
    credential:
      username: hg_lg_user
      password: passwd_here
    platform: juniper
    attrs:
      source4: ipv4_IP_here
      source6: ipv6_IP_here

Logs

hyperglass-1  | INFO - 2024-11-24 12:43:34,266 - paramiko.transport - transport - Connected (version 2.0, client OpenSSH_7.5)
hyperglass-1  | INFO - 2024-11-24 12:43:35,538 - paramiko.transport - transport - Authentication (password) successful!
hyperglass-1  | 2024-11-24 12:43:45.915 | CRITICAL | hyperglass.api.error_handlers:default_handler:48 - Error
hyperglass-1  | 2024-11-24 12:59:59.646 | CRITICAL | hyperglass.exceptions._common:__init__:34 - Request timed out. (Connection timed out)
hyperglass-1  | ERROR - 2024-11-24 12:59:59,647 - asyncio - runners - Exception in callback Loop._read_from_self
hyperglass-1  | handle: <Handle Loop._read_from_self>
hyperglass-1  | Traceback (most recent call last):
hyperglass-1  |   File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
hyperglass-1  |   File "uvloop/loop.pyx", line 399, in uvloop.loop.Loop._read_from_self
hyperglass-1  |   File "uvloop/loop.pyx", line 404, in uvloop.loop.Loop._invoke_signals
hyperglass-1  |   File "uvloop/loop.pyx", line 379, in uvloop.loop.Loop._ceval_process_signals
hyperglass-1  |   File "/opt/hyperglass/hyperglass/execution/main.py", line 41, in handler
hyperglass-1  |     raise DeviceTimeout(**exc_args)
hyperglass-1  | hyperglass.exceptions.public.DeviceTimeout: Request timed out. (Connection timed out)
@cengpxf cengpxf added the possible-bug Something isn't working label Nov 24, 2024
@WalkerD243

WalkerD243 commented Dec 2, 2024

A problem with firewall rules, or netconf being disabled on the target device, could be the source of your problem. Test both of these with the commands:

ssh hg_lg_user@ip -s netconf -p 22 and ssh hg_lg_user@ip -s netconf -p 830.

Log in with the password at the prompt, and for Juniper devices you should see:

<!-- No zombies were killed during the creation of this user interface -->
<!-- user xyz, class j-su -->
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:confirmed-commit:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:validate:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:url:1.0?scheme=http,ftp,file</capability>
    <capability>urn:ietf:params:xml:ns:netconf:base:1.0</capability>
    <capability>urn:ietf:params:xml:ns:netconf:capability:candidate:1.0</capability>
    <capability>urn:ietf:params:xml:ns:netconf:capability:confirmed-commit:1.0</capability>
    <capability>urn:ietf:params:xml:ns:netconf:capability:validate:1.0</capability>
    <capability>urn:ietf:params:xml:ns:netconf:capability:url:1.0?scheme=http,ftp,file</capability>
    <capability>urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring</capability>
    <capability>http://xml.juniper.net/netconf/junos/1.0</capability>
    <capability>http://xml.juniper.net/dmi/system/1.0</capability>
  </capabilities>
  <session-id>12507</session-id>
</hello>
]]>]]>

That will show you:

  • whether your server can establish a netconf session with your target device on the specified IPv4 address on ports 22 and 830
  • the target device's netconf capabilities
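If you want to compare what two routers advertise, the hello message can be read programmatically. This is only a minimal sketch using the Python standard library; the function name `parse_hello` and the shortened sample are mine, not something from hyperglass or Junos.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
EOM = "]]>]]>"  # end-of-message marker that follows the hello

# Shortened sample in the shape of the hello shown above (illustrative only).
SAMPLE_HELLO = """<!-- user xyz, class j-su -->
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.0</capability>
    <capability>http://xml.juniper.net/netconf/junos/1.0</capability>
  </capabilities>
  <session-id>12507</session-id>
</hello>
]]>]]>
"""


def parse_hello(raw: str) -> Tuple[List[str], Optional[str]]:
    """Return (capabilities, session_id) from a NETCONF <hello> message."""
    # Keep only the text before the end-of-message marker.
    xml_text = raw.split(EOM, 1)[0].strip()
    root = ET.fromstring(xml_text)
    ns = {"nc": NETCONF_NS}
    caps = [
        cap.text.strip()
        for cap in root.findall("nc:capabilities/nc:capability", ns)
        if cap.text
    ]
    session_id = root.findtext("nc:session-id", default=None, namespaces=ns)
    return caps, session_id


if __name__ == "__main__":
    capabilities, session = parse_hello(SAMPLE_HELLO)
    print(session, capabilities)
```

Paste the text you get back from the ssh command in place of `SAMPLE_HELLO` to list the capabilities of your own device.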

@cengpxf
Author

cengpxf commented Dec 2, 2024

So, it works on port 22, but not 830, which means netconf is not working. I must have missed the part of the instructions where configuring netconf is required.
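To separate a blocked port from a disabled service, a plain TCP check from the hyperglass host can help. A minimal sketch with the standard library follows; the helper names are made up for illustration, and it only tells you whether the port accepts a TCP connection, not whether the netconf subsystem answers on it.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host: str, ports: Iterable[int] = (22, 830)) -> Dict[int, bool]:
    """Check each port in turn and report which ones accept connections."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling `check_ports("<router address>")` from inside the container would show whether 22 and 830 are reachable from where hyperglass actually runs.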

@cengpxf
Author

cengpxf commented Dec 2, 2024

Well,
This command does not work:
ssh user@host -s netconf -p 830
subsystem request failed on channel 0

but this command does work:
ssh -p 830 user@host -s netconf

So, is the issue the order of options in the command that is being sent to the Juniper router?
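A possible explanation, offered as an assumption and not a confirmed diagnosis: as far as I understand OpenSSH, the first non-option word after the destination starts the remote command (with -s, the subsystem name), so in the failing form "-p 830" may be read as part of the subsystem name while the connection still goes to port 22. Keeping every option ahead of the destination avoids the ambiguity. The helper below is a hypothetical sketch that only builds the argument list.

```python
import shlex
from typing import List


def netconf_ssh_argv(user: str, host: str, port: int = 830) -> List[str]:
    """Build an ssh command line with all options before the destination."""
    # -s asks for a subsystem; the subsystem name goes where the command would.
    return ["ssh", "-p", str(port), "-s", f"{user}@{host}", "netconf"]


if __name__ == "__main__":
    # Prints the command as it would be typed in a shell.
    print(shlex.join(netconf_ssh_argv("user", "host")))
```

The resulting list can be handed to `subprocess.run` unchanged if you want to script the check.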

@WalkerD243

Sorry, I didn't try the -p 830 flag myself; I just thought it should work based on the syntax. If port 22 is showing you the capabilities, that seems good. Is there something in the router log?

@cengpxf
Author

cengpxf commented Dec 3, 2024

From /var/log/messages: the SSH login for the lg user is accepted.

Dec 3 05:09:26 r-pxf-ce1 sshd[37205]: Accepted password for lg from port 33458 ssh2

@WalkerD243

You're using the default directives, right? Does your LG user have the needed privileges on the router?

@cengpxf
Author

cengpxf commented Dec 3, 2024

By default directives, what are you referring to?

My config.yaml

devices:
  - name: Router 1
    address: <{mgmt ip of router}>
    credential:
      username: lg
      password: {pwd}
    platform: juniper
    attrs:
      source4: {ipv4 loopback of router}
      source6: {ipv6 loopback of router}

permissions for the user:

class looking-glass {
    permissions view-configuration;
    allow-commands "(show)|(ping)|(traceroute)";
    deny-commands "(clear)|(file)|(file show)|(help)|(load)|(monitor)|(op)|(request)|(save)|(set)|(start)|(test)";
    allow-configuration show;
    deny-configuration all;
}
user lg {
    class looking-glass;
    authentication {
        {encrypted password} ## SECRET-DATA
    }
}

@cengpxf
Author

cengpxf commented Dec 6, 2024

Yes, I confirmed that I am using the default directives. I have not changed anything.
I know the user works.
Are there any logs anywhere that would help give me a better error description of what it is doing?

The Docker console shows this whenever I try to run a ping, traceroute, etc.:
CRITICAL| hyperglass.api.error_handlers:default_handler:48 - Error

@cengpxf
Author

cengpxf commented Dec 11, 2024

I turned on debug mode

hyperglass-1 | [DEBUG] 20241211 20:58:13 |51 | collect → Connecting to device {'device': 'Core Router 1', 'address': 'None:None', 'proxy': None}
hyperglass-1 | [CRITICAL] 20241211 20:58:25 |48 | default_handler → Error {'method': 'POST', 'path': '/api/query', 'detail': "\n\nPattern not detected: 'Screen width set to' in output.\n\nThings you might try to fix this:\n1. Adjust the regex pattern to better identify the terminating string. Note, in\nmany situations the pattern is automatically based on the network device's prompt.\n2. Increase the read_timeout to a larger value.\n\nYou can also look at the Netmiko session_log or debug log for more information.\n\n"}
hyperglass-1 | ERROR - 2024-12-11 20:58:25,563 - litestar - config - Uncaught exception (connection_type=http, path=/api/query):
hyperglass-1 | Traceback (most recent call last):
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/litestar/middleware/_internal/exceptions/middleware.py", line 159, in __call__
hyperglass-1 | await self.app(scope, receive, capture_response_started)
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 80, in handle
hyperglass-1 | response = await self._get_response_for_request(
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 132, in _get_response_for_request
hyperglass-1 | return await self._call_handler_function(
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 152, in _call_handler_function
hyperglass-1 | response_data, cleanup_group = await self._get_response_data(
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 192, in _get_response_data
hyperglass-1 | else await route_handler.fn(**parsed_kwargs)
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/opt/hyperglass/hyperglass/api/routes.py", line 111, in query
hyperglass-1 | output = await execute(data)
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/opt/hyperglass/hyperglass/execution/main.py", line 67, in execute
hyperglass-1 | response = await driver.collect()
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/opt/hyperglass/hyperglass/execution/drivers/ssh_netmiko.py", line 87, in collect
hyperglass-1 | nm_connect_direct = ConnectHandler(**driver_kwargs)
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/netmiko/ssh_dispatcher.py", line 365, in ConnectHandler
hyperglass-1 | return ConnectionClass(*args, **kwargs)
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/netmiko/base_connection.py", line 439, in __init__
hyperglass-1 | self._open()
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/netmiko/base_connection.py", line 445, in _open
hyperglass-1 | self._try_session_preparation()
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/netmiko/base_connection.py", line 904, in _try_session_preparation
hyperglass-1 | self.session_preparation()
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/netmiko/juniper/juniper.py", line 24, in session_preparation
hyperglass-1 | self.set_terminal_width(command=cmd, pattern=r"Screen width set to")
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/netmiko/base_connection.py", line 1260, in set_terminal_width
hyperglass-1 | output = self.read_until_pattern(pattern=pattern)
hyperglass-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1 | File "/usr/local/lib/python3.12/site-packages/netmiko/base_connection.py", line 672, in read_until_pattern
hyperglass-1 | raise ReadTimeout(msg)
hyperglass-1 | netmiko.exceptions.ReadTimeout:
hyperglass-1 |
hyperglass-1 | Pattern not detected: 'Screen width set to' in output.
hyperglass-1 |
hyperglass-1 | Things you might try to fix this:
hyperglass-1 | 1. Adjust the regex pattern to better identify the terminating string. Note, in
hyperglass-1 | many situations the pattern is automatically based on the network device's prompt.
hyperglass-1 | 2. Increase the read_timeout to a larger value.
hyperglass-1 |
hyperglass-1 | You can also look at the Netmiko session_log or debug log for more information.
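The error text itself points at the Netmiko session log. Outside of hyperglass, a standalone test along the following lines might show what the router actually sends back during session preparation. This is a sketch under assumptions: the helper names, the log file name and the 60 second value are mine, and `read_timeout_override` is used on the assumption that it also applies to the pattern read that fails in the traceback.

```python
def debug_driver_kwargs(host: str, username: str, password: str) -> dict:
    """Connection arguments with session logging and a longer read timeout."""
    return {
        "device_type": "juniper_junos",
        "host": host,
        "username": username,
        "password": password,
        # Record everything sent to and received from the device.
        "session_log": "netmiko_session.log",
        # Assumption: replaces the default read timeout on pattern reads.
        "read_timeout_override": 60.0,
    }


def open_debug_session(host: str, username: str, password: str):
    """Open a Netmiko session using the arguments above."""
    # Imported here so the kwargs helper works without Netmiko installed.
    from netmiko import ConnectHandler

    return ConnectHandler(**debug_driver_kwargs(host, username, password))
```

If the connection still fails, netmiko_session.log should contain the router's reply to whatever was sent before the 'Screen width set to' wait.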

@cengpxf
Author

cengpxf commented Dec 11, 2024

Looks like I have the same issue as:
#179

@WalkerD243

Interesting. I had that same problem with netmiko failing to parse the output/prompt of the command show route receive-protocol bgp, because Juniper first outputs a blinking cursor while it's loading the result. Netmiko interprets the cursor as the end result.

However, you're using the default commands, which should work. When you log in with the hyperglass user on the router and execute the same command (show bgp route for the default route command), do you also get 'Screen width set to' displayed?
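Something that may be worth ruling out, stated as a guess only: the traceback shows Netmiko's Juniper driver waiting for 'Screen width set to' in set_terminal_width, which suggests it sends a "set cli ..." command during login, while the looking-glass class posted earlier lists (set) under deny-commands. The sketch below approximates the class with Python's `re`. The command string is my assumption about what Netmiko sends, and Junos applies its own matching and precedence rules, so treat the result as a hint and not as proof.

```python
import re

# Patterns copied from the looking-glass class posted earlier in this thread.
ALLOW = r"(show)|(ping)|(traceroute)"
DENY = (
    r"(clear)|(file)|(file show)|(help)|(load)|(monitor)|(op)"
    r"|(request)|(save)|(set)|(start)|(test)"
)

# Assumption: the kind of command the driver sends before any query runs.
ASSUMED_SETUP_COMMAND = "set cli screen-width 511"


def classify(command: str) -> str:
    """Rough approximation of the class: deny checked first, then allow."""
    if re.match(DENY, command):
        return "denied"
    if re.match(ALLOW, command):
        return "allowed"
    return "unmatched"


if __name__ == "__main__":
    for cmd in (ASSUMED_SETUP_COMMAND, "show route 1.1.1.1", "ping 1.1.1.1"):
        print(f"{cmd!r}: {classify(cmd)}")
```

Typing the assumed setup command by hand while logged in as the lg user would confirm or rule this out directly.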

@cengpxf
Author

cengpxf commented Dec 13, 2024

When I log in as the hyperglass user on the router and run "show route receive-protocol bgp {neighbor-ip}", it pauses for 1 to 5 seconds and then starts outputting the prefixes. The routers in question have full route tables, as in, all IPv4/IPv6 routes on the internet. The timeout is set to 120 seconds, by editing line 56 of the ssh_netmiko.py file.

I spun up a new Linux VM and did a clean install (Docker) with the same results before increasing the timeout, removing 3DES, etc. to reduce errors/warnings.
I even tried the add_pyez_driver fork, but that would not install.
I have not tried a manual installation yet.

I have tested the same user with different software that doesn't have as many features as hyperglass without issues. So I don't believe it is a permissions issue.

@WalkerD243

I meant this here:
[image]
As the router this was executed on also has a full table, the command takes a while to load, even though 1.1.1.1 is not even a neighbor. The blinking yellow cursor is read by netmiko's screen grabber and interpreted as the answer to the command. The timeout should therefore not be relevant, as the cursor appears after a second.

That's why I asked "When you're logging in with the hyperglass user on the router and executing the same command (show bgp route for the default route command) do you also get the 'Screen width set to' displayed?"

I think 'Screen width set to' triggers the same problem as the blinking cursor does for me.

@cengpxf
Author

cengpxf commented Dec 16, 2024

If the cursor is the issue, that would explain why ping 1.1.1.1 produces the same error as traceroute, show route receive-protocol bgp, etc.

@WalkerD243

There are parameters in netmiko to change the expected prompt for the screen grabbing in these scenarios, but I don't know how to set them in hyperglass.
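For reference, in plain Netmiko those parameters are passed per command: `send_command` accepts `expect_string` (a regex to wait for) and `read_timeout`. The sketch below shows the call shape only. The helper name, the prompt regex and the stand-in connection class are illustrative assumptions, and nothing in this thread establishes how hyperglass would pass these through.

```python
class FakeConnection:
    """Stand-in used only to exercise the helper without a router."""

    def __init__(self):
        self.calls = []

    def send_command(self, command_string, expect_string=None, read_timeout=10.0):
        # Record the arguments so the call can be inspected afterwards.
        self.calls.append((command_string, expect_string, read_timeout))
        return "ok"


def run_with_explicit_prompt(conn, command, prompt_regex, read_timeout=120.0):
    """Send a command and wait for prompt_regex with a longer read timeout."""
    return conn.send_command(
        command,
        expect_string=prompt_regex,
        read_timeout=read_timeout,
    )


if __name__ == "__main__":
    conn = FakeConnection()
    # Hypothetical regex for a Junos operational-mode prompt ending in '>'.
    print(run_with_explicit_prompt(conn, "show route 1.1.1.1", r">\s*$"))
    print(conn.calls)
```

With a real Netmiko connection object in place of `FakeConnection`, the same helper would wait for the given prompt regex.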
