gNMI for Ciena SAOS 10.x devices fails path string validation #11903

Closed
whizkidTRW opened this issue Sep 28, 2022 · 32 comments · Fixed by #12272
Assignees
Labels
area/gnmi bug unexpected problem or unintended behavior plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins

Comments

@whizkidTRW

Relevant telegraf.conf

[[inputs.gnmi.subscription]]
     name = "ifcounters"
     origin = "openconfig-interfaces"
     path = "/oc-if:interfaces/oc-if:interface[name=7]/oc-if:state/oc-if:counters"
     subscription_mode = "sample"
     sample_interval = "10s"

Logs from Telegraf

[telegraf] Error running agent: starting input inputs.gnmi: invalid string path /o-if:interfaces/oc-if:interface[name=7]/oc-if:state/oc-if:counters: invalid node name: "o-if:interfaces"

System info

Telegraf 1.22.0+, Debian 11 (bullseye), Docker 20.10.5

Docker

telegraf:
  image: telegraf
  container_name: telegraf
  restart: always
  volumes:
    - /opt/TIGstack/telegraf/etc:/etc/telegraf:rw
    - /opt/TIGstack/telegraf.conf:/etc/telegraf/telegraf.conf:rw
    - /opt/TIGstack/var/lib/mibs/ietf:/var/lib/mibs:rw
    - /opt/TIGstack/var/lib/mibs/ietf:/usr/share/snmp/mibs:rw
    - /opt/TIGstack/telegraf/etc/ca.cert.pem:/etc/telegraf/ca.cert.pem
    - /opt/TIGstack/telegraf/etc/client.cert.pem:/etc/telegraf/client.cert.pem
    - /opt/TIGstack/telegraf/etc/client.key.pem:/etc/telegraf/client.key.pem
  depends_on:
    - influxdb
  links:
    - influxdb
  ports:
    - '8125:8125'
Steps to reproduce

  1. Using telegraf v1.22 and up

Expected behavior

Successful gNMI subscription consistent with version 1.21 and lower

Actual behavior

Telegraf fails with "invalid string path" error and exits

Additional info

No response

@whizkidTRW whizkidTRW added the bug unexpected problem or unintended behavior label Sep 28, 2022
@peterbaumert

If you specify "openconfig-interfaces" as origin, you don't have to put the "oc-if:" part in the path, afaik.
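
To make the suggestion concrete, here is a minimal Python sketch of what dropping the module prefix from each path element looks like. The helpers `split_path` and `strip_prefixes` are hypothetical names for illustration only; they are not part of Telegraf or any gNMI library.

```python
import re

# Hypothetical helpers, not Telegraf code: they only illustrate the path rewrite.
# Matches a leading "module:" prefix such as "oc-if:" at the start of a node name.
_PREFIX = re.compile(r"^[A-Za-z_][\w.-]*:")


def split_path(path: str) -> list[str]:
    """Split on '/' but leave any '/' inside [key=value] predicates alone."""
    parts, current, depth = [], "", 0
    for ch in path.strip("/"):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def strip_prefixes(path: str) -> str:
    """Remove the module prefix from every element of the path."""
    return "/" + "/".join(_PREFIX.sub("", seg) for seg in split_path(path))


print(strip_prefixes(
    "/oc-if:interfaces/oc-if:interface[name=7]/oc-if:state/oc-if:counters"
))
# prints /interfaces/interface[name=7]/state/counters
```

Whether a given device accepts the unprefixed form depends on how it handles the origin field, so treat the result as something to try rather than a guaranteed fix.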

@whizkidTRW
Author

Hmm, pretty sure I tried that, but I'll give it a go just as soon as I can and report back.

@whizkidTRW
Author

No go. Regardless of origin="openconfig-interfaces" or origin="oc-if", I now get:

2022-09-28T20:23:49Z E! [inputs.gnmi] Subscribe error (7), "Access Denied"
2022-09-28T20:23:49Z E! [inputs.gnmi] Subscribe error (7), "Access Denied"
2022-09-28T20:23:49Z E! [inputs.gnmi] Subscribe error (7), "Access Denied"
2022-09-28T20:23:50Z E! [inputs.gnmi] Subscribe error (7), "Access Denied"
2022-09-28T20:23:50Z E! [inputs.gnmi] Subscribe error (7), "Access Denied"

with path = "/interfaces/interface[name=7]/state/counters"

@peterbaumert

But "Access Denied" suggests the user you use doesn't have rights to subscribe via gNMI. You should check.

@whizkidTRW
Author

whizkidTRW commented Oct 2, 2022 via email

@ddichev-hub

ddichev-hub commented Oct 14, 2022

I am running into the same problem with Telegraf 1.24, and it affects every sensor, not only oc-interfaces.
I also confirmed that in the failing scenario, Telegraf doesn't even reach the device.

Here are debug logs with 1.21.4 (working) and 1.24.2 (failing) using the exact same config file.

Attaching to network-monitoring-grafana-1, network-monitoring-influxdb-1, network-monitoring-telegraf-1
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! Starting Telegraf 1.21.4
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! Using config file: /etc/telegraf/telegraf.conf
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! Loaded inputs: gnmi
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! Loaded aggregators: 
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! Loaded processors: 
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! Loaded outputs: file influxdb_v2
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! Tags enabled: host=Laptop
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"Laptop", Flush Interval:10s
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [agent] Initializing plugins
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [agent] Connecting outputs
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [agent] Attempting connection to [outputs.file]
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [agent] Successfully connected to outputs.file
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [agent] Attempting connection to [outputs.influxdb_v2]
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [agent] Successfully connected to outputs.influxdb_v2
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [agent] Starting service inputs
network-monitoring-telegraf-1  | 2022-10-14T05:41:23Z D! [inputs.gnmi] Connection to gNMI device 10.181.34.72:6702 established
Attaching to network-monitoring-grafana-1, network-monitoring-influxdb-1, network-monitoring-telegraf-1
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Using config file: /etc/telegraf/telegraf.conf
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Starting Telegraf 1.24.2
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Available plugins: 222 inputs, 9 aggregators, 26 processors, 20 parsers, 57 outputs
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Loaded inputs: gnmi
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Loaded aggregators: 
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Loaded processors: 
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Loaded outputs: file influxdb_v2
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! Tags enabled: host=Laptop
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"Laptop", Flush Interval:10s
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z D! [agent] Initializing plugins
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z D! [agent] Connecting outputs
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z D! [agent] Attempting connection to [outputs.influxdb_v2]
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z D! [agent] Successfully connected to outputs.influxdb_v2
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z D! [agent] Attempting connection to [outputs.file]
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z D! [agent] Successfully connected to outputs.file
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z D! [agent] Starting service inputs
network-monitoring-telegraf-1  | 2022-10-14T05:52:22Z E! [telegraf] Error running agent: starting input inputs.gnmi: invalid string path /al:alarm-notification: invalid node name: "al:alarm-notification"
network-monitoring-telegraf-1 exited with code 1

ddichev

@MyaLongmire
Contributor

MyaLongmire commented Oct 17, 2022

@ddichev-hub could you possibly test 1.23.1? It will help narrow down exactly which PR caused this.

This PR landed after 1.23.1, but these PRs (pr1) (pr2) landed before 1.23.1 and after your working version, 1.21.4.

Thanks in advance :)

@ddichev-hub

ddichev-hub commented Oct 20, 2022

Hi @MyaLongmire,
With 1.23.1 I get the same error:

network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! Using config file: /etc/telegraf/telegraf.conf
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! : Plugin "inputs.gnmi" deprecated since version  and will be removed in : 
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! : Plugin "outputs.influxdb_v2" deprecated since version  and will be removed in : 
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! : Plugin "outputs.file" deprecated since version  and will be removed in : 
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! Starting Telegraf 1.23.1
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! Loaded inputs: gnmi
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! Loaded aggregators: 
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! Loaded processors: 
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! Loaded outputs: file influxdb_v2
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! Tags enabled: host=Laptop
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"Laptop", Flush Interval:10s
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z D! [agent] Initializing plugins
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z D! [agent] Connecting outputs
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z D! [agent] Attempting connection to [outputs.influxdb_v2]
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z D! [agent] Successfully connected to outputs.influxdb_v2
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z D! [agent] Attempting connection to [outputs.file]
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z D! [agent] Successfully connected to outputs.file
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z D! [agent] Starting service inputs
network-monitoring-telegraf-1  | 2022-10-20T20:41:17Z E! [telegraf] Error running agent: starting input inputs.gnmi: invalid string path /al:alarm-notification: invalid node name: "al:alarm-notification"
network-monitoring-telegraf-1 exited with code 1

@MyaLongmire
Contributor

@bewing you put up both pr#11008 and pr#11010.

Do you have any idea what could be causing this error?

@jgeorge1234

jgeorge1234 commented Nov 14, 2022

Where do we get the certificates and key? What is required to configure telemetry on Ciena SAOS 10? Could someone share the full Telegraf configuration for the above scenario?

@whizkidTRW
Author

whizkidTRW commented Nov 18, 2022

Here's my production gNMI config for the Cienas. We only have 5 devices in there currently.

[[inputs.gnmi]]
addresses = ["10.255.30.5:6702","10.255.30.6:6702","10.255.30.14:6702","10.255.30.33:6702","10.255.32.14:6702"]
username = "XXXXXXX"
password = "XXXXXXX"
encoding = "proto"
redial = "10s"
enable_tls = true
tls_ca = "/etc/telegraf/ca.cert.pem"
insecure_skip_verify = true
tls_cert = "/etc/telegraf/client.cert.pem"
tls_key = "/etc/telegraf/client.key.pem"
name_override = "saos10xgnmi"
updates_only = true

[[inputs.gnmi.subscription]]
name = "ifcounters"
origin = "openconfig-interfaces"
path = "/oc-if:interfaces/oc-if:interface[name=7]/oc-if:state/oc-if:counters"
subscription_mode = "sample"
sample_interval = "30s"

[[inputs.gnmi.subscription]]
name = "ifcounters"
origin = "openconfig-interfaces"
path = "/oc-if:interfaces/oc-if:interface[name=9]/oc-if:state/oc-if:counters"
subscription_mode = "sample"
sample_interval = "30s"

[[inputs.gnmi.subscription]]
name = "ifcounters"
origin = "openconfig-interfaces"
path = "/oc-if:interfaces/oc-if:interface[name=28]/oc-if:state/oc-if:counters"
subscription_mode = "sample"
sample_interval = "30s"

[[inputs.gnmi.subscription]]
name = "ifcounters"
origin = "openconfig-interfaces"
path = "/oc-if:interfaces/oc-if:interface[name=33]/oc-if:state/oc-if:counters"
subscription_mode = "sample"
sample_interval = "30s"

[[inputs.gnmi.subscription]]
name = "ifcounters"
origin = "openconfig-interfaces"
path = "/oc-if:interfaces/oc-if:interface[name=36]/oc-if:state/oc-if:counters"
subscription_mode = "sample"
sample_interval = "30s"
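
The five subscription blocks above differ only in the interface name, so they could be generated rather than written by hand. The following is a small Python sketch under that assumption; `TEMPLATE` and `render_subscriptions` are hypothetical names for illustration, and the output still has to be pasted under an `[[inputs.gnmi]]` section.

```python
# Hypothetical generator for the repeated subscription blocks shown above.
TEMPLATE = '''[[inputs.gnmi.subscription]]
name = "ifcounters"
origin = "openconfig-interfaces"
path = "/oc-if:interfaces/oc-if:interface[name={name}]/oc-if:state/oc-if:counters"
subscription_mode = "sample"
sample_interval = "{interval}"
'''


def render_subscriptions(interfaces, interval="30s"):
    """Render one subscription block per interface name, separated by blank lines."""
    return "\n".join(
        TEMPLATE.format(name=name, interval=interval) for name in interfaces
    )


# Interface names taken from the config above.
print(render_subscriptions([7, 9, 28, 33, 36]))
```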

@jgeorge1234

jgeorge1234 commented Nov 19, 2022 via email

@jgeorge1234

jgeorge1234 commented Nov 19, 2022

telegraf error:

[inputs.gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: unknown certificate authority"

@jgeorge1234

jgeorge1234 commented Nov 19, 2022

Certificates generated on the server where Telegraf is installed:

ca-key.pem
ca-cert.pem -- CA cert installed on the Ciena device
client-req.pem
client-key.pem
client-cert.pem
client-cert.p12 -- certificate installed on the Ciena device

@whizkidTRW
Author

I don't understand. The error is happening when parsing the path, why do you need the certs?

Just turn off verification with insecure_skip_verify = true

@jgeorge1234

jgeorge1234 commented Nov 19, 2022

I have configured it per the configuration you shared, and I am getting the error below.

telegraf error:

[inputs.gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: unknown certificate authority"

It looks TLS certificate related.

I followed the steps below to create certificates on the Telegraf server:

https://campus.barracuda.com/product/webapplicationfirewall/doc/12193120/creating-a-client-certificate/

I copied the CA certificate and client certificate to the Ciena SAOS device.

Installed the CA certificate on the Ciena device:
pkix-ca install ca-cert-name ca-cert remote-file-uri sftp:////ca-cert.pem

Client certificate

pkix-certificates install cert-name client-cert cert-passphrase secret cert-only false remote-file uri sftp://client-cert.p12

I am not sure whether I am doing this correctly.

I want to know how to get these certificates and key:

tls_ca = "/etc/telegraf/ca.cert.pem"
tls_cert = "/etc/telegraf/client.cert.pem"
tls_key = "/etc/telegraf/client.key.pem"

@whizkidTRW
Author

As far as I know, the certs came with Telegraf. We didn't have to put any certs on the Ciena devices.

@jgeorge1234

Can you confirm whether there is any config we need to do on the Ciena device?

Also, I cannot find these certificates:

tls_ca = "/etc/telegraf/ca.cert.pem"
tls_cert = "/etc/telegraf/client.cert.pem"
tls_key = "/etc/telegraf/client.key.pem"

@jrventer

I see the same issue with a Huawei NE40 device. They do not implement origin, so the path contains a ":".

[telegraf] Error running agent: starting input inputs.gnmi: invalid string path /huawei-ifm:ifm/interfaces/interface/mib-statistics: invalid node name: "huawei-ifm:ifm"

When I use gnmic to subscribe it works and I start receiving data.
gnmic -a 192.168.33.164 -u xxxx -p 'xxx' --skip-verify sub --mode=stream --stream-mode=sample -d --path "huawei-ifm:ifm/interfaces/interface"
Debug:
sending gNMI SubscribeRequest: subscribe='subscribe:{subscription:{path:{elem:{name:"huawei-ifm:ifm"} elem:{name:"interfaces"} elem:{name:"interface"}} mode:SAMPLE}}', mode='STREAM', encoding='JSON', to 192.168.33.164

When I looked at the code, it seems the xpath.ToGNMIPath() function raises the error.
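
For comparison with the gnmic debug output above, here is a minimal Python sketch of a path splitter that keeps "module:name" as the element name instead of rejecting it. This is not the Telegraf or `xpath.ToGNMIPath()` implementation; `to_elems` is a hypothetical helper that uses plain dicts to mirror the `elem:{name:...}` shape of the request.

```python
import re

# Hypothetical parser, not the Telegraf implementation.
# Captures [key=value] predicates attached to a path element.
_KEY = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def to_elems(path: str) -> list[dict]:
    """Split a path into elements, allowing ':' in node names."""
    elems, current, depth = [], "", 0
    # The trailing '/' flushes the last element through the same branch.
    for ch in path.strip("/") + "/":
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if current:
                name = current.split("[", 1)[0]
                elems.append({"name": name, "key": dict(_KEY.findall(current))})
            current = ""
        else:
            current += ch
    return elems


print([e["name"] for e in to_elems("huawei-ifm:ifm/interfaces/interface")])
# prints ['huawei-ifm:ifm', 'interfaces', 'interface']
```

The element names match what gnmic sends in its SubscribeRequest, which is the behavior the stricter validation was rejecting.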

@srebhan
Member

srebhan commented Nov 23, 2022

@whizkidTRW and @ddichev-hub can you please test PR #12272? CI will build a binary for you... Let me know if it fixes the issue!

@srebhan srebhan self-assigned this Nov 23, 2022
@srebhan srebhan added area/gnmi plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Nov 23, 2022
@jrventer

@srebhan I have tested #12272 and it fixes the issue with path validation, so I no longer see the invalid node name error. Per the updated debug logs below, it also seems to parse the path correctly. However, the subscription is still rejected by the router. Compared with the gnmic request, the main difference is the empty prefix included in the request. Could the prefix be removed if null?
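
The "remove the prefix if null" idea can be sketched as follows. This is a Python illustration using plain dicts shaped like the debug output, not the Telegraf code or the real protobuf types; `build_subscribe_request` is a hypothetical name. The thread does not confirm that the empty prefix is what the router rejects, so this only shows the proposed behavior.

```python
# Hypothetical request builder using plain dicts, not gNMI protobuf messages.
def build_subscribe_request(elems, prefix=None, mode="SAMPLE"):
    """Build a subscribe request, leaving out the prefix when it is None or empty."""
    subscribe = {"subscription": [{"path": {"elem": elems}, "mode": mode}]}
    if prefix:  # both None and {} are skipped
        subscribe["prefix"] = prefix
    return {"subscribe": subscribe}


elems = [{"name": "huawei-ifm:ifm"}, {"name": "interfaces"}, {"name": "interface"}]

# With an empty prefix, the key is omitted, matching the shape gnmic sends.
print(build_subscribe_request(elems, prefix={}))
```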

2022-11-23T13:37:05Z D! [inputs.gnmi] Request subscribe:{prefix:{} subscription:{path:{elem:{name:"huawei-ifm:ifm"} elem:{name:"interfaces"} elem:{name:"interface"}} mode:SAMPLE}}
2022-11-23T13:37:05Z D! [inputs.gnmi] Connection to gNMI device 192.168.33.164:57400 established
2022-11-23T13:37:05Z D! [inputs.gnmi] Connection to gNMI device 192.168.33.164:57400 closed
2022-11-23T13:37:05Z E! [inputs.gnmi] Error in plugin: aborted gNMI subscription: rpc error: code = InvalidArgument desc = Argument 'elem' error.

@srebhan
Member

srebhan commented Nov 23, 2022

@jrventer would you be so kind and open another issue for this!? Otherwise we get lost here. Feel free to ping me in the new issue.

@jrventer

@jrventer would you be so kind and open another issue for this!? Otherwise we get lost here. Feel free to ping me in the new issue.

Ok will open new issue for the prefix issue.

@whizkidTRW
Author

whizkidTRW commented Nov 23, 2022 via email

Sven, I'm running Telegraf in a container and not sure how to test that PR. I'm more than happy to if someone can show me how. Thanks!

@jrventer

jrventer commented Nov 23, 2022

Hi @whizkidTRW, you can try using this Dockerfile to build a version with the PR:

# Build Stage
# Test Build Container for Telegraf
FROM golang:1.19-bullseye as git-telegraf
RUN DEBIAN_FRONTEND=noninteractive apt update && \
    DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends \
    bash \
    git \
    unzip \
    wget \
    curl \
    apt-utils \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
### clone the PR Fix
RUN git clone --branch gnmi_issue_11903 https://github.com/srebhan/telegraf.git

# Build telegraf binary
WORKDIR /app/telegraf
RUN make build

# Build the telegraf container with the PR fix
FROM telegraf:1.24

COPY --from=git-telegraf /app/telegraf/telegraf /usr/bin/telegraf

Run:
docker build -t srebhan/telegraf:gnmi_issue_11903 .

Then use "srebhan/telegraf:gnmi_issue_11903" as the image name once you have built it.

@srebhan
Member

srebhan commented Nov 23, 2022

@whizkidTRW you can also download the static binary for your arch (it's only one file without dependencies) and run it in your existing container e.g. by providing it in a mount-point or downloading it in the container.

@srebhan
Member

srebhan commented Nov 23, 2022

@jrventer please note that you should use go 1.19 as this is the current requirement for building Telegraf. :-)

@whizkidTRW
Author

whizkidTRW commented Nov 23, 2022 via email

@whizkidTRW
Author

I was able to pull it down into my Telegraf container and test. It does parse the config and subscribe, but now the errors are different and it still doesn't return valid data. I'm attaching the output. I exported just the Ciena relevant sections of my main telegraf.conf file to telegraf-ciena.conf and tested with that using --test-wait 30

telegraf-ciena.log

@jrventer

@jrventer please note that you should use go 1.19 as this is the current requirement for building Telegraf. :-)

Thanks, I have updated it. I was mostly building a plugin based on Telegraf 1.20, so I adapted the Dockerfile from that environment, which was using an older Go version.

@srebhan
Member

srebhan commented Nov 23, 2022

@whizkidTRW thanks for testing! Can you please open another issue for this so others can find the issue if they experience similar problems... It would also be helpful to provide debug output (running telegraf with --debug) for the new issue.

@whizkidTRW
Author

whizkidTRW commented Nov 23, 2022 via email

7 participants