Unable to get SNMP plugin work with version 3 authentication #3655

Closed
ashuw018 opened this issue Jan 10, 2018 · 41 comments · Fixed by #3973
Labels
area/snmp bug unexpected problem or unintended behavior
Milestone

Comments

@ashuw018

Directions

I am trying to get Telegraf working with the SNMP plugin. Everything works fine if the device IP in the agents setting is a local network IP, but it does not work when I try the same with the IP of a device on an external network.

Everything works with snmpwalk and snmptable for both local and external network devices, but through Telegraf it does not work for the external device.

Relevant telegraf.conf:

[[inputs.snmp]]
  agents = [ "x.x.x.x:161" ]
  version = 3

  sec_name = "myuser"
  auth_protocol = "SHA"
  auth_password = "my_Auth"
  sec_level = "authPriv"
  context_name = ""
  priv_protocol = "AES"
  priv_password = "my_PWD"

  name = "system"
  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.table]]
    name = "snmp"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifXTable"

    [[inputs.snmp.table.field]]
      name = "ifName"
      oid = "IF-MIB::ifName"
      is_tag = true

System info:

Telegraf version = v1.5.0
Net snmp
Windows 10 and windows 2008 R2

Steps to reproduce:

  1. ...
  2. ...

Expected behavior:

It should work as it does with local network devices.

Actual behavior:

It works with a local device but not with an external device. It should also log more precisely what is going wrong.

Additional info:

Errors I am getting:

2018-01-09T09:19:24Z E! Error in plugin [inputs.snmp]: agent x.x.x.x:161: performing get on field hostname: Request timeout (after 3 retries)
2018-01-09T09:19:34Z E! Error in plugin [inputs.snmp]: agent x.x.x.x:161: gathering table snmp: performing bulk walk for field ifName: Request timeout (after 3 retries)

@danielnelson
Contributor

What do you mean by external device? Are you referring to another host on the same subnet or to a device on a routed network? There is a known issue where the response must be received from the same target host; I wonder if you are running into this: #3320

The logs seem to show what is going wrong: no response was received and the request timed out, so I don't think they can be improved.

@danielnelson danielnelson added bug unexpected problem or unintended behavior area/snmp labels Jan 10, 2018
@ashuw018
Author

Hi Danielnelson,

By external device, I mean the device is not on our premises; it resides outside our network. net-snmp and other tools such as snmp-exporter + Prometheus work fine with it, but through Telegraf it doesn't work.

I might be running into #3320, I'm not sure, but it does work through Telegraf if I perform the same queries against any local/on-premises device.

Thanks,

@danielnelson
Contributor

That is interesting that snmp-exporter is working, since we are both using the same gosnmp library. @phemmer do you have any idea what could cause this?

@phemmer
Contributor

phemmer commented Jan 12, 2018

Without seeing packet captures, not really. The only immediate thought is the default 5s timeout.

@danielnelson
Contributor

@ashuw018 Do you think you could collect a packet dump? There is an example of how to do it on unix systems here https://github.com/soniah/gosnmp#packet-captures. It would be helpful to collect it once with snmpget, and once using telegraf --input-filter snmp --test.

@phemmer Do you think it would be useful if I wrote a clone of snmpget/snmpwalk using the gosnmp library? We could add debugging code as needed and then users could run this when they have a problem.

@ashuw018
Author

Hi Daniel and Phemmer,

Thanks for your efforts. Here is another update from my end.
I am now more surprised: I have installed Telegraf and net-snmp on an external server that sits on the same internal network as the external SNMP device this issue was created for. Unfortunately it does not work from there either. The issue seems to be SNMPv3 specific, since from there it works fine with SNMPv2. I don't know what I am doing wrong with SNMPv3, as I get data over SNMPv3 using net-snmp's snmpwalk, snmpget, and snmptable, and with snmp exporter as well, but not with Telegraf.

This might change the direction of our investigation: as of now, SNMPv2 works and SNMPv3 does not.

Thanks,

@danielnelson
Contributor

This might be gosnmp/gosnmp#95

@rrasale

rrasale commented Jan 17, 2018

Can you run snmpwalk against your destination device from your Telegraf server? Do you see any output?

@ashuw018
Author

@rrasale Yes, I do see valid output from snmpwalk over version 3. In fact, with snmp exporter and Prometheus it works perfectly over SNMPv3.

@glazzari

Hi, I'm getting the same error. I can snmpwalk to the production server from my machine, but telegraf is returning request timeout. For example:

$ snmpwalk -mALL -On -v2c -cintcacti <target> 1.3.6.1.4.1.35750.10.2.1.1.1

.1.3.6.1.4.1.35750.10.2.1.1.1.0 = Counter64: 768886
$ sudo tcpdump -s 0 -i eno1 host [target] and port 161

17:35:43.398564 IP [src].51646 > [target].snmp:  C="intcacti" GetNextRequest(33)  E:35750.10.2.1.1.1
17:35:43.425641 IP [target].snmp > [src].51646:  C="intcacti" GetResponse(37)  E:35750.10.2.1.1.1.0=769168
17:35:43.425777 IP [src].51646 > [target].snmp:  C="intcacti" GetNextRequest(34)  E:35750.10.2.1.1.1.0
17:35:43.451014 IP [target].snmp > [src].51646:  C="intcacti" GetResponse(38)  E:35750.10.2.1.1.2.0=240797616

I'm running the official docker image of telegraf, which exposes the following ports:

8125/udp 8092/udp 8094

SNMP plugin is configured as:

[[inputs.snmp]]
  agents = [ "<target>" ]
  version = 2
  community = "cintcacti"
  name = "system"

  [[inputs.snmp.field]]
    name = "total"
    oid = "1.3.6.1.4.1.35750.10.2.1.1.1"
telegraf    | 2018-01-17T19:46:20Z E! Error in plugin [inputs.snmp]: took longer to collect than collection interval (10s)
telegraf    | 2018-01-17T19:46:20Z E! Error in plugin [inputs.snmp]: agent [target]: performing get on field total: Request timeout (after 3 retries)
$ telegraf --input-filter snmp --test

2018/01/17 19:46:34 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.snmp, Collection 1
2018-01-17T19:46:44Z E! Error in plugin [inputs.snmp]: agent [target]: performing get on field total: Request timeout (after 3 retries)
$ sudo tcpdump -s 0 -i eno1 host [target] and port 161

17:41:00.012307 IP [src].34905 > [target].snmp:  C="cintcacti" GetRequest(33)  E:35750.10.2.1.1.1
17:41:01.262567 IP [src].34905 > [target].snmp:  C="cintcacti" GetRequest(33)  E:35750.10.2.1.1.1
17:41:02.512643 IP [src].34905 > [target].snmp:  C="cintcacti" GetRequest(33)  E:35750.10.2.1.1.1
17:41:03.762995 IP [src].34905 > [target].snmp:  C="cintcacti" GetRequest(33)  E:35750.10.2.1.1.1
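The two captures above differ in one obvious way: the snmpwalk capture shows a response after each request, while the Telegraf capture shows only requests. A minimal sketch for summarizing such tcpdump output follows. The function name `summarize` and the regular expression are illustrative assumptions, not part of Telegraf or tcpdump. The roughly 1.25 s spacing it reports for the Telegraf capture is consistent with a 5 s timeout spread over four attempts, but that split is an inference from this capture, not something confirmed in the thread.

```python
import re

# Matches tcpdump's one-line SNMP summary, for example:
# 17:41:00.012307 IP [src].34905 > [target].snmp:  C="cintcacti" GetRequest(33)  E:...
LINE_RE = re.compile(
    r'^(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}\.\d+) IP .*? '
    r'(?P<pdu>Get\w*Request|GetResponse)\('
)

def summarize(capture_text):
    """Return (request_count, response_count, gaps_between_requests_in_seconds)."""
    requests, responses, stamps = 0, 0, []
    for line in capture_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip prompts, blank lines, and anything that is not an SNMP summary
        if m.group("pdu") == "GetResponse":
            responses += 1
        else:
            requests += 1
            stamps.append(
                int(m.group("h")) * 3600 + int(m.group("m")) * 60 + float(m.group("s"))
            )
    # Gap between each request and the one before it, rounded to 10 ms.
    gaps = [round(b - a, 2) for a, b in zip(stamps, stamps[1:])]
    return requests, responses, gaps
```

A capture with requests but zero responses means the agent never answered (or the answer never arrived), which is what "Request timeout" reports; it does not by itself say why.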

@danielnelson
Contributor

@glazzari Can you test with this field:

[[inputs.snmp.field]]
  name = "hostname"
  oid = ".1.3.6.1.2.1.1.5.0"

and save the capture with:

sudo tcpdump -s 0 -i eno1 -w test.pcap host [target] and port 161

Capturing both commands would be helpful:

snmpget -v2c -c public 10.79.40.63:161 .1.3.6.1.2.1.1.5.0
telegraf --input-filter snmp --test

@glazzari

glazzari commented Jan 18, 2018

@danielnelson Same results. I can snmpwalk correctly, but the plugin fails with a request timeout.

$ snmpwalk -On -v2c -cintcacti [target] .1.3.6.1.2.1.1.5.0

.1.3.6.1.2.1.1.5.0 = STRING: [returns sysName]
$ sudo tcpdump -s 0 -i eno1 host [target] and port 161             

09:44:05.601227 IP [src].59290 > [target].snmp:  C="intcacti" GetNextRequest(28)  system.sysName.0
09:44:05.629768 IP [target].snmp > [src].59290:  C="intcacti" GetResponse(52)  system.sysLocation.0="[returns sysLocation]"
09:44:05.629909 IP [src].59290 > [target].snmp:  C="intcacti" GetRequest(28)  system.sysName.0
09:44:05.659471 IP [target].snmp > [src].59290:  C="intcacti" GetResponse(35)  system.sysName.0="[returns sysName]"
$ telegraf --input-filter snmp --test
2018/01/18 11:51:47 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.snmp, Collection 1
2018-01-18T11:51:57Z E! Error in plugin [inputs.snmp]: agent [target]: performing get on field hostname: Request timeout (after 3 retries)
$ sudo tcpdump -s 0 -i eno1 host [target] and port 161
10:12:30.008255 IP [src].41406 > [target].snmp:  C="cintcacti" GetRequest(28)  system.sysName.0
10:12:31.258367 IP [src].41406 > [target].snmp:  C="cintcacti" GetRequest(28)  system.sysName.0
10:12:32.508635 IP [src].41406 > [target].snmp:  C="cintcacti" GetRequest(28)  system.sysName.0
10:12:33.758892 IP [src].41406 > [target].snmp:  C="cintcacti" GetRequest(28)  system.sysName.0

@glazzari

Not sure if it's related to #3320.

@danielnelson
Contributor

If the response source matches the request target then I don't think it is #3320

I notice that tcpdump reports the community name differently:

- C="intcacti"
+ C="cintcacti"
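The comparison above was done by eye. As a rough sketch, the same check can be automated by pulling the `C="..."` field out of each capture and comparing the sets. The helper names `communities` and `community_mismatch` are hypothetical, and the sketch assumes tcpdump's one-line SNMP summary format as shown in this thread (SNMP v1/v2c only, since v3 carries no community string).

```python
import re

# tcpdump prints the community string of v1/v2c packets as C="...".
COMMUNITY_RE = re.compile(r'C="([^"]*)"')

def communities(capture_text):
    """Return the set of community strings seen in a tcpdump text capture."""
    return set(COMMUNITY_RE.findall(capture_text))

def community_mismatch(working_capture, failing_capture):
    """Return (only_in_working, only_in_failing) community strings."""
    working = communities(working_capture)
    failing = communities(failing_capture)
    return working - failing, failing - working
```

If both returned sets are empty, the two tools sent the same community string and the cause lies elsewhere.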

@glazzari

glazzari commented Jan 18, 2018

Good catch! I've just noticed the community name in telegraf.conf was including the 'c' as part of its name. This is probably a copy/paste error, because I was running snmpwalk with "-cintcacti" instead of "-c intcacti". Note the space after the "-c".

[[inputs.snmp]]
  agents = [ "target:161" ]
  version = 2
  community = "intcacti"
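The `-cintcacti` pitfall described above comes from how getopt-style parsers treat a value attached directly to a short option. The sketch below uses Python's standard `getopt` module as a stand-in; that net-snmp parses its arguments the same way is an assumption here, though it is consistent with the snmpwalk capture showing `C="intcacti"`. The function name `parse_community` is hypothetical.

```python
import getopt

def parse_community(argv):
    """Return the community string a getopt-style parser extracts from argv."""
    # "c:" and "v:" declare -c and -v as short options that each take a value.
    # The value may be attached ("-cintcacti") or separate ("-c", "intcacti").
    opts, _remaining = getopt.getopt(argv, "c:v:")
    return dict(opts).get("-c")

# Both spellings yield the same community string, so the leading "c" in
# "-cintcacti" is the flag itself, not part of the community name.
attached = parse_community(["-v2c", "-cintcacti", "target"])
separate = parse_community(["-v2c", "-c", "intcacti", "target"])
```

Copying the text after the dash into a config file therefore picks up one character too many.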

@danielnelson
Contributor

Did fixing that help?

@glazzari

Yes! Thank you very much for your help.

@danielnelson
Contributor

@ashuw018 Could you upload your working snmp_exporter configuration for comparison against the Telegraf plugin?

@ashuw018
Author

@danielnelson The requested files are attached.

prometheus.txt
snmpexporter.txt

@danielnelson
Contributor

@ashuw018 Is this the agent you are unable to connect to? I notice the priv_protocol is DES, while above in Telegraf it was set to AES.

if_mib:

  version: 3
  auth:
    username: user
    password: password
    #
    auth_protocol: SHA
    security_level: authPriv
    priv_protocol: DES
    priv_password: Passwordpriv
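Spotting a difference like DES versus AES is easier with a mechanical comparison. The following is only a sketch: the key mapping is taken from the two configurations posted in this thread, passwords are deliberately left out, and the names `EXPORTER_TO_TELEGRAF` and `diff_settings` are hypothetical. It takes already-parsed dictionaries rather than reading the config files.

```python
# Key names as they appear in the snmp_exporter and Telegraf configs in this thread.
EXPORTER_TO_TELEGRAF = {
    "username": "sec_name",
    "security_level": "sec_level",
    "auth_protocol": "auth_protocol",
    "priv_protocol": "priv_protocol",
}

def diff_settings(exporter_auth, telegraf_cfg):
    """Return {telegraf_key: (exporter_value, telegraf_value)} where values differ."""
    diffs = {}
    for exporter_key, telegraf_key in EXPORTER_TO_TELEGRAF.items():
        exporter_value = exporter_auth.get(exporter_key)
        telegraf_value = telegraf_cfg.get(telegraf_key)
        if exporter_value != telegraf_value:
            diffs[telegraf_key] = (exporter_value, telegraf_value)
    return diffs
```

An empty result only shows that the compared settings agree; it says nothing about passwords or about how each tool talks to the agent.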

@ashuw018
Author

Hi Daniel

That's just a typo I made while filling in dummy data for posting.

I am using AES in both, and I tried DES as well.

@danielnelson
Contributor

We also have a report about complex passwords causing problems. Would it be possible to test, temporarily of course, whether it helps to use a weak password with only ASCII letters?
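For anyone repeating this test, a quick way to confirm that a temporary password really contains only ASCII letters is sketched below. The function name `is_plain_ascii_letters` is hypothetical, and `str.isascii` requires Python 3.7 or later.

```python
def is_plain_ascii_letters(password):
    """Return True if password is non-empty and made only of A-Z and a-z."""
    # isascii() rejects accented and other non-ASCII letters;
    # isalpha() rejects digits, punctuation, whitespace, and the empty string.
    return password.isascii() and password.isalpha()
```

Note that a placeholder such as `my_Auth` from the config above would not pass, because of the underscore.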

@ashuw018
Author

ashuw018 commented Feb 2, 2018

Hi Daniel,

Sorry for the delayed response. We are only entering the testing phase with InfluxDB, so the passwords were already kept simple. No character variations have been used.

@danielnelson
Contributor

@phemmer Do you know if it is possible to collect packet captures of version 3? I assume we would need sec_level = NoAuthNoPriv?

@phemmer
Contributor

phemmer commented Feb 2, 2018

oh, hrm. It's been so long since I've worked with anything using v3 I'm not sure what the protocol looks like. But to be safe yes, NoAuthNoPriv should ensure that you can view all request/response fields.

@danielnelson
Contributor

@ashuw018 Would it be possible to use NoAuthNoPriv temporarily? If you could capture the packets like this for both Telegraf and snmpget:

sudo tcpdump -s 0 -i [interface] -w test.pcap host [target] and port 161

BTW I looked through the snmp_exporter code for usage differences, but didn't see anything that stood out.

@ashuw018
Author

ashuw018 commented Feb 3, 2018

@danielnelson I will reach out to our NetSec support this Monday and check whether they can make such changes. If they allow it, I will definitely do that.

Also, is there an equivalent of the above command that works alongside net-snmp on Windows? I do not have any Linux box there; all machines run Windows.

Thanks.

@danielnelson
Contributor

It should be possible to use this windows version: https://www.winpcap.org/windump/, though I have not tested it. You run the tcpdump command in one shell while in another you run the net-snmp command, then you stop the tcpdump command and it should print that it captured some packets.

@ashuw018
Author

ashuw018 commented Feb 6, 2018

@danielnelson I approached our support staff with this request, but they declined to configure NoAuthNoPriv because it is against their data center norms. Unfortunately, in my local office I do not have any device that supports SNMP v3, so I cannot test it here.

@danielnelson
Contributor

Understandable. I will try to set up a v3 device for testing, but it might take me a while.

@danielnelson danielnelson changed the title from "Unable to get SNMP plugin work with external device(Public IP)" to "Unable to get SNMP plugin work with version 3 authentication" Feb 6, 2018
@cpajr

cpajr commented Apr 4, 2018

Any update on this issue? I'm encountering the same problem.

@danielnelson
Contributor

@cpajr I did do a sanity test using snmp v3 to a net-snmp server and didn't have any trouble. Are you also using Windows and what kind of device are you querying?

@cpajr

cpajr commented Apr 4, 2018

@danielnelson I'm running on CentOS 7, Telegraf v1.5.3. I'm trying to query a Cisco ASA via SNMPv3. Oddly enough, I'm able to poll other Cisco devices via SNMPv3; I'm only encountering issues with the ASA. I can successfully perform an snmpwalk on the ASA without issue.

@danielnelson
Contributor

@cpajr Can you create a packet capture doing an snmpget and an equivalent query with Telegraf (with a single top-level field)? I think even with v3 security enabled it may be of some use. Here is an example tcpdump:

sudo tcpdump -s 0 -i eth0 -w telegraf.pcap host 203.50.251.17 and port 161

Then upload these files along with your Telegraf snmp configuration (don't forget to remove your passwords or use a testing password).

@ashuw018
Author

ashuw018 commented Apr 5, 2018

Hi, just FYI:
My device was also a Cisco ASA 5515.

@cpajr

cpajr commented Apr 5, 2018

@danielnelson I'll work to get this. It also appears that we have a commonality on the trouble device: Cisco ASA.

@danielnelson
Contributor

I just found this gosnmp/gosnmp#108, I'll update our gosnmp dependency if you both can test it out.

@cpajr

cpajr commented Apr 5, 2018

Let me know what needs to be updated and I will test it.

@danielnelson danielnelson mentioned this issue Apr 5, 2018
3 tasks
@danielnelson
Contributor

Here are some builds with the updated gosnmp for testing:

@cpajr

cpajr commented Apr 5, 2018

That did the trick. Thank you.

@danielnelson danielnelson added this to the 1.6.0 milestone Apr 5, 2018
@danielnelson
Contributor

I'll include this change in 1.6.0-rc3

6 participants