-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ifname processor plugin #7763
Conversation
…. Update snmp input to use %w wrapped errors. Add ifname processor plugin.
… used or when interface names can't be retrieved.
## Name of tag of the SNMP agent to request the interface name from | ||
# agent = "agent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a way to set this statically to an agent host:port, how about having two options: agent
and agent_tag
with the agent_tag
overriding the agent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered having an agent setting but didn't think it was worth the potential trouble. The interface index is a required tag and it's only valid with the agent it came from. If we add in a way for the index to potentially refer to some other agent, people are going to do that and silently get an interface name from the wrong agent. I thought requiring the agent to come in as a tag would decrease the possibility that an index is ever used on the wrong agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be set to agent_host
as default? Since this is the tag that is being returned from snmp plugin, It would reduce the need to set this value every time.
plugins/processors/ifname/ifname.go
Outdated
d.Log.Infof("couldn't parse source tag as uint") | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are returning anyway just return this as an error, this usually helps writing tests though I'm not sure how much in this case since we can't return the error all the way out. At least you can log them all in one spot though.
I think these should be logged as errors and remember to Capitalize log messages (errors are lowercase). (error strings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if there's currently a good way for streaming processors to report errors. The add func in StreamingProcessor lets you return an error, but that's too early if you're processing in a different thread. If processing is successful you call Accumulator.AddMetric, so it feels like you should also give errors to the accumulator.
I changed IfName.addTag to return an error and changed the caller to log the error in one place with a capitalized message.
plugins/processors/ifname/ifname.go
Outdated
func (d *IfName) Apply(metrics ...telegraf.Metric) []telegraf.Metric { | ||
for _, metric := range metrics { | ||
d.addTag(metric) | ||
} | ||
return metrics | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is leftovers from earlier, but we should remove this function so that we don't implement the old Processor interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it in because one of the tests was calling Apply. I changed the test to call addTag directly and removed Apply.
plugins/processors/ifname/ifname.go
Outdated
gs, err := snmp.NewWrapper(d.ClientConfig, agent) | ||
if err != nil { | ||
//can't translate toml to gosnmp.GoSNMP | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this in Start so that we can report the configuration error? Seems like not knowing the agent might be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split the initialization so it only parses the host tag here and it parses the config in Start. Now it will tell the user if any toml values are not the expected type or format. Any snmp errors will still happen here.
plugins/processors/ifname/ifname.go
Outdated
|
||
//try ifXtable and ifName first. if that fails, fall back to | ||
//ifTable and ifDescr | ||
var m *nameMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention maps aren't normally pointers, it isn't a problem but not needed since the actual type is small with the data stored on the heap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to have the LRU cache be reusable so I chose to use a pointer as the cache value type. I didn't want to assume the value was small enough to copy. Since nameMap is the cache value type it made sense to hold pointers elsewhere.
Since it looked strange to both of you, I unpointerized it everywhere and I won't worry as much about reusing the cache.
@@ -0,0 +1,106 @@ | |||
//+build localsnmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a switch to avoid running unless you have a local snmp server? The normal way we would do this is to add to each test a few lines like:
if testing.Short() {
t.Skip("Skipping integration test in short mode")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a switch to turn off these tests in CI because they don't have a local snmp agent. I've been running 'go test -tags localsnmp'.
I've changed the tests to avoid doing this. TestIfName now mocks the snmp response. TestTable and TestXTable are now skipped for short tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Automated unit testing just reminded me that this needs snmptranslate and mibs to be installed locally so mocking the snmp response isn't enough to get TestIfName as a short test. It's now skipped too.
return nil, err | ||
} | ||
|
||
d.cache.Put(agent, m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these maps be updated if the interfaces change? Maybe something for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interfaces change so rarely I didn't think it was important to add a way to invalidate and reload the maps in the first version. I agree it would be a good future change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Virtual interfaces get added and removed all the time. Even frequently for physical ones…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Hipska, do you have a suggestion on how to handle caching of interface names? We can't get the names from the agent every time they're needed because it's a relatively slow network transaction. We also don't want to cache the names forever because they change. Would it be sufficient to expire cached names after a set amount of time, configurable in the plugin settings? Maybe we should also invalidate the cache entry when a request comes in for an interface number that isn't in the cached list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, a timeout based on the polling interval is what I would expect, that also handles the invalidate problem when a new interface is found. Otherwise indeed invalidate the cache if number of interfaces change and also after some timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ifname is a processor it doesn't have a polling interval. Metrics will come into ifname from various inputs (snmp, sflow, possibly others) each of which could have different intervals. The processor api also doesn't have a way to determine which input a given metric came from. These limitations mean it's more practical to add a new setting for cache time to live.
I'm working on changes to do this. I'll also have it invalidate the cache entry if the interface number isn't in the cached list.
func (gsw GosnmpWrapper) Get(oids []string) (*gosnmp.SnmpPacket, error) { | ||
var err error | ||
var pkt *gosnmp.SnmpPacket | ||
for i := 0; i < 2; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these kind of retries can be really slow in aggregate where there are a lot of failures. Does it fail often?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen it fail but this has been in the snmp input for a long time and I expect users are used to how it behaves so I don't want to change it with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. There's an open issue for one of the other plugins that retries like this, so I'd like to avoid creating more issues if we can help it, but I'm okay with solving this in a later PR.
sp.AuthoritativeEngineID = s.EngineID | ||
|
||
sp.AuthoritativeEngineBoots = s.EngineBoots | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably don't need the spacing here.
# community = "public" | ||
|
||
## Number of retries to attempt. | ||
# retries = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we default to zero retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another case of copying the snmp input. I thought it would be best to have the same behavior there and here but I'm open to changing it there's something wrong with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the last comment on #7300
plugins/processors/ifname/ifname.go
Outdated
|
||
err = gs.Connect() | ||
if err != nil { | ||
//can't connect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can wrap the error here for context, since connection errors may be vague
plugins/processors/ifname/ifname.go
Outdated
var m *nameMap | ||
m, err = buildMap(&gs, d.ifXTable, "ifName") | ||
if err != nil { | ||
var err2 error | ||
m, err2 = buildMap(&gs, d.ifTable, "ifDescr") | ||
if err2 != nil { | ||
return nil, err2 | ||
} | ||
} | ||
return m, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never do see what err
is, if that matters.
…ip integration tests during short tests.
…r will know as soon as possible if there are toml config parse errors. (Host tag parsing stays in Add)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Don't forget to test for races :)
# dest = "ifName" | ||
|
||
## Name of tag of the SNMP agent to request the interface name from | ||
# agent = "agent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be default set to agent_host
as this is the tag that is being returned from snmp
plugin? It would reduce the need to set this value every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should change the default now that this has been released. Changing the default would silently break anyone who is using the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't figure out why anyone would have "agent" as default? There is no plugin that provides this tag by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a valid point that it's unlikely anyone is using the default. They would have to rename the tag from the snmp input or the sflow input (which uses "agent_address").
While trying out this plugin, I came to the conclusion of maybe allow the community string also coming from a tag, or some other way to retrieve all the snmp settings? |
cc @reimda |
I can see this being useful but I'm not sure how it would work. Providing the settings in tags could work for v1 and v2c because they only need the snmp version and community string, so two tags. Snmp v3 would need version and up to seven other settings. Providing 8 tags just for snmp connection info doesn't sound pleasant. Combining multiple settings into one tag also doesn't sound good. We could have multiple sets of connection settings in the config file and then add a tag to select which settings to use, but that isn't as flexible as passing the settings in tags. If you can think of a nice way to do this, I'd be happy to consider a PR. |
I don't know of any way how to do this. Should I request it via an issue? I also noticed that when using this plugin, after a while all inputs stop working (even like cpu and disk), will create an issue for that or should we look into it on Slack first? |
a new issue (or issues) would probably be best. |
I think we've identified three issues: 1) a feature request for providing snmp connection info outside of the conf file; 2) a bug that the default tag doesn't match the snmp or sflow inputs; 3) a bug that all inputs stop working after a while. @Hipska would you open issues for these? For the third one, let's start with opening an issue and then use slack if needed. Please include your telegraf.conf, the telegraf output, and the conditions that cause it, like how long it takes to happen. |
I'm still figuring out on the last one what and how. As I can't get this plugin run stable, I'm still figuring out how. Sometimes it doesn't do anything and without any error. |
closes #7246