Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ifname processor plugin #7763

Merged
merged 17 commits into from
Jul 7, 2020
Merged

Add ifname processor plugin #7763

merged 17 commits into from
Jul 7, 2020

Conversation

reimda
Copy link
Contributor

@reimda reimda commented Jun 29, 2020

closes #7246

@danielnelson danielnelson added this to the 1.15.0 milestone Jun 30, 2020
plugins/processors/ifname/ifname.go Outdated Show resolved Hide resolved
Comment on lines +24 to +25
## Name of tag of the SNMP agent to request the interface name from
# agent = "agent"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a way to set this statically to an agent host:port, how about having two options: agent and agent_tag with the agent_tag overriding the agent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered having an agent setting but didn't think it was worth the potential trouble. The interface index is a required tag and it's only valid with the agent it came from. If we add in a way for the index to potentially refer to some other agent, people are going to do that and silently get an interface name from the wrong agent. I thought requiring the agent to come in as a tag would decrease the possibility that an index is ever used on the wrong agent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be set to agent_host as default? Since this is the tag that is being returned from snmp plugin, It would reduce the need to set this value every time.

Comment on lines 142 to 143
d.Log.Infof("couldn't parse source tag as uint")
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are returning anyway just return this as an error, this usually helps writing tests though I'm not sure how much in this case since we can't return the error all the way out. At least you can log them all in one spot though.

I think these should be logged as errors and remember to Capitalize log messages (errors are lowercase). (error strings)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there's currently a good way for streaming processors to report errors. The add func in StreamingProcessor lets you return an error, but that's too early if you're processing in a different thread. If processing is successful you call Accumulator.AddMetric, so it feels like you should also give errors to the accumulator.

I changed IfName.addTag to return an error and changed the caller to log the error in one place with a capitalized message.

Comment on lines 161 to 166
func (d *IfName) Apply(metrics ...telegraf.Metric) []telegraf.Metric {
for _, metric := range metrics {
d.addTag(metric)
}
return metrics
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is leftovers from earlier, but we should remove this function so that we don't implement the old Processor interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it in because one of the tests was calling Apply. I changed the test to call addTag directly and removed Apply.

Comment on lines 228 to 232
gs, err := snmp.NewWrapper(d.ClientConfig, agent)
if err != nil {
//can't translate toml to gosnmp.GoSNMP
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do this in Start so that we can report the configuration error? Seems like not knowing the agent might be a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split the initialization so it only parses the host tag here and it parses the config in Start. Now it will tell the user if any toml values are not the expected type or format. Any snmp errors will still happen here.

plugins/processors/ifname/ifname.go Outdated Show resolved Hide resolved

//try ifXtable and ifName first. if that fails, fall back to
//ifTable and ifDescr
var m *nameMap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention maps aren't normally pointers, it isn't a problem but not needed since the actual type is small with the data stored on the heap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to have the LRU cache be reusable so I chose to use a pointer as the cache value type. I didn't want to assume the value was small enough to copy. Since nameMap is the cache value type it made sense to hold pointers elsewhere.

Since it looked strange to both of you, I unpointerized it everywhere and I won't worry as much about reusing the cache.

@@ -0,0 +1,106 @@
//+build localsnmp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a switch to avoid running unless you have a local snmp server? The normal way we would do this is to add to each test a few lines like:

	if testing.Short() {
		t.Skip("Skipping integration test in short mode")
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a switch to turn off these tests in CI because they don't have a local snmp agent. I've been running 'go test -tags localsnmp'.

I've changed the tests to avoid doing this. TestIfName now mocks the snmp response. TestTable and TestXTable are now skipped for short tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated unit testing just reminded me that this needs snmptranslate and mibs to be installed locally so mocking the snmp response isn't enough to get TestIfName as a short test. It's now skipped too.

plugins/processors/ifname/ifname_test.go Outdated Show resolved Hide resolved
return nil, err
}

d.cache.Put(agent, m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these maps be updated if the interfaces change? Maybe something for later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interfaces change so rarely I didn't think it was important to add a way to invalidate and reload the maps in the first version. I agree it would be a good future change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Virtual interfaces get added and removed all the time. Even frequently for physical ones…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Hipska, do you have a suggestion on how to handle caching of interface names? We can't get the names from the agent every time they're needed because it's a relatively slow network transaction. We also don't want to cache the names forever because they change. Would it be sufficient to expire cached names after a set amount of time, configurable in the plugin settings? Maybe we should also invalidate the cache entry when a request comes in for an interface number that isn't in the cached list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a timeout based on the polling interval is what I would expect, that also handles the invalidate problem when a new interface is found. Otherwise indeed invalidate the cache if number of interfaces change and also after some timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ifname is a processor it doesn't have a polling interval. Metrics will come into ifname from various inputs (snmp, sflow, possibly others) each of which could have different intervals. The processor api also doesn't have a way to determine which input a given metric came from. These limitations mean it's more practical to add a new setting for cache time to live.

I'm working on changes to do this. I'll also have it invalidate the cache entry if the interface number isn't in the cached list.

internal/snmp/config.go Show resolved Hide resolved
func (gsw GosnmpWrapper) Get(oids []string) (*gosnmp.SnmpPacket, error) {
var err error
var pkt *gosnmp.SnmpPacket
for i := 0; i < 2; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these kind of retries can be really slow in aggregate where there are a lot of failures. Does it fail often?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen it fail but this has been in the snmp input for a long time and I expect users are used to how it behaves so I don't want to change it with this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. There's an open issue for one of the other plugins that retries like this, so I'd like to avoid creating more issues if we can help it, but I'm okay with solving this in a later PR.

sp.AuthoritativeEngineID = s.EngineID

sp.AuthoritativeEngineBoots = s.EngineBoots

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably don't need the spacing here.

internal/snmp/wrapper.go Show resolved Hide resolved
# community = "public"

## Number of retries to attempt.
# retries = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we default to zero retries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case of copying the snmp input. I thought it would be best to have the same behavior there and here but I'm open to changing it there's something wrong with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the last comment on #7300

plugins/processors/ifname/ifname.go Outdated Show resolved Hide resolved
plugins/processors/ifname/ifname.go Outdated Show resolved Hide resolved

err = gs.Connect()
if err != nil {
//can't connect
Copy link
Contributor

@ssoroka ssoroka Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can wrap the error here for context, since connection errors may be vague

Comment on lines 242 to 251
var m *nameMap
m, err = buildMap(&gs, d.ifXTable, "ifName")
if err != nil {
var err2 error
m, err2 = buildMap(&gs, d.ifTable, "ifDescr")
if err2 != nil {
return nil, err2
}
}
return m, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never do see what err is, if that matters.

plugins/processors/ifname/ifname.go Outdated Show resolved Hide resolved
@reimda reimda requested a review from ssoroka July 7, 2020 04:39
Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Don't forget to test for races :)

@reimda reimda merged commit 6f9c623 into master Jul 7, 2020
@reimda reimda deleted the ifname branch July 7, 2020 21:38
rhajek pushed a commit to bonitoo-io/telegraf that referenced this pull request Jul 13, 2020
# dest = "ifName"

## Name of tag of the SNMP agent to request the interface name from
# agent = "agent"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be default set to agent_host as this is the tag that is being returned from snmp plugin? It would reduce the need to set this value every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should change the default now that this has been released. Changing the default would silently break anyone who is using the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't figure out why anyone would have "agent" as default? There is no plugin that provides this tag by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a valid point that it's unlikely anyone is using the default. They would have to rename the tag from the snmp input or the sflow input (which uses "agent_address").

@Hipska
Copy link
Contributor

Hipska commented Aug 12, 2020

While trying out this plugin, I came to the conclusion of maybe allow the community string also coming from a tag, or some other way to retrieve all the snmp settings?

@ssoroka
Copy link
Contributor

ssoroka commented Aug 13, 2020

cc @reimda

@reimda
Copy link
Contributor Author

reimda commented Aug 18, 2020

While trying out this plugin, I came to the conclusion of maybe allow the community string also coming from a tag, or some other way to retrieve all the snmp settings?

I can see this being useful but I'm not sure how it would work. Providing the settings in tags could work for v1 and v2c because they only need the snmp version and community string, so two tags. Snmp v3 would need version and up to seven other settings. Providing 8 tags just for snmp connection info doesn't sound pleasant. Combining multiple settings into one tag also doesn't sound good.

We could have multiple sets of connection settings in the config file and then add a tag to select which settings to use, but that isn't as flexible as passing the settings in tags.

If you can think of a nice way to do this, I'd be happy to consider a PR.

@Hipska
Copy link
Contributor

Hipska commented Aug 19, 2020

I don't know of any way how to do this. Should I request it via an issue?

I also noticed that when using this plugin, after a while all inputs stop working (even like cpu and disk), will create an issue for that or should we look into it on Slack first?

@ssoroka
Copy link
Contributor

ssoroka commented Aug 19, 2020

a new issue (or issues) would probably be best.

@reimda
Copy link
Contributor Author

reimda commented Aug 19, 2020

I think we've identified three issues: 1) a feature request for providing snmp connection info outside of the conf file; 2) a bug that the default tag doesn't match the snmp or sflow inputs; 3) a bug that all inputs stop working after a while.

@Hipska would you open issues for these? For the third one, let's start with opening an issue and then use slack if needed. Please include your telegraf.conf, the telegraf output, and the conditions that cause it, like how long it takes to happen.

@Hipska
Copy link
Contributor

Hipska commented Aug 20, 2020

I'm still figuring out on the last one what and how. As I can't get this plugin run stable, I'm still figuring out how. Sometimes it doesn't do anything and without any error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add processor to lookup SNMP ifName from ifIndex
4 participants