[target-allocator] fix updating scrape configs #1620
Conversation
Force-pushed from c4c0d82 to 45996a9.
@@ -67,31 +67,33 @@ func NewDiscoverer(log logr.Logger, manager *discovery.Manager, hook discoveryHo

func (m *Discoverer) ApplyConfig(source allocatorWatcher.EventSource, cfg *config.Config) error {
	m.configsMap[source] = cfg
	newJobToScrapeConfig := make(map[string]*config.ScrapeConfig)
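For context, a minimal sketch of the shape the fix takes: rebuild the job-to-scrape-config mapping from scratch on each apply, and only commit it once the discovery manager accepts the new config. Everything past the first two lines is an illustration of that idea rather than the exact merged code, and it assumes the Discoverer holds a jobToScrapeConfig map and a Prometheus discovery manager.

func (m *Discoverer) ApplyConfig(source allocatorWatcher.EventSource, cfg *config.Config) error {
	m.configsMap[source] = cfg

	// Rebuild the mapping from scratch so jobs that disappeared from the
	// incoming config no longer linger in the old map.
	newJobToScrapeConfig := make(map[string]*config.ScrapeConfig)
	discoveryCfg := make(map[string]discovery.Configs)
	for _, scrapeConfig := range cfg.ScrapeConfigs {
		newJobToScrapeConfig[scrapeConfig.JobName] = scrapeConfig
		discoveryCfg[scrapeConfig.JobName] = scrapeConfig.ServiceDiscoveryConfigs
	}

	// Only commit the new mapping after the discovery manager accepts the
	// config; on failure, keep the last known-good state.
	if err := m.manager.ApplyConfig(discoveryCfg); err != nil {
		return err
	}
	m.jobToScrapeConfig = newJobToScrapeConfig
	return nil
}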
I'm worried about the memory implications of doing this. Rather than doing it this way, I think we should listen to the delete/update events we receive from the Kubernetes API and use them to make the desired change here. Right now we treat everything as a generic event and then re-grab the whole config, when it would probably be more efficient to just use what Kubernetes gives us.
That sounds like a much larger refactor that should have its own issue, which I'd rather not mix with this (fairly straightforward) bug fix. I can change this to pre-allocate based on the length of the existing map and run some kind of benchmark, but intuitively it doesn't seem like a deal breaker - populating a map from (small) strings to pointers should be fast even at thousands of elements.
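For illustration, the pre-allocation mentioned above would look roughly like the hypothetical helper below; rebuildJobMap is a made-up name for this sketch, not something in the repo.

// rebuildJobMap builds the replacement mapping, using the size of the previous
// mapping as a capacity hint so the map rarely needs to grow mid-rebuild.
func rebuildJobMap(prev map[string]*config.ScrapeConfig, cfg *config.Config) map[string]*config.ScrapeConfig {
	next := make(map[string]*config.ScrapeConfig, len(prev))
	for _, sc := range cfg.ScrapeConfigs {
		next[sc.JobName] = sc
	}
	return next
}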
A benchmark would definitely be helpful here – we have some existing benchmarks in the repo that you should be able to run, IIRC. cc @kristinapathak, what are your thoughts on this? The worry I have is churning a map that contains some pretty beefy configs at unknown intervals.
Correct me if I'm misunderstanding your point, but we don't actually allocate anything for the configs themselves here, just pointers to them. At the point where this update happens, those have already been allocated. What this change adds is additional deallocations for configs we don't need anymore, which would've normally stayed in this map forever.
To be clear, I think your point that doing this may be excessive is valid, but this bugfix doesn't change this behaviour much.
I think this map is pretty small given its value objects are pointers, and the bug of a map growing unbounded seems worse than this fix.
Force-pushed from 27bdc2a to 611e5e7:
Only take scrape configs from the most recently applied config, and only save them if we've successfully updated.
Force-pushed from 611e5e7 to 5dc5e72.
@jaronoff97 I added a simple benchmark for the affected function. Performance is slightly lower, but still less than 100ns per call on a 1,000-element map. Before the change:
After the change:
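The sketch below shows the general shape of such a benchmark with Go's testing package; the package name, function name, and job count here are illustrative rather than the actual benchmark added in the PR.

package target

import (
	"fmt"
	"testing"

	"github.com/prometheus/prometheus/config"
)

func BenchmarkRebuildJobToScrapeConfig(b *testing.B) {
	// Build a config with 1000 scrape jobs, matching the map size discussed above.
	cfg := &config.Config{}
	for i := 0; i < 1000; i++ {
		cfg.ScrapeConfigs = append(cfg.ScrapeConfigs, &config.ScrapeConfig{
			JobName: fmt.Sprintf("job-%d", i),
		})
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// The operation under test: repopulating the job -> scrape config map.
		jobToScrapeConfig := make(map[string]*config.ScrapeConfig)
		for _, sc := range cfg.ScrapeConfigs {
			jobToScrapeConfig[sc.JobName] = sc
		}
	}
}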
Thanks so much for running the benchmark. Going to let the tests run, then review again :)
* [target-allocator] fix updating scrape configs: only take scrape configs from the most recently applied config, and only save them if we've successfully updated.
* [target-allocator] drop unnecessary job to scrape config map
* [target-allocator] add discoverer benchmark
Only take scrape configs from the most recently applied config, and only save them if we've successfully updated. As a result, we now correctly get rid of targets after a ServiceMonitor or PodMonitor is deleted, fixing #1415.
I've also fixed an issue where we would overwrite the job-to-scrape-config mapping even if the actual update failed. The test for this was incorrect: it didn't deep-copy the mapping, and so effectively asserted that two references to the same map were equal.
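To illustrate that pitfall, here is a generic, self-contained example of comparing a map against a mere reference to itself versus a real copy; it is not the actual test code.

package main

import (
	"fmt"
	"reflect"
)

func main() {
	jobToScrapeConfig := map[string]string{"job-a": "cfg-a"}

	// Pitfall: "before" is just another reference to the same map, so any later
	// mutation shows up through both variables and the comparison always passes.
	before := jobToScrapeConfig
	jobToScrapeConfig["job-b"] = "cfg-b"
	fmt.Println(reflect.DeepEqual(before, jobToScrapeConfig)) // true, even though the map changed

	// Fix: copy the entries first, so the later comparison is meaningful.
	copied := make(map[string]string, len(jobToScrapeConfig))
	for k, v := range jobToScrapeConfig {
		copied[k] = v
	}
	jobToScrapeConfig["job-c"] = "cfg-c"
	fmt.Println(reflect.DeepEqual(copied, jobToScrapeConfig)) // false, as expected
}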
I've modified the existing discovery test to look at scrape config updates, and added another job to the test data to make sure it gets deleted after an update.
I've also done a proper manual e2e test on this, and it worked as expected.