In multi-worker mode, calling reset causes other workers' lookup tables to diverge from key_index #168
Comments
Thank you for debugging this issue, and for proposing a fix. This is a good example of how various performance optimizations added to this library over time (e.g. the key index) make cross-worker coordination tricky, especially when metrics are deleted. I believe your proposed fix would work; however, adding a key to the key index on the hot path will result in …

There might be a few other ways we could approach this:
I'd love to hear feedback on these options from users of this library, as well as other ideas we could explore here. Option 1 seems pretty appealing: while it will result in temporary performance degradation every time a key is deleted, it would not change steady-state performance or restrict available functionality. Perhaps a simple approach would be to add a timer to each worker that will reset the key lookup table every time the …

It would also be helpful to add a test for this. I'm not sure if there's a way to do this purely in Lua, or if we'll need to expand the integration test that runs a real Nginx instance in a container.
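Option 1 can be sketched with a simplified Python model. The condition that should trigger the reset is cut off above, so this sketch assumes, hypothetically, that every reset bumps a shared deletion counter and that each worker's periodic timer (`on_timer` here, standing in for something like `ngx.timer.every`) flushes its local lookup table whenever that counter has changed. A plain dict stands in for the shared dict, a single shared list stands in for the key index, and all names are illustrative rather than the library's code.

```python
DELETES_KEY = "__deleted_keys"  # hypothetical shared counter, bumped on every reset


class Worker:
    """Hypothetical model of one nginx worker; not the library's real code."""

    def __init__(self, shared, index):
        self.shared = shared    # stands in for the ngx.shared dict
        self.index = index      # simplified shared key index (list of keys)
        self.lookup = {}        # worker-local cache: label value -> full_name
        self.seen_deletes = 0   # counter value at this worker's last check

    def on_timer(self):
        # Periodic per-worker check: flush the local cache if any worker deleted keys.
        deletes = self.shared.get(DELETES_KEY, 0)
        if deletes != self.seen_deletes:
            self.lookup.clear()
            self.seen_deletes = deletes

    def lookup_or_create(self, name, label_value):
        full_name = self.lookup.get(label_value)
        if full_name is None:  # slow path: build the key and register it in the index
            full_name = f'{name}{{label="{label_value}"}}'
            self.lookup[label_value] = full_name
            if full_name not in self.index:
                self.index.append(full_name)
        return full_name

    def inc(self, name, label_value, value=1):
        key = self.lookup_or_create(name, label_value)
        self.shared[key] = self.shared.get(key, 0) + value

    def reset(self, name):
        for key in [k for k in self.index if k.startswith(name + "{")]:
            self.shared.pop(key, None)
            self.index.remove(key)
        self.lookup.clear()
        self.shared[DELETES_KEY] = self.shared.get(DELETES_KEY, 0) + 1
        self.seen_deletes = self.shared[DELETES_KEY]  # own cache is already clean


def metric_data(shared, index):
    # Collection only walks the key index.
    return {k: shared[k] for k in index if k in shared}


shared, index = {}, []
a, b = Worker(shared, index), Worker(shared, index)
b.inc("A", "x")   # B caches A{label="x"}
a.reset("A")      # A deletes the key and bumps the shared counter
b.on_timer()      # B's timer fires before its next update and flushes its cache
b.inc("A", "x")   # cache miss, so the key is re-registered in the index
print(metric_data(shared, index))  # {'A{label="x"}': 1}
```

In this model, an update that B makes between the delete and its next timer tick still takes the stale cached path, so the key stays out of the index until the timer fires and B updates it again. The timer interval therefore bounds how long a series can be missing from collection.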
Sorry, I only considered how to fix the problem and didn't realize that KeyIndex:sync runs while keys are added or removed. As you mentioned, Option 1 seems pretty appealing, but it needs something like a timer, an event, or another component, which may increase the complexity of the module, and it may also have problems in some extreme conditions.
Maybe adding a function that checks whether the metric exists in shared memory is a good way, but I'm not sure.
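The suggestion to check whether the metric still exists in shared memory can be modeled the same way. In this hypothetical sketch, a cache hit in `lookup_or_create` is only trusted if the key is still present in the shared dict (a plain Python dict here); otherwise the stale entry is dropped and the key is re-registered in the key index (simplified to one shared list). All names are illustrative, not the library's API.

```python
class Worker:
    """Hypothetical model of one nginx worker; not the library's real code."""

    def __init__(self, shared, index):
        self.shared = shared  # stands in for the ngx.shared dict
        self.index = index    # simplified shared key index (list of keys)
        self.lookup = {}      # worker-local cache: label value -> full_name

    def lookup_or_create(self, name, label_value):
        full_name = self.lookup.get(label_value)
        if full_name is not None and full_name not in self.shared:
            # Cache hit, but the key is gone from shared memory: another
            # worker deleted it, so drop the stale entry and take the slow path.
            del self.lookup[label_value]
            full_name = None
        if full_name is None:
            full_name = f'{name}{{label="{label_value}"}}'
            self.lookup[label_value] = full_name
            if full_name not in self.index:
                self.index.append(full_name)
        return full_name

    def inc(self, name, label_value, value=1):
        key = self.lookup_or_create(name, label_value)
        self.shared[key] = self.shared.get(key, 0) + value

    def reset(self, name):
        for key in [k for k in self.index if k.startswith(name + "{")]:
            self.shared.pop(key, None)
            self.index.remove(key)
        self.lookup.clear()  # only this worker's cache is flushed


def metric_data(shared, index):
    # Collection only walks the key index, so unindexed keys are invisible.
    return {k: shared[k] for k in index if k in shared}


shared, index = {}, []
a, b = Worker(shared, index), Worker(shared, index)
b.inc("A", "x")   # B caches A{label="x"}
a.reset("A")      # A deletes the key; B's cache entry is now stale
b.inc("A", "x")   # the existence check catches the stale entry
print(metric_data(shared, index))  # {'A{label="x"}': 1}
```

The catch is that this adds a shared-memory read to every update, and in a real multi-process setup another worker could still delete the key between the check and the increment, so it narrows the window rather than closing it.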
I've expanded the existing integration test to support running multiple tests in parallel, and added a metric reset test that can be used to reproduce this reliably: https://github.com/knyar/nginx-lua-prometheus/blob/test_reset/integration/test_reset.go
I also found that reset causes performance issues as the number of workers and metrics increases.
Maybe we could provide a local reset, like Kong does: https://github.com/Kong/kong/blob/master/kong/plugins/prometheus/prometheus.lua#L695
Deleting time series (or resetting a metric completely) is quite expensive, because a bunch of worker-local state needs to be flushed or re-synced. Doing it on every collection is wild. If you are exporting gauges that are only set at collection time, you probably don't need a metric library at all; you can just …

A better solution here would be to implement support for callback metrics; I've created #170 to track that separately. Let's keep this issue focused on the specific bug with metric resets.
As the title says.
Question
Steps:
Expected Result:
metric_data returns metric "A"
Actual Result:
Nothing is returned.
Reason
Solution
Maybe we could add full_name to key_index in lookup_or_create(self, label_values) after line 414, like below:
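To make the failure mode concrete, here is a minimal Python model of this scenario under simplifying assumptions: a plain dict stands in for the shared dict, a single shared list stands in for KeyIndex (the real per-worker copy and KeyIndex:sync are not modeled), and the `Worker` class, `scenario` helper and `always_index` flag are hypothetical. With `always_index=False`, worker B's stale cache entry lets it write the key without re-registering it, so collection returns nothing; with `always_index=True`, it models the proposed fix of registering `full_name` in the key index on every `lookup_or_create` call.

```python
class Worker:
    """Hypothetical model of one nginx worker; not the library's real code."""

    def __init__(self, shared, index, always_index):
        self.shared = shared          # stands in for the ngx.shared dict
        self.index = index            # simplified shared key index (list of keys)
        self.lookup = {}              # worker-local cache: label value -> full_name
        self.always_index = always_index

    def lookup_or_create(self, name, label_value):
        full_name = self.lookup.get(label_value)
        if full_name is None:
            full_name = f'{name}{{label="{label_value}"}}'
            self.lookup[label_value] = full_name
            if full_name not in self.index:
                self.index.append(full_name)
        elif self.always_index and full_name not in self.index:
            # Proposed fix: re-register the key even on a cache hit.
            self.index.append(full_name)
        return full_name

    def inc(self, name, label_value, value=1):
        key = self.lookup_or_create(name, label_value)
        self.shared[key] = self.shared.get(key, 0) + value

    def reset(self, name):
        for key in [k for k in self.index if k.startswith(name + "{")]:
            self.shared.pop(key, None)
            self.index.remove(key)
        self.lookup.clear()  # only this worker's cache is flushed


def metric_data(shared, index):
    # Collection only walks the key index, so unindexed keys are invisible.
    return {k: shared[k] for k in index if k in shared}


def scenario(always_index):
    shared, index = {}, []
    a = Worker(shared, index, always_index)
    b = Worker(shared, index, always_index)
    b.inc("A", "x")   # worker B caches A{label="x"}
    a.reset("A")      # worker A deletes it; B's cache is now stale
    b.inc("A", "x")   # B hits its cache and writes the key again
    return metric_data(shared, index)


print(scenario(always_index=False))  # {}
print(scenario(always_index=True))   # {'A{label="x"}': 1}
```

The trade-off is visible even in this model: the fix adds an index membership check to every update, which is the hot-path cost raised in the first comment.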