Add option to disable per table metrics collection #5649
Conversation
@deepthi if you have a chance, could you take a look at this? Also, do you have suggestions on how to test a change like this?
I'd like to just build/run something based on the in-tree code to at least verify this. Does the test suite have something that can do this?
There is a unit test for these metrics at https://github.com/vitessio/vitess/blob/master/go/vt/vttablet/tabletserver/schema/engine_test.go#L293 You should be able to copy/extend that to test with the flag set to false.
Thanks @deepthi! I looked at that, and I'm not sure it tests exactly this change, but it's definitely a good starting point. I'll take another look and see if there's a way to make this more testable. I think the issue is that, in the way I implemented this, the metrics are still being collected, just not exported to the prometheus metrics page. If I instead make the option not collect those metrics at all, that may be strictly better and make it more testable at the same time. Even if I do that, though, I would want to do the full test to make sure it all works together and that the page shows what I expect. Is there some kind of end-to-end integration test focused on the vttablet binary that I can hook into for that? Otherwise, I'll probably just try to set up a lot of this manually, or try to get the docker compose example working with a local binary. Just wondering if something like this has already been done.
Hmm, it looks like to actually disable stats collection I'll either need to change this query or not run stats collection at all. I'm going to keep looking to see whether those stats are used in other places, but my first idea is that not running the per-table stats query at all is the correct way to do this. It still doesn't fix the problem of the end-to-end tests, but I can figure that out if there isn't an existing integration test tool that already does this.
Actually, I'm realizing this is more than just an implementation question; it's also a desired-result question.
I'm leaning towards 2, and that's what I'm working on now, because some of these stats (like "table rows") don't seem very useful without the corresponding table name.
If the user is creating a large number of tables, this could cause a spike in memory. This change allows the user to turn off any metrics that scale with the number of tables. Signed-off-by: Shaun Verch <[email protected]>
force-pushed from bada152 to f741175
Actually, I followed this to the end and I don't think it's a good idea. This is not just touching code related to metrics; it's also loading the table schema, so in practice the code was a bit odd. It loaded everything and just disabled the one line that actually adds the metrics to the table object, not really saving anything. I think I'll stick with the approach that's here now. @deepthi Ultimately, where I am now is that I'd like to test the page that's exposed on the
I don't think such a test exists; you might have to test manually. You should still create a unit test similar to what we already have.
I'm trying to figure out what kind of test to add, but the only thing this PR changes is what is passed to this function: https://github.com/vitessio/vitess/blob/master/go/stats/counters.go#L401

That seems to eventually call into an extension of https://golang.org/pkg/expvar/, which is normally used to export variables. Specifically, I think this function is what tells the library to output prometheus-compatible output. There are tests for that exporter library, but those are for the functionality of the exporter itself: https://github.com/vitessio/vitess/blob/master/go/stats/export_test.go

Given that this is a wrapper around expvar, I think the code that actually calls these handlers is in that library. Sure enough, it's called by the http handler.

If this were a function that I could call to get the output, it would be obvious how to test this, and I could do something similar to what's in the existing unit test (where the test calls this engine object in various ways and checks the result). In this case I need to find some way to call the handler through the normal channels, since there's no obvious function to call. This looks promising: https://blog.questionable.services/article/testing-http-handlers-go/. Maybe that can give me what I want. I might be able to directly make a request to the
Currently, this is returning something unexpected in my local test. It should return prometheus metrics, but instead it returns a JSON object. However, this is the outline for how I would test this. Signed-off-by: Shaun Verch <[email protected]>
I pushed a commit with the skeleton of the test, but the result is unexpected. I would expect the
I see the same tests of the prometheus backend, but with different results: https://github.com/vitessio/vitess/blob/master/go/stats/prometheusbackend/prometheusbackend_test.go#L344 I'm going to try importing that library into the test and see what I get.
I tried copying this initialization pattern from the prometheus backend tests. It still didn't return something that looked like prometheus metrics. I was able to get something that returned prometheus metrics by copying this line, but it didn't have the per-table metrics. I suspect that prometheus just hasn't had a chance to run and log any metrics for tables, or it might be that it doesn't do any scraping in the test stub because it's not querying a real database.
All right, here's what I've found so far:
In the test, all the right internal structures are updated, so I think there's another thread somehow collecting them and exporting them on this endpoint. All the other tests sidestep this by calling everything directly. In the prometheus tests, for example, the stats are collected in the tests rather than registered in another function. In our case, we're trying to test that these registered metrics handlers are working properly, so we don't have access to them directly in the test. I can at least test whether they are registered, though, by seeing if they even show up in the JSON object returned by the expvar handler.

To me, this raises another question: will this actually reduce memory usage? It seems like for core functionality we have to load the table information anyway, so all I'm changing is whether we export it. If that's the case, I'd expect the scalability issue to be very similar.
@deepthi based on my last comment, I have two questions for you:
Actually, it would still be useful, because it will reduce memory usage on the prometheus instances that scrape this. Still good to know whether that's the case, though.
I believe #5809 will resolve the original intent of this in a more structural way. Although this line in the PR description gives me pause:
@sougou Does this mean that PR doesn't address this per-table issue, because these are mostly gauges?
Sorry, I didn't realize that they were affected by gauges also. Since it doesn't make sense to drop a dimension for gauges, I think the best approach will be to drop those variables altogether. Something like
I've updated #5809 to support a new flag
Thanks @sougou! I looked at your commit and can confirm that it solves the problem that this PR was trying to address. Closing it out now.