promxy query is too slow #661

Open
resurgence72 opened this issue Jul 11, 2024 · 4 comments

resurgence72 commented Jul 11, 2024

Hi, I use promxy to proxy my VictoriaMetrics cluster datasource; the data path is Grafana -> promxy -> vm-cluster. Now I find that promxy queries take much longer than direct VM queries.

Test PromQL:
topk(5, (sum(rate(container_cpu_usage_seconds_total{container!=''}[5m])) by (pod,namespace,project_mark,app_type,cluster,container) / sum(container_spec_cpu_quota{container!=''} / container_spec_cpu_period{container!=''}) by (pod,namespace,project_mark,app_type,cluster,container) * 100 != +inf))
Query range: 24h

This is my query time with promxy: 17s+ :(
When I switch back to the VM datasource, the query duration is only ~400ms.

My promxy config:
[screenshot: promxy configuration]

Promxy version: v0.0.86
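
For context, a minimal sketch of the shape of that config, assuming a single vmselect endpoint on the default tenant (the hostname and tenant path below are placeholders rather than my real values):

```yaml
promxy:
  server_groups:
    # VictoriaMetrics cluster: vmselect serves the Prometheus-compatible API
    # under /select/<accountID>/prometheus; tenant 0 is used as a placeholder.
    - static_configs:
        - targets:
            - vmselect.example.com:8481
      path_prefix: /select/0/prometheus
      scheme: http
```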

@jacksontj (Owner) commented:

First off, welcome! Glad to see more users :)

That performance gap definitely doesn't seem normal, so we'll have to dig in a bit more. Ideally, if you could provide a tcpdump or trace logging of an instance running this query, that would give us some good insight into what is happening.

@resurgence72 (Author) commented:

Hi @jacksontj, a few months later I deployed the latest version of promxy (v0.0.91) again, and the problem was still there. This time I found a possible reason in the logs.

[screenshot: promxy log output for the count(xxx) query]

When I execute count(xxx), the response is fast, and the log shows that promxy passes the PromQL directly down to the backend Prometheus-compatible node and simply receives the result.

But when I execute an aggregation-style PromQL query such as ((container_memory_working_set_bytes / container_spec_memory_limit_bytes) * 100 != +inf) > 70, the response is very slow (timeouts and OOM). Checking the log, promxy does not perform a similar push-down for this PromQL; instead it fetches all of the underlying data from the backend nodes and aggregates it in memory in real time.
[screenshot: promxy log output for the aggregation query]

Does promxy provide an option to control whether the push-down function should be enabled?

@resurgence72 (Author) commented:

We currently use multi-level vmselect to aggregate multiple vm-clusters into a unified data source, but when the query volume is large, queries can be slow because the top-level vmselect pulls all the data and does the calculation in its own memory. I would like a component that only aggregates the results (in our scenario, the data from the different data sources does not overlap). Could you provide some suggestions? Or is there a problem with my understanding or usage of promxy?

@jacksontj (Owner) commented:

Does promxy provide an option to control whether the push-down function should be enabled?

Promxy will do as much pushdown as it can without altering the results. The issue here is that the query is a bit too complex to guarantee a pushdown would work. To explain this I will simplify your query a bit, so it is hopefully easier to follow.
Your query is effectively (a/b) != 10, which seems straightforward. The complexity comes from the BinaryExpr a/b: promxy doesn't know whether each node has all of the matching a and b series on the same node. To illustrate this with the most extreme example, assume you had one servergroup that had a and one servergroup that had b; in that case we couldn't push down to either, because we'd end up with no results.
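
To make that extreme layout concrete, here is a hypothetical promxy configuration in which the two metrics live in disjoint servergroups (hostnames are made up for illustration):

```yaml
promxy:
  server_groups:
    # Hypothetical: this backend only stores series for metric `a`
    - static_configs:
        - targets:
            - prom-a.example.com:9090
    # Hypothetical: this backend only stores series for metric `b`
    - static_configs:
        - targets:
            - prom-b.example.com:9090
```

Pushing a/b down to either servergroup would return no samples, since neither group sees both sides of the expression, so promxy has to pull the raw series and evaluate the BinaryExpr itself.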

I would like to have a component that can only aggregate the results (in the current scenario, the data from the data source is not repeated), could you provide some suggestions?

The complexities I discussed above are the real limiting factor here. If the servergroups are laid out such that a given series pair always lives on a single servergroup, we could theoretically add some grouping to signal this to promxy. I'd have to think on this a bit more; but as a naive thought, if we had something like a shard label attached to the servergroup, and we knew that these series will match based on a superset of labels which includes that shard label, then we could safely do the pushdown.
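
Purely as an illustrative sketch of that naive thought (the shard-aware pushdown does not exist today; only the per-servergroup labels option does, and the hostnames below are placeholders), the configuration side might look something like:

```yaml
promxy:
  server_groups:
    # Each servergroup attaches a `shard` label to the series it returns.
    # Hypothetical idea: if promxy knew that matching series always share a
    # shard value, it could safely push the BinaryExpr down per servergroup.
    - static_configs:
        - targets:
            - vmselect-shard-0.example.com:8481
      path_prefix: /select/0/prometheus
      labels:
        shard: "0"
    - static_configs:
        - targets:
            - vmselect-shard-1.example.com:8481
      path_prefix: /select/0/prometheus
      labels:
        shard: "1"
```

Today the labels option only annotates results; whether promxy could rely on such a label to decide pushdown safety is exactly the open question above.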
