Fix scaling dashboard to work on multi-zone ingesters #365

Merged: 3 commits into main from fix-scaling-dashboard-for-multi-zone-deployments, Jul 28, 2021

pracucci
Collaborator

What this PR does:
We have some clusters running Cortex ingesters in multi-zone. Each zone is a StatefulSet whose name matches this pattern ingester-zone-[a-z], so their pod names are like ingester-zone-a-0 or ingester-zone-b-1. All dashboards work correctly except for the scaling dashboard and this PR proposes a fix for that.

The scaling dashboard doesn't work because some recording rules compute the expected scaling value by summing across all ingesters (e.g. cluster_namespace_deployment_reason:required_replicas:count), while the actual usage metrics are grouped by deployment/statefulset name, so they're split by zone.

I think the desired behaviour is to sum up all ingesters, regardless of their zone, so in this PR I'm proposing to remove the -zone-[a-z] suffix (if present) when we compute the deployment name.
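To illustrate the intended behaviour, here is a minimal Python sketch (not part of this PR) that strips an optional `-zone-[a-z]` suffix from workload names and sums replicas per resulting deployment name. The workload names and replica counts are made up for illustration, and `strip_zone` and `sum_replicas_by_deployment` are hypothetical helpers; the actual rules do this in PromQL with `label_replace()`.

```python
import re
from collections import defaultdict
from typing import Dict

# Lazy prefix followed by an optional zone suffix; used with fullmatch,
# so the pattern has to cover the whole name.
ZONE_SUFFIX = re.compile(r"(.*?)(?:-zone-[a-z])?")


def strip_zone(name: str) -> str:
    """Return the name without a trailing -zone-[a-z] suffix, if present."""
    match = ZONE_SUFFIX.fullmatch(name)
    return match.group(1) if match else name


def sum_replicas_by_deployment(replicas: Dict[str, int]) -> Dict[str, int]:
    """Sum replica counts across zones, keyed by the zone-less name."""
    totals: Dict[str, int] = defaultdict(int)
    for name, count in replicas.items():
        totals[strip_zone(name)] += count
    return dict(totals)


# Hypothetical replica counts per StatefulSet/Deployment (illustrative only).
sample = {
    "ingester-zone-a": 3,
    "ingester-zone-b": 3,
    "ingester-zone-c": 3,
    "distributor": 2,
}
totals = sum_replicas_by_deployment(sample)
print(totals)
```

With the suffix stripped, the three zonal StatefulSets collapse into a single `ingester` entry, which is the grouping the summed recording rules already use.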

A couple of notes:

  • I manually tested all updated recording rules and they should work
  • I had to wrap with another label_replace() due to conflicts with regex greediness

Which issue(s) this PR fixes:
N/A

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci requested a review from tomwilkie July 28, 2021 12:10
@pracucci pracucci requested a review from a team as a code owner July 28, 2021 12:10
label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*)")
label_replace(
kube_deployment_spec_replicas,
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
Contributor

Why is the first question mark necessary? Shouldn't it be:

Suggested change
- "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
+ "deployment", "$1", "deployment", "(.*)(?:-zone-[a-z])?"

Collaborator Author

The first question mark makes the group non-greedy. Since (?:-zone-[a-z])? is optional (note the trailing ?), a greedy (.*) always matches the entire value and the zone suffix is never removed. Writing (.*?) makes the first .* non-greedy, so the optional zone suffix gets a chance to match.
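The difference can be checked with a small Python sketch. It assumes the pattern must match the whole label value, which is how Prometheus anchors `label_replace()` regexes, and uses Python's `re.fullmatch` as a stand-in for that. Prometheus itself uses Go's RE2-based engine, so this only illustrates the greedy versus lazy behaviour; `first_group` is a hypothetical helper.

```python
import re

GREEDY = r"(.*)(?:-zone-[a-z])?"
LAZY = r"(.*?)(?:-zone-[a-z])?"


def first_group(pattern: str, value: str) -> str:
    """Return capture group 1 when the pattern matches the whole value."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"{pattern!r} does not match {value!r}")
    return match.group(1)


for name in ("ingester-zone-a", "ingester"):
    # Greedy keeps the zone suffix inside group 1; lazy leaves it to the
    # optional non-capturing group.
    print(name, "greedy:", first_group(GREEDY, name), "lazy:", first_group(LAZY, name))
```

For a name without a zone suffix both patterns capture the full name, so non-zonal deployments are unaffected.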

Contributor

TIL! Can you add a comment to this effect please?

Collaborator Author

Sure, done.

Comment on lines 83 to 86
label_replace(
label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*)"),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
Contributor

The inner label_replace just copies the statefulset label to the deployment label, so I believe this could be done in a single call:

Suggested change
- label_replace(
-   label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*)"),
-   "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
- )
+ label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*?)(?:-zone-[a-z])?"),

Collaborator Author

That's right. I've applied the suggested change and manually tested it.
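As a rough sanity check of that equivalence, the sketch below emulates a simplified `label_replace()` in Python and compares the nested and single-call forms. The emulation is a stand-in built on assumptions, not Prometheus code: it only handles the `$1` placeholder, treats a missing source label as an empty string, and anchors the regex to the full value. The `namespace` label and its value are made up for illustration.

```python
import re
from typing import Dict

Labels = Dict[str, str]

ZONE_REGEX = r"(.*?)(?:-zone-[a-z])?"


def label_replace(labels: Labels, dst: str, replacement: str, src: str, regex: str) -> Labels:
    """Simplified emulation: set dst from src when regex matches the full value."""
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)  # no match: the series is returned unchanged
    result = dict(labels)
    # Only the "$1" placeholder is supported in this sketch.
    result[dst] = replacement.replace("$1", match.group(1))
    return result


def nested(labels: Labels) -> Labels:
    """Original form: copy statefulset to deployment, then strip the zone."""
    inner = label_replace(labels, "deployment", "$1", "statefulset", r"(.*)")
    return label_replace(inner, "deployment", "$1", "deployment", ZONE_REGEX)


def single(labels: Labels) -> Labels:
    """Suggested form: copy and strip the zone in one call."""
    return label_replace(labels, "deployment", "$1", "statefulset", ZONE_REGEX)


# Hypothetical series labels (illustrative only).
series = {"statefulset": "ingester-zone-b", "namespace": "cortex"}
print(nested(series))
print(single(series))
```

Both forms leave the original `statefulset` label untouched and only differ in how many passes they take to populate `deployment`.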

@pracucci pracucci merged commit dccf32a into main Jul 28, 2021
@pracucci pracucci deleted the fix-scaling-dashboard-for-multi-zone-deployments branch July 28, 2021 14:13
simonswine pushed a commit to grafana/mimir that referenced this pull request Oct 18, 2021
…g-dashboard-for-multi-zone-deployments

Fix scaling dashboard to work on multi-zone ingesters