Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the proxy server backend metric error #295

Closed
wants to merge 1 commit into from

Conversation

YRXING
Copy link

@YRXING YRXING commented Nov 24, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

the backend metric should be recorded by proxy server

Which issue(s) this PR fixes:

Fixes #294

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 24, 2021
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Nov 24, 2021

CLA Signed

The committers are authorized under a signed CLA.

@k8s-ci-robot
Copy link
Contributor

Welcome @YRXING!

It looks like this is your first PR to kubernetes-sigs/apiserver-network-proxy 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/apiserver-network-proxy has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @YRXING. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 24, 2021
@@ -164,6 +167,16 @@ func (a *ServerMetrics) SetBackendCount(count int) {
a.backend.WithLabelValues().Set(float64(count))
}

// BackendCountInc increments a new backend connection.
func (a *ServerMetrics) BackendCountInc(manager string, idType string) {
Copy link
Contributor

@mainred mainred Nov 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.
we have only one managed supported by the server once, so I guess it's not necessary to record metrics for each backend manager. the introduction of idType is valuable, but for current use cases, I'd suggest we keep the code as it is now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple backend managers are supported for one proxy-sever, close this comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything else I need to do about this issue?

@@ -200,14 +200,17 @@ func (s *ProxyServer) addBackend(agentID string, conn agent.AgentService_Connect
for _, ipv4 := range agentIdentifiers.IPv4 {
klog.V(5).InfoS("Add the agent to DestHostBackendManager", "agent address", ipv4)
s.BackendManagers[i].AddBackend(ipv4, pkgagent.IPv4, conn)
metrics.Metrics.BackendCountInc("DestHostBackendManager", "ipv4")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's leaving this logic in AddBackend is preferable, which makes adding backend logic more clear.
How about adding filed to backend storage to differentiate backend count for different kinds of backend managers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the code.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 2, 2021
dibm.mu.RLock()
defer dibm.mu.RUnlock()
if len(dibm.backends) == 0 {
func (drbm *DefaultRouteBackendManager) Backend(ctx context.Context) (Backend, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -204,7 +214,6 @@ func (s *DefaultBackendStorage) AddBackend(identifier string, idType pkgagent.Id
return addedBackend
}
s.backends[identifier] = []*backend{addedBackend}
metrics.Metrics.SetBackendCount(len(s.backends))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @YRXING .
We are complicating the logic of the backend count metrics. we don't have to rewrite AddBackend for each manager to collect backend count metrics. code maintenance and reusability should be taken into consideration.

How about we keep SetBackendCount here and add the manager name, idtype into it. manager name, can be defined as member of backendstore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the DefaultBackendStorage need record different idTypes count metric or just record the total backend count?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following your thought already in this PR, record different backend counts from different idTypes and backend managers. SetBackendCount will finally look like this way after other changes are also included.

Suggested change
metrics.Metrics.SetBackendCount(len(s.backends))
metrics.Metrics.SetBackendCount(s.backendType, idType, len(s.backends))

cc @cheftako and @jkh52 who will give the final say to this PR.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 21, 2021
@jkh52
Copy link
Contributor

jkh52 commented Jan 14, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2022
@jkh52
Copy link
Contributor

jkh52 commented Jan 14, 2022

/lgtm

I like depending on len() only, compared with Inc() / Dec().

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 14, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2022
[]string{
"manager",
"idType",
},
Copy link
Contributor

@jkh52 jkh52 May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@logicalhan Is a change like this backward compatible, and if not is it more appropriate to create a new metric and deprecate the old?

https://kubernetes.io/blog/2021/04/23/kubernetes-release-1.21-metrics-stability-ga/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely not backwards compatible. This isn't in k8s-proper, so the same rules of API conformance do not necessarily hold here. I would at the very least add a release note clearly denoting the API change to the metric.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what's the expected cardinality of manager and idType?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cardinality is likely not a concern. (manager is currently 3, idType is about 3 or 4).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely not backwards compatible. This isn't in k8s-proper, so the same rules of API conformance do not necessarily hold here. I would at the very least add a release note clearly denoting the API change to the metric.

This is a binary we include in some distros of K8s (i.e. its an optional binary in K8s) so we should probably consider it part of K8s.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cardinality is likely not a concern. (manager is currently 3, idType is about 3 or 4).

Technically the idType is part of a CLI flag on the agent (Eg https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/addons/konnectivity-agent/konnectivity-agent-ds.yaml#L43), so theoretically its not bounded.
Practically the number of managers and id types's are under the cloud provider's control.
So I would expect this number to be small.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. However that is on the agent side and the agent is at some level a reference implementation.
If we had similar code on the server side to limit the cardinality of idType, then I think we could reasonably claim it was limited.

@jkh52
Copy link
Contributor

jkh52 commented May 13, 2022

/remove-lgtm

I think we should keep this metric as-is, and add a new one. That way we avoid breaking current integrations with metrics readers.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 15, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cheftako
Copy link
Contributor

cheftako commented Sep 9, 2022

/remove-lifecycle rotten

@cheftako cheftako reopened this Sep 9, 2022
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 9, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: YRXING
Once this PR has been reviewed and has the lgtm label, please assign andrewsykim for approval by writing /assign @andrewsykim in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2022
@k8s-ci-robot
Copy link
Contributor

@YRXING: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-apiserver-network-proxy-docker-build-arm64 1a8d66f link true /test pull-apiserver-network-proxy-docker-build-arm64
pull-apiserver-network-proxy-make-lint 1a8d66f link true /test pull-apiserver-network-proxy-make-lint
pull-apiserver-network-proxy-test 1a8d66f link true /test pull-apiserver-network-proxy-test
pull-apiserver-network-proxy-docker-build-amd64 1a8d66f link true /test pull-apiserver-network-proxy-docker-build-amd64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 8, 2022
@k8s-ci-robot
Copy link
Contributor

@YRXING: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 8, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] The backend ServerMetric is not correct
7 participants