[nginx] VTS metrics breaks prometheus endpoint :10254/metrics (0.9.0-beta.3) #448

Closed
MaikuMori opened this issue Mar 15, 2017 · 15 comments · Fixed by #456

@MaikuMori

After deploying 0.9.0-beta.3 and enabling VTS, the metrics endpoint is broken.

Going to :10254/metrics throws a 500 with the following text:

An error has occurred during metrics gathering:

1120 error(s) occurred:
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"10.0.10.62:4000" > label:<name:"status_code" value:"1xx" > label:<name:"upstream" value:"redacted-redacted-staging-redacted" > counter:<value:0 >  was collected before with the same name and label values
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"10.0.10.62:4000" > label:<name:"status_code" value:"2xx" > label:<name:"upstream" value:"redacted-redacted-staging-redacted" > counter:<value:0 >  was collected before with the same name and label values
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"10.0.10.62:4000" > label:<name:"status_code" value:"3xx" > label:<name:"upstream" value:"redacted-redacted-staging-redacted" > counter:<value:0 >  was collected before with the same name and label values
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"10.0.10.62:4000" > label:<name:"status_code" value:"4xx" > label:<name:"upstream" value:"redacted-redacted-staging-redacted" > counter:<value:0 >  was collected before with the same name and label values
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"10.0.10.62:4000" > label:<name:"status_code" value:"5xx" > label:<name:"upstream" value:"redacted-redacted-staging-redacted" > counter:<value:0 >  was collected before with the same name and label values

... snip ... (they're all the same error, just different values)
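
For context (not part of the original report): the Prometheus Go client refuses to serve a scrape in which the same metric name and label set is collected more than once, which is exactly what the error text above describes. Below is a minimal, self-contained sketch of that behavior; the collector is hypothetical and only mimics an exporter walking duplicated upstream server entries, it is not the ingress controller's actual code.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// dupCollector emits the same sample (metric name + label values) twice per
// scrape, mimicking an exporter that walks duplicated upstream server entries.
type dupCollector struct {
	desc *prometheus.Desc
}

func (c dupCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c dupCollector) Collect(ch chan<- prometheus.Metric) {
	for i := 0; i < 2; i++ { // two identical samples in a single scrape
		ch <- prometheus.MustNewConstMetric(
			c.desc, prometheus.CounterValue, 0,
			"172.17.0.8:8080", "2xx", "default-default-http-backend-port-1",
		)
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(dupCollector{
		desc: prometheus.NewDesc(
			"nginx_nginx_all_upstream_responses_total",
			"upstream responses per server and status code",
			[]string{"server", "status_code", "upstream"}, nil,
		),
	})
	// Scraping /metrics here fails with HTTP 500 and
	// "was collected before with the same name and label values".
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":10254", nil))
}
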
@MaikuMori
Author

Possibly related people: @gianrubio @aledbf

Could it be that this particular Ingress has a somewhat complex setup?

Heavily redacted Ingress resource in question:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: redacted-staging
  namespace: redacted
  labels:
    environment: staging
    project: redacted
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
    - hosts:
        - host1.redacted.com
        - host2.redacted.com
        - host3.redacted.com
        - host4.redacted.com
        - host5.redacted.com
        - host6.redacted.com
      secretName: redacted-tls
  rules:
    - host: host1.redacted.com
      http:
        paths:
          - path: /
            backend:
              serviceName: redacted-staging
              servicePort: port-1
          - path: /admin
            backend:
              serviceName: redacted-staging
              servicePort: port-1-admin
    - host: host2.redacted.com
      http:
        paths:
          - path: /
            backend:
              serviceName: redacted-staging
              servicePort: port-1
          - path: /admin
            backend:
              serviceName: redacted-staging
              servicePort: port-1-admin
    - host: host3.redacted.com
      http:
        paths:
          - path: /
            backend:
              serviceName: redacted-staging
              servicePort: port-2
    - host: host4.redacted.com
      http:
        paths:
          - path: /
            backend:
              serviceName: redacted-staging
              servicePort: port-3
    - host: host5.redacted.com
      http:
        paths:
          - path: /
            backend:
              serviceName: redacted-staging
              servicePort: port-4
    - host: host6.redacted.com
      http:
        paths:
          - path: /
            backend:
              serviceName: redacted-staging
              servicePort: port-1

All 1120 errors are from 2 different versions of this same Ingress resource.

MaikuMori changed the title from "[nginx] VTS metrics break prometheus endpoint :10254/metrics (0.9.0-beta.3)" to "[nginx] VTS metrics breaks prometheus endpoint :10254/metrics (0.9.0-beta.3)" Mar 15, 2017
@aledbf
Member

aledbf commented Mar 15, 2017

@MaikuMori please check the VTS output at :18080/nginx_status and :18080/nginx_status/format/json (if you can, please redact the upstream names and hosts and send the JSON)

@MaikuMori
Author

The status endpoint works; that's the first thing I checked. I'm sending you an email with the JSON. It's somewhat big because we have more Ingresses besides this one.

@gianrubio
Contributor

gianrubio commented Mar 15, 2017 via email

@MaikuMori
Author

I don't use a custom template.

I'm also working via email with @aledbf, and I've provided him some additional debug info.

@aledbf
Member

aledbf commented Mar 15, 2017

@gianrubio I think the issue is related to the reuse of the upstreams inside the same zone (multiple Ingresses pointing to the same service).
I will try to reproduce this using the echoheaders service.

@gianrubio
Contributor

I just reproduced the error: the controller is duplicating the server upstream. I'm looking into how to fix this.

nginx.conf

    upstream default-default-http-backend-port-1 {
        least_conn;
        server 172.17.0.8:8080 max_fails=0 fail_timeout=0;
        server 172.17.0.8:8080 max_fails=0 fail_timeout=0;
    }
    upstream default-default-http-backend-port-1-admin {
        least_conn;
        server 172.17.0.8:8080 max_fails=0 fail_timeout=0;
        server 172.17.0.8:8080 max_fails=0 fail_timeout=0;
    }

Error

$ curl 192.168.99.100:10254/metrics
An error has occurred during metrics gathering:

28 error(s) occurred:
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"172.17.0.8:8080" > label:<name:"status_code" value:"1xx" > label:<name:"upstream" value:"default-default-http-backend-port-1" > counter:<value:0 >  was collected before with the same name and label values
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"172.17.0.8:8080" > label:<name:"status_code" value:"2xx" > label:<name:"upstream" value:"default-default-http-backend-port-1" > counter:<value:0 >  was collected before with the same name and label values
* collected metric nginx_nginx_all_upstream_responses_total label:<name:"server" value:"172.17.0.8:8080" > label:<name:"status_code" value:"3xx" > label:<name:"upstream" value:"default-default-http-backend-port-1" > counter:<value:0 >  was collected before with the same name and label values

@MaikuMori
Author

I concur this is probably the cause, since we have multiple upstream servers with the same and/or different ports.

@gianrubio
Contributor

@MaikuMori just to confirm, could you share your upstream for redacted-redacted-staging-redacted?

@aledbf
Member

aledbf commented Mar 15, 2017

I just reproduced the error: the controller is duplicating the server upstream. I'm looking into how to fix this.

It's not duplicating the upstream; the ports (names) are different: https://github.com/kubernetes/ingress/blob/master/core/pkg/ingress/controller/controller.go#L739

@gianrubio
Contributor

It's not duplicating the upstream; the ports (names) are different

Sorry, it's duplicating the server:

server 172.17.0.8:8080 max_fails=0 fail_timeout=0;
server 172.17.0.8:8080 max_fails=0 fail_timeout=0;

@aledbf
Member

aledbf commented Mar 15, 2017

@gianrubio right, but that is OK. The current implementation allows a different configuration for the same service if it is used by different Ingress rules (for example, sticky sessions).
Maybe we need to preprocess the stats in order to avoid this issue?
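
Not part of the thread, but to make the "preprocess the stats" idea concrete: one way would be to merge duplicate (upstream, server) entries by summing their counters before building Prometheus metrics, so each label set is emitted only once. The upstreamServer type and dedupUpstreams helper below are hypothetical, simplified stand-ins, not the controller's real VTS structs or the fix that eventually landed via #456.

package collector

// upstreamServer is a simplified, hypothetical view of one VTS upstream
// server entry; it is not the controller's real data structure.
type upstreamServer struct {
	Upstream  string
	Server    string
	Responses map[string]float64 // "1xx" ... "5xx" -> count
}

// dedupUpstreams collapses entries that share the same upstream name and
// server address, summing their response counters, so the exporter emits a
// single sample per (upstream, server, status_code) label set.
func dedupUpstreams(servers []upstreamServer) []upstreamServer {
	type key struct{ upstream, server string }
	merged := map[key]*upstreamServer{}
	var order []key

	for _, s := range servers {
		k := key{s.Upstream, s.Server}
		if existing, ok := merged[k]; ok {
			// Duplicate server line: add the counters together instead of
			// producing a second identical label set.
			for code, v := range s.Responses {
				existing.Responses[code] += v
			}
			continue
		}
		cp := upstreamServer{
			Upstream:  s.Upstream,
			Server:    s.Server,
			Responses: map[string]float64{},
		}
		for code, v := range s.Responses {
			cp.Responses[code] = v
		}
		merged[k] = &cp
		order = append(order, k)
	}

	out := make([]upstreamServer, 0, len(order))
	for _, k := range order {
		out = append(out, *merged[k])
	}
	return out
}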

@aledbf
Member

aledbf commented Mar 16, 2017

@MaikuMori please test the image quay.io/aledbf/nginx-ingress-controller:0.78

@aledbf
Member

aledbf commented Mar 16, 2017

@MaikuMori @gianrubio this issue is related to #455

@MaikuMori
Author

Yep, this indeed fixes the problem.
