Runtime metrics not generated #1241

Open
WeihanLi opened this issue Dec 23, 2021 · 15 comments
Labels
bug Something isn't working

Comments

@WeihanLi
Contributor

WeihanLi commented Dec 23, 2021

Description

When I run dotnet-monitor as a sidecar container, the runtime metrics are not generated.

I could only get the following metrics:

image

It seems that only the microsoftaspnetcorehosting provider is reported. I'm not sure whether I have misconfigured something.

Configuration

Deployment YAML: https://github.com/WeihanLi/SparkTodo/blob/82ba4ac7493afcb476d096cf11f482e9a297003a/sparktodo-api-k8s-deploy.yaml#L20

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparktodo-api
  labels:
    app: sparktodo-api
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: sparktodo-api
  minReadySeconds: 0
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "52323"
      labels:
        app: sparktodo-api
    
    spec:
      containers:
        - name: sparktodo-api
          image: weihanli/sparktodo-api:latest
          imagePullPolicy: Always
          resources:
            requests:
              memory: "64Mi"
              cpu: "20m"
            limits:
              memory: "128Mi"
              cpu: "50m"
          env:
          - name: ASPNETCORE_URLS
            value: http://+:80
          - name: DOTNET_DiagnosticPorts
            value: /diag/port
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 60
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 60
            periodSeconds: 30
          volumeMounts:
          - mountPath: /diag
            name: diagvol
          - mountPath: /dumps
            name: dumpsvol
        - name: monitor
          image: mcr.microsoft.com/dotnet/monitor:6.0.0
          # DO NOT use the --no-auth argument for deployments in production
          args: [ "--no-auth" ]
          imagePullPolicy: Always
          ports:
            - containerPort: 52323
          env:
          - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
            value: Listen
          - name: DOTNETMONITOR_DiagnosticPort__EndpointName
            value: /diag/port
          - name: DOTNETMONITOR_Storage__DumpTempFolder
            value: /dumps
          - name: DOTNETMONITOR_Urls
            value: "http://*:52323"
          volumeMounts:
          - mountPath: /diag
            name: diagvol
          - mountPath: /dumps
            name: dumpsvol
          resources:
            requests:
              cpu: 50m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 256Mi
          securityContext:
            capabilities:
              add: ["SYS_PTRACE"]
      volumes:
      - name: diagvol
        emptyDir: {}
      - name: dumpsvol
        emptyDir: {}
@WeihanLi added the bug label on Dec 23, 2021
@WeihanLi reopened this on Dec 30, 2021
@WeihanLi
Contributor Author

WeihanLi commented Jan 7, 2022

I downloaded the trace data, and the trace does contain runtime counter data:
image

@wiktork self-assigned this on May 24, 2022
@johan-bjerling

I'm experiencing the same issue when I run dotnet-monitor as a sidecar in AKS.

I'm able to add other providers, e.g. System.Net.Http and Microsoft-AspNetCore-Server-Kestrel, and I get metrics for them.
I even tried to add System.Runtime manually, both with IncludeDefaultProviders set to true and to false, but to no avail.

@jander-msft
Member

Hey folks, make sure that you are using the latest dotnet-monitor version. This is easy if you use a floating tag such as 6 or 6-alpine, though you may need to configure your deployment to always pull the image in order to get newer versions behind these floating tags. We released an update in August that may contain a fix for this issue:

@johan-bjerling

> Hey folks, make sure that you are using the latest dotnet-monitor version. This is easy if you use a floating tag such as 6 or 6-alpine, though you may need to configure your deployment to always pull the image in order to get newer versions behind these floating tags. We released an update in August that may contain a fix for this issue:

@jander-msft Just tried this again with the latest version of dotnet-monitor running in a side-car in AKS with default providers. But I can't seem to get any metrics other than these I'm afraid:

/app # dotnet-monitor --version
6.2.2+8abeb94c15ee4175d7078a3f768ed1ef15032bd8
/app # curl http://localhost:52323/metrics
# HELP microsoftaspnetcorehosting_requests_per_second Request Rate
# TYPE microsoftaspnetcorehosting_requests_per_second gauge
microsoftaspnetcorehosting_requests_per_second 6 1663747451248
microsoftaspnetcorehosting_requests_per_second 8 1663747456248
microsoftaspnetcorehosting_requests_per_second 7 1663747461248
# HELP microsoftaspnetcorehosting_total_requests Total Requests
# TYPE microsoftaspnetcorehosting_total_requests gauge
microsoftaspnetcorehosting_total_requests 307 1663747451248
microsoftaspnetcorehosting_total_requests 315 1663747456248
microsoftaspnetcorehosting_total_requests 322 1663747461248
# HELP microsoftaspnetcorehosting_current_requests Current Requests
# TYPE microsoftaspnetcorehosting_current_requests gauge
microsoftaspnetcorehosting_current_requests 0 1663747451248
microsoftaspnetcorehosting_current_requests 0 1663747456248
microsoftaspnetcorehosting_current_requests 0 1663747461248
# HELP microsoftaspnetcorehosting_failed_requests Failed Requests
# TYPE microsoftaspnetcorehosting_failed_requests gauge
microsoftaspnetcorehosting_failed_requests 1 1663747451248
microsoftaspnetcorehosting_failed_requests 1 1663747456248
microsoftaspnetcorehosting_failed_requests 1 1663747461248
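
A quick way to see which providers actually show up in output like the above is to group the metric names by their prefix. The following is a minimal sketch, not part of dotnet-monitor. It assumes, based only on the names seen in this thread (microsoftaspnetcorehosting, systemruntime), that the text before the first underscore identifies the provider; the function name `providers_in_metrics` is made up for illustration.

```python
from collections import defaultdict


def providers_in_metrics(text):
    """Group metric names from Prometheus text output by provider prefix.

    Assumption: the provider is the part of the metric name before the
    first underscore, as in the names quoted in this thread.
    """
    found = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name = line.split()[0].split("{", 1)[0]  # drop value and any labels
        found[name.split("_", 1)[0]].add(name)
    return dict(found)


# A few lines copied from the curl output above.
sample = """\
# HELP microsoftaspnetcorehosting_requests_per_second Request Rate
# TYPE microsoftaspnetcorehosting_requests_per_second gauge
microsoftaspnetcorehosting_requests_per_second 6 1663747451248
microsoftaspnetcorehosting_total_requests 307 1663747451248
"""

providers = providers_in_metrics(sample)
print("providers seen:", sorted(providers))
print("System.Runtime present:", "systemruntime" in providers)
```

To run it against a live sidecar, the output of `curl http://localhost:52323/metrics` could be saved to a file and passed to the function instead of `sample`.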

@xsoheilalizadeh

@johan-bjerling I'm using the following configuration and it works for me.

- name: monitor
  securityContext:
    runAsUser: 1000
  image: mcr.microsoft.com/dotnet/monitor:6-alpine
  args: [ "--no-auth" ]
  imagePullPolicy: IfNotPresent
  ports:
    - name: metrics
      containerPort: 52323
  env:
    - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
      value: "Listen"
    - name: DOTNETMONITOR_DiagnosticPort__EndpointName
      value: "/diag/port.sock"
    - name: DOTNETMONITOR_Storage__DumpTempFolder
      value: "/diag/dumps"
    - name: DOTNETMONITOR_Urls
      value: http://+:52323
    - name: DefaultProcess__Filters__0__Key 
      value: "ProcessName"
    - name: DefaultProcess__Filters__0__Value 
      value: "dotnet"
    - name: DotnetMonitor_Metrics__Providers__0__ProviderName
      value: "Microsoft-AspNetCore-Server-Kestrel"     
    - name: DotnetMonitor_Metrics__Providers__1__ProviderName
      value: "Microsoft.AspNetCore.Http.Connections"    
    - name: DotnetMonitor_Metrics__Providers__2__ProviderName
      value: "System.Net.Http"    
    - name: DotnetMonitor_Metrics__Providers__3__ProviderName
      value: "System.Net.NameResolution"    
    - name: DotnetMonitor_Metrics__Providers__4__ProviderName
      value: "System.Net.Security"    
    - name: DotnetMonitor_Metrics__Providers__5__ProviderName
      value: "System.Net.Sockets"    
  volumeMounts:
    - mountPath: /diag
      name: diagvol
  resources:
    requests:
      memory: "32Mi"
      cpu: "50m"
    limits:
      memory: "256Mi"
      cpu: "250m"    

@johan-bjerling

@xsoheilalizadeh Yep, I can get all of those metrics to work for me as well. Unfortunately, neither adding "System.Runtime" explicitly nor relying on it being added implicitly via the default providers seems to work.

@jander-msft
Member

> @xsoheilalizadeh Yep, I can get all of those metrics to work for me as well. Unfortunately, neither adding "System.Runtime" explicitly nor relying on it being added implicitly via the default providers seems to work.

Could you provide the full configuration of your dotnet-monitor instance? If you exec into the dotnet-monitor container and execute dotnet-monitor config show, that would provide the configuration as understood by the running instance.

#2730 may be related

@johan-bjerling

@jander-msft sure, here's the full config:

{
  "urls": "http://localhost:52323",
  "Kestrel": ":NOT PRESENT:",
  "Templates": ":NOT PRESENT:",
  "CollectionRuleDefaults": ":NOT PRESENT:",
  "GlobalCounter": {
    "IntervalSeconds": "5"
  },
  "CollectionRules": ":NOT PRESENT:",
  "CorsConfiguration": ":NOT PRESENT:",
  "DiagnosticPort": {
    "ConnectionMode": "Listen",
    "DeleteEndpointOnStartup": "true",
    "EndpointName": "/diag/port.sock"
  },
  "Metrics": {
    "Enabled": "True",
    "Endpoints": "http://localhost:52325",
    "IncludeDefaultProviders": "True",
    "MetricCount": "3",
    "Providers": [
      {
        "ProviderName": "Microsoft-AspNetCore-Server-Kestrel"
      },
      {
        "ProviderName": "Microsoft.AspNetCore.Http.Connections"
      },
      {
        "ProviderName": "System.Net.Http"
      },
      {
        "ProviderName": "System.Net.NameResolution"
      },
      {
        "ProviderName": "System.Net.Security"
      },
      {
        "ProviderName": "System.Net.Sockets"
      },
      {
        "ProviderName": "System.Runtime"
      }
    ]
  },
  "Storage": {
    "DumpTempFolder": "/diag/dumps"
  },
  "DefaultProcess": {
    "Filters": [
      {
        "Key": "ProcessId",
        "ProcessName": "dotnet",
        "Value": "1"
      }
    ]
  },
  "Logging": {
    "Console": {
      "FormatterName": "json",
      "FormatterOptions": {
        "IncludeScopes": "True",
        "TimestampFormat": "yyyy-MM-ddTHH:mm:ss.fffffffZ",
        "UseUtcTimestamp": "true"
      }
    },
    "EventLog": {
      "LogLevel": {
        "Default": "Information",
        "Microsoft": "Warning",
        "Microsoft.Diagnostics": "Information",
        "Microsoft.Hosting.Lifetime": "Information"
      }
    },
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Diagnostics": "Information",
      "Microsoft.Diagnostics.Monitoring.WebApi.OutputStreamResult": "Warning",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "Authentication": ":NOT PRESENT:",
  "Egress": {
    "Properties": {
      "MonitorBlobAccountKey": ":REDACTED:"
    },
    "AzureBlobStorage": {
      "MonitorBlob": {
        "AccountUri": "redacted",
        "BlobPrefix": "dotnetmetrics",
        "ContainerName": "storeapi",
        "CopyBufferSize": ":NOT PRESENT:",
        "QueueName": ":NOT PRESENT:",
        "QueueAccountUri": ":NOT PRESENT:",
        "SharedAccessSignature": ":NOT PRESENT:",
        "AccountKey": ":NOT PRESENT:",
        "SharedAccessSignatureName": ":NOT PRESENT:",
        "AccountKeyName": "MonitorBlobAccountKey",
        "ManagedIdentityClientId": ":NOT PRESENT:"
      }
    },
    "FileSystem": ":NOT PRESENT:"
  }
}

@jander-msft
Member

Couple of thoughts here:

@johan-bjerling

> Couple of thoughts here:

My app was not trimmed. It was however using OpenTelemetry. But as we couldn't get the dotnet-monitor sidecar to work quite how we wanted it, that's now been removed. So I don't have an easy way of testing this again without OpenTelemetry.

I suspect I won't have the time in the near future to try this out, but I will report back if I do!
Thanks for the ideas though @jander-msft, hopefully they'll be of help to others!

@Krishnadas-KP

I have the same problem: I'm unable to get System.Runtime metrics from the /metrics endpoint. This is my config:

{
  "urls": "http://localhost:52323",
  "Kestrel": ":NOT PRESENT:",
  "Templates": ":NOT PRESENT:",
  "CollectionRuleDefaults": ":NOT PRESENT:",
  "GlobalCounter": {
    "IntervalSeconds": "5"
  },
  "CollectionRules": ":NOT PRESENT:",
  "CorsConfiguration": ":NOT PRESENT:",
  "DiagnosticPort": {
    "ConnectionMode": "Listen",
    "DeleteEndpointOnStartup": "true"
  },
  "InProcessFeatures": ":NOT PRESENT:",
  "Metrics": {
    "Enabled": "True",
    "Endpoints": "http://\u002B:52325",
    "IncludeDefaultProviders": "True",
    "MetricCount": "3"
  },
  "Storage": {
    "DefaultSharedPath": "/diag"
  },
  "DefaultProcess": {
    "Filters": [
      {
        "Key": "ProcessId",
        "Value": "1"
      }
    ]
  },
  "Logging": {
    "Console": {
      "FormatterName": "json",
      "FormatterOptions": {
        "IncludeScopes": "True",
        "TimestampFormat": "yyyy-MM-ddTHH:mm:ss.fffffffZ",
        "UseUtcTimestamp": "true"
      }
    },
    "EventLog": {
      "LogLevel": {
        "Default": "Information",
        "Microsoft": "Warning",
        "Microsoft.Diagnostics": "Information",
        "Microsoft.Hosting.Lifetime": "Information"
      }
    },
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Diagnostics": "Information",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "Authentication": ":NOT PRESENT:",
  "Egress": ":NOT PRESENT:"
}

I am using version 7.0.2 as a sidecar in a pod in AKS.

@sheng-jie

I have the same issue with version 7.2. The deployment is:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: akstest
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: akstest
  template:
    metadata:
      labels:
        app: akstest
    spec:
      volumes:
        - name: diagvol
          emptyDir: {}
        - name: dumpsvol
          emptyDir: {}
      containers:
        - name: client-api
          image: mcr.microsoft.com/dotnet/samples:aspnetapp
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          env:
            - name: DOTNET_DiagnosticPorts
              value: /diag/port.sock
          resources:
            limits:
              cpu: 250m
              memory: 512Mi
          volumeMounts:
            - name: diagvol
              mountPath: /diag
            - name: dumpsvol
              mountPath: /dumps
        - name: monitor
          image: mcr.microsoft.com/dotnet/monitor:7.2
          args:
            - collect
            - '--urls'
            - http://+:52323
            - '--no-auth'
          ports:
            - containerPort: 52323
              protocol: TCP
          env:
            - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
              value: Listen
            - name: DOTNETMONITOR_DiagnosticPort__EndpointName
              value: /diag/port.sock
            - name: DOTNETMONITOR_Storage__DumpTempFolder
              value: /dumps
            - name: DOTNETMONITOR_Urls
              value: http://+:52323
          resources:
            limits:
              cpu: 250m
              memory: 256Mi
            requests:
              cpu: 50m
              memory: 32Mi
          volumeMounts:
            - name: diagvol
              mountPath: /diag
            - name: dumpsvol
              mountPath: /dumps

And the dotnet-monitor config is:

{
  "urls": "http://\u002B:52323",
  "Kestrel": ":NOT PRESENT:",
  "Templates": ":NOT PRESENT:",
  "CollectionRuleDefaults": ":NOT PRESENT:",
  "GlobalCounter": {
    "IntervalSeconds": "5"
  },
  "CollectionRules": ":NOT PRESENT:",
  "CorsConfiguration": ":NOT PRESENT:",
  "DiagnosticPort": {
    "ConnectionMode": "Listen",
    "DeleteEndpointOnStartup": "true",
    "EndpointName": "/diag/port.sock"
  },
  "InProcessFeatures": ":NOT PRESENT:",
  "Metrics": {
    "Enabled": "True",
    "Endpoints": "http://localhost:52325",
    "IncludeDefaultProviders": "True",
    "MetricCount": "3"
  },
  "Storage": {
    "DumpTempFolder": "/dumps"
  },
  "DefaultProcess": {
    "Filters": [
      {
        "Key": "ProcessId",
        "Value": "1"
      }
    ]
  },
  "Logging": {
    "Console": {
      "FormatterName": "json",
      "FormatterOptions": {
        "IncludeScopes": "True",
        "TimestampFormat": "yyyy-MM-ddTHH:mm:ss.fffffffZ",
        "UseUtcTimestamp": "true"
      }
    },
    "EventLog": {
      "LogLevel": {
        "Default": "Information",
        "Microsoft": "Warning",
        "Microsoft.Diagnostics": "Information",
        "Microsoft.Hosting.Lifetime": "Information"
      }
    },
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Diagnostics": "Information",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "Authentication": ":NOT PRESENT:",
  "Egress": ":NOT PRESENT:"
}

@jander-msft
Member

Please check the two questions that I've asked before: #1241 (comment)

Also, if you use any other tools that collect EventCounters and they are not collecting at the same interval as .NET Monitor (default is 5 seconds), then that will cause .NET Monitor to fail to collect them.
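
To compare intervals, the value .NET Monitor is using can be read from the `dotnet-monitor config show` output posted earlier in this thread. The snippet below is only a sketch under assumptions: `other_tool_interval` is a hypothetical placeholder for whatever interval your other EventCounters consumer is configured with, and the fallback of 5 seconds is the default mentioned above.

```python
import json


def counter_interval_seconds(config_json, default=5):
    """Return GlobalCounter.IntervalSeconds from `config show` JSON.

    Falls back to `default` when the section is reported as the
    ":NOT PRESENT:" placeholder string or the key is missing.
    """
    config = json.loads(config_json)
    section = config.get("GlobalCounter")
    if not isinstance(section, dict):
        return default
    return int(section.get("IntervalSeconds", default))


# Trimmed-down copy of the config output shown in this thread.
config_show_output = '{"GlobalCounter": {"IntervalSeconds": "5"}}'

# Hypothetical value: replace with the interval of your other tool.
other_tool_interval = 1

monitor_interval = counter_interval_seconds(config_show_output)
if monitor_interval != other_tool_interval:
    print(f"Interval mismatch: .NET Monitor={monitor_interval}s, "
          f"other tool={other_tool_interval}s")
else:
    print(f"Both tools collect every {monitor_interval}s")
```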

@kchilka-msft

hey @jander-msft - we are using OpenTelemetry and I am not seeing metrics in Prometheus.

Although, when I curl http://localhost:52325/metrics I do see metrics getting generated for microsoftaspnetcorehosting, systemruntime and microsoftaspnetcoreserverkestrel but I don't see them in Prometheus.

I am assuming it has to do with an interval mismatch between OpenTelemetry and .NET Monitor? Is the fix to make sure the interval values match for .NET Monitor and OpenTelemetry? Is there any other change that needs to be made?

Thanks in advance!

@jander-msft
Member

> Although, when I curl http://localhost:52325/metrics I do see metrics getting generated for microsoftaspnetcorehosting, systemruntime and microsoftaspnetcoreserverkestrel but I don't see them in Prometheus.

If you see systemruntime from .NET Monitor, then it is unlikely that OpenTelemetry is interfering with the collection of metrics by .NET Monitor.

> I am assuming it has to do with an interval mismatch between OpenTelemetry and .NET Monitor? Is the fix to make sure the interval values match for .NET Monitor and OpenTelemetry? Is there any other change that needs to be made?

.NET Monitor doesn't automatically push information to a Prometheus server; you have to configure that server to scrape the /metrics route.

I'm not certain that the interval mismatch is impacting anything within the consumption of the Prometheus metrics. @wiktork any ideas?
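
One thing that might be worth checking before pointing Prometheus at the /metrics route is whether the metrics URL is bound to a loopback address. Several `config show` outputs in this thread report `"Endpoints": "http://localhost:52325"`, and a listener bound to localhost only accepts connections from inside the same pod, so a Prometheus server running elsewhere could not reach it. Nothing in this thread confirms that this is the cause of the missing metrics; the sketch below (helper names are made up for illustration) just flags such bindings.

```python
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "[::1]"}


def split_endpoint(url):
    """Split a binding URL such as 'http://+:52325' into (host, port)."""
    rest = url.split("://", 1)[-1].split("/", 1)[0]
    host, sep, port = rest.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return rest, None  # no explicit port given


def reachable_from_other_pods(url):
    """True unless the URL binds to a loopback-only host."""
    host, _ = split_endpoint(url)
    return host.lower() not in LOOPBACK_HOSTS


# Endpoint values as they appear in this thread (\u002B decodes to '+').
for endpoint in ("http://localhost:52325", "http://+:52325", "http://*:52323"):
    host, port = split_endpoint(endpoint)
    status = "ok" if reachable_from_other_pods(endpoint) else "loopback only"
    print(f"{endpoint}: host={host} port={port} -> {status}")
```

The port reported here should also match whatever port the Prometheus scrape configuration or pod annotation (for example prometheus.io/port in the first deployment above) points at.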

8 participants