container keep restarting with dotnetmonitor side car and DOTNET_DiagnosticPorts configured #2017

Closed
allanchanly opened this issue Jun 20, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@allanchanly

Description

We have a StatefulSet configured with dotnet-monitor as a sidecar. After running for a few days, some containers keep restarting because of the error below:
The runtime has been configured to pause during startup and is awaiting a Diagnostics IPC ResumeStartup command from a Diagnostic Port.
DOTNET_DiagnosticPorts="/diag/port"
DOTNET_DefaultDiagnosticPortSuspend=0

Configuration

Our StatefulSet runs in AKS, with dotnet-monitor added as a sidecar. The environment variable DOTNET_DiagnosticPorts is set. DOTNET_DefaultDiagnosticPortSuspend is not set, so it takes the default value 0 per the dotnet-monitor documentation.

Regression?

In the same StatefulSet, most of the pods run fine; this only happens to a few pods.

Other information

bugreport

@allanchanly allanchanly added the bug Something isn't working label Jun 20, 2022
@jander-msft
Member

This appears to be the same issue as #1958, which you logged a few weeks ago.

I would ask that you still provide the same information that I asked for in that issue:

In the pods where this is occurring, please verify that:

  1. There is a dotnet monitor container
  2. It is configured to host its diagnostic port at /diag/port
  3. The /diag folder is mounted to the same volume between the application container and the dotnet monitor container.

If these are correct, check the logs of the dotnet monitor container to see if there was any issue.

@allanchanly
Author

@jander-msft thanks for the response.

  1. We do have a dotnet monitor container configured.
  2. /diag/port is configured.
  3. The mount is correct; in the same StatefulSet, most of the pods are running fine.

I checked today, we do see some logs with error:

image

@jander-msft
Member

> I checked today, we do see some logs with error:
>
> image

This is likely the same issue as #1827. In your case, what's likely happening is that when you updated or redeployed the StatefulSet, a stale Unix domain socket file was left at the same path that you are specifying for the diagnostic port. dotnet-monitor then fails to establish its server there, so the target applications cannot communicate at that path and remain stuck in the "The runtime has been configured to pause during startup and is awaiting a Diagnostics IPC ResumeStartup command from a Diagnostic Port." state.
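
For readers unfamiliar with the stale-socket behavior described above, here is a minimal Python sketch of the general operating-system behavior. It is not dotnet-monitor code; the helper names `try_bind` and `demo_stale_socket` and the temporary directory standing in for /diag are assumptions made for illustration. It shows that closing a Unix domain socket server leaves its file on disk, and that a later bind at the same path fails with "address in use".

```python
import errno
import os
import socket
import tempfile


def try_bind(path):
    """Bind a Unix domain socket at `path`, then close it.

    Returns None on success, or the errno of the failure.
    (Hypothetical helper, for illustration only.)
    """
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        # Closing the socket does NOT remove the socket file from disk.
        server.close()


def demo_stale_socket():
    """Simulate a server that exits and a replacement that reuses the path."""
    with tempfile.TemporaryDirectory() as diag_dir:  # stands in for /diag
        path = os.path.join(diag_dir, "port")
        first = try_bind(path)    # first server creates the socket file
        second = try_bind(path)   # replacement server finds the stale file
        return first, second, os.path.exists(path)


if __name__ == "__main__":
    first, second, exists = demo_stale_socket()
    print("first bind error:", first)
    print("second bind is EADDRINUSE:", second == errno.EADDRINUSE)
    print("socket file still present:", exists)
```

On a shared volume that survives a container restart, the same leftover file would be what the new server runs into.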

We have a fix (#2164) where we will enable deleting the Unix domain socket at startup before the server attempts to create a new socket file at the path. We hope to have this fix rolled out in the next dotnet-monitor update and automatically enabled in the dotnet-monitor images.
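
The delete-before-bind approach mentioned above can be sketched in Python as follows. This is an illustration of the general technique under stated assumptions, not the actual change in #2164; `bind_fresh` and `demo_restart` are hypothetical names, and a temporary directory stands in for the shared /diag volume.

```python
import os
import socket
import stat
import tempfile


def bind_fresh(path):
    """Remove a leftover socket file at `path`, then bind and listen.

    Hypothetical helper: only unlinks the path if it is a socket,
    so an unrelated regular file is never deleted by accident.
    """
    try:
        if stat.S_ISSOCK(os.stat(path).st_mode):
            os.unlink(path)
    except FileNotFoundError:
        pass  # nothing to clean up
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server


def demo_restart():
    """Simulate a restart that leaves the socket file behind."""
    with tempfile.TemporaryDirectory() as diag_dir:  # stands in for /diag
        path = os.path.join(diag_dir, "port")
        old = bind_fresh(path)
        old.close()              # file stays on disk after close
        new = bind_fresh(path)   # stale file is removed before binding
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)     # a client can reach the new server
        client.close()
        new.close()
        return True


if __name__ == "__main__":
    print("restart succeeded:", demo_restart())
```

One design caveat with this technique in general: deleting the file unconditionally assumes that no other live server owns that path, which holds when exactly one process is meant to listen there.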

@jander-msft
Member

jander-msft commented Aug 9, 2022

The "Address in use" issue should be fixed in the dotnet-monitor images that were released today: 6.2.2 and 7.0.0 Preview 7.
