"System.Net.Sockets.SocketException (98): Address in use" exception during rolling deployment #1827

Closed
joelnotified opened this issue May 1, 2022 · 8 comments
Labels
bug Something isn't working


@joelnotified

I've got the following YAML configured for an application running in an AKS cluster, with dotnet-monitor as a sidecar. It runs fine, but during a rolling deployment (of a new application version), we got the following error: System.Net.Sockets.SocketException (98): Address in use. I'm attaching the logs as well.

I'm not sure whether this reproduces on every rolling deployment. An engineer on my team notified me and removed dotnet-monitor, so I can't really test it under the same circumstances right now. I think the issue was that it actually prevented the old application instance from being removed.

Is this something you have seen before, or can you tell from the configuration what's going on? It looks like the exception is thrown while dotnet-monitor is shutting down? 🤔

(I know we shouldn't use --no-auth and expose the port, but we're just evaluating it in a test environment right now)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: itemsservice
  labels:
    app: itemsservice
spec:
  replicas: 3
  selector:
    matchLabels:
      app: itemsservice
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: itemsservice
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9102"
    spec:
      volumes:
        - name: diagnostics
          emptyDir: {}
        - name: dumps
          emptyDir: {}
        - name: dotnet-monitor-config
          configMap:
            name: dotnet-monitor-config
      containers:
        - name: itemsservice
          image: <redacted>
          ports:
            - containerPort: 80
            - containerPort: 9102
          env:
            - name: DOTNET_DiagnosticPorts
              value: /diag/port
          volumeMounts:
            - name: diagnostics
              mountPath: /diag
            - name: dumps
              mountPath: /dumps
          resources:
            requests:
              memory: "256Mi"
              cpu: "50m"
            limits:
              memory: "2Gi"
          startupProbe:
            httpGet:
              path: /health
              port: 80
            timeoutSeconds: 10
            failureThreshold: 6
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 0
            timeoutSeconds: 10
            failureThreshold: 3
            periodSeconds: 30
        - name: dotnet-monitor
          image: mcr.microsoft.com/dotnet/monitor:6.1.0
          ports:
            - containerPort: 52323
          args: ["--no-auth"]
          env:
            - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
              value: Listen
            - name: DOTNETMONITOR_DiagnosticPort__EndpointName
              value: /diag/port
            - name: DOTNETMONITOR_Storage__DumpTempFolder
              value: /dumps
            - name: DOTNETMONITOR_Urls
              value: http://localhost:52323
          volumeMounts:
            - name: diagnostics
              mountPath: /diag
            - name: dumps
              mountPath: /dumps
            - name: dotnet-monitor-config
              mountPath: /etc/dotnet-monitor
          resources:
            requests:
              cpu: 50m
              memory: 32Mi
            limits:
              cpu: 250m
              memory: 256Mi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dotnet-monitor-config
data:
  # Configure an Egress called "artifacts", this is where files will be stored
  Egress__FileSystem__artifacts__directoryPath: "/artifacts"
  # Configure a simple collection rule
  CollectionRules__HighMemoryUsage__Trigger__Type: "EventCounter"
  CollectionRules__HighMemoryUsage__Trigger__Settings__ProviderName: "System.Runtime"
  CollectionRules__HighMemoryUsage__Trigger__Settings__CounterName: "working-set"
  CollectionRules__HighMemoryUsage__Trigger__Settings__GreaterThan: "1000"
  CollectionRules__HighMemoryUsage__Trigger__Settings__SlidingWindowDuration: "00:00:10"
  CollectionRules__HighMemoryUsage__Actions__0__Type: "CollectDump"
  CollectionRules__HighMemoryUsage__Actions__0__Settings__Type: "Full"
  CollectionRules__HighMemoryUsage__Actions__0__Settings__Egress: "artifacts"
  CollectionRules__HighMemoryUsage__Limits__ActionCount: "3"
  CollectionRules__HighMemoryUsage__Limits__ActionCountSlidingWindowDuration: "00:30:00"
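With `replicas: 3`, `maxSurge: 1` and `maxUnavailable: 0`, a rollout may run at most 4 pods at once and must keep at least 3 available, so Kubernetes only scales down an old pod after a new one has become available. A pod counts as Ready only when all of its containers are ready, so if the new pod's dotnet-monitor sidecar keeps failing, the rollout can stall with the old pods still running, which could be consistent with the symptom described above. The sketch below computes those bounds; `rollout_bounds` is a hypothetical helper, and it assumes the rounding rules from the Kubernetes Deployment documentation (a percentage `maxSurge` rounds up, a percentage `maxUnavailable` rounds down).

```python
def _resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceil/floor of scaled / 100, avoiding floating-point error.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> tuple[int, int]:
    """Return (max pods at once, min available pods) for a RollingUpdate."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # The deployment controller bumps maxUnavailable to 1 when both values
        # resolve to zero, so that the rollout can still make progress.
        unavailable = 1
    return replicas + surge, replicas - unavailable


print(rollout_bounds(3, 1, 0))          # (4, 3): the settings in this issue
print(rollout_bounds(3, "25%", "25%"))  # (4, 3): the Deployment defaults
```

The key consequence of `maxUnavailable: 0` is the second number: the old pods are kept until enough new pods are available to hold the minimum at 3.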
2022-04-27T07:08:54.009604788Z {"Timestamp":"2022-04-27T07:08:54.0014026Z","EventId":25,"LogLevel":"Information","Category":"Microsoft.Diagnostics.Tools.Monitor.MonitorApiKeyConfigurationObserver","Message":"MonitorApiKey settings have changed. The new settings have passed validation.","State":{"Message":"MonitorApiKey settings have changed. The new settings have passed validation.","apiAuthenticationConfigKey":"MonitorApiKey","{OriginalFormat}":"{apiAuthenticationConfigKey} settings have changed. The new settings have passed validation."},"Scopes":[]}
2022-04-27T07:08:54.095200656Z {"Timestamp":"2022-04-27T07:08:54.0947611Z","EventId":13,"LogLevel":"Warning","Category":"Microsoft.Diagnostics.Tools.Monitor.Startup","Message":"WARNING: Authentication has been disabled. This can pose a security risk and is not intended for production environments.","State":{"Message":"WARNING: Authentication has been disabled. This can pose a security risk and is not intended for production environments.","{OriginalFormat}":"WARNING: Authentication has been disabled. This can pose a security risk and is not intended for production environments."},"Scopes":[]}
2022-04-27T07:08:54.414546382Z {"Timestamp":"2022-04-27T07:08:54.4140678Z","EventId":0,"LogLevel":"Warning","Category":"Microsoft.AspNetCore.Server.Kestrel","Message":"Overriding address(es) \u0027http://localhost:52323\u0027. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.","State":{"Message":"Overriding address(es) \u0027http://localhost:52323\u0027. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.","addresses":"http://localhost:52323","{OriginalFormat}":"Overriding address(es) \u0027{addresses}\u0027. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead."},"Scopes":[]}
2022-04-27T07:08:54.498664419Z {"Timestamp":"2022-04-27T07:08:54.4983599Z","EventId":14,"LogLevel":"Information","Category":"Microsoft.Hosting.Lifetime","Message":"Now listening on: http://localhost:52323","State":{"Message":"Now listening on: http://localhost:52323","address":"http://localhost:52323","{OriginalFormat}":"Now listening on: {address}"},"Scopes":[]}
2022-04-27T07:08:54.498699119Z {"Timestamp":"2022-04-27T07:08:54.4984881Z","EventId":14,"LogLevel":"Information","Category":"Microsoft.Hosting.Lifetime","Message":"Now listening on: http://:52325","State":{"Message":"Now listening on: http://:52325","address":"http://:52325","{OriginalFormat}":"Now listening on: {address}"},"Scopes":[]}
2022-04-27T07:08:54.790658425Z {"Timestamp":"2022-04-27T07:08:54.7044277Z","EventId":9,"LogLevel":"Error","Category":"Microsoft.Extensions.Hosting.Internal.Host","Message":"BackgroundService failed","Exception":"System.Threading.Tasks.TaskCanceledException: A task was canceled.    at Microsoft.Diagnostics.Monitoring.WebApi.MetricsService.ExecuteAsync(CancellationToken stoppingToken) in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Metrics/MetricsService.cs:line 71    at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)","State":{"Message":"BackgroundService failed","{OriginalFormat}":"BackgroundService failed"},"Scopes":[]}
2022-04-27T07:08:54.791091621Z {"Timestamp":"2022-04-27T07:08:54.7907751Z","EventId":10,"LogLevel":"Critical","Category":"Microsoft.Extensions.Hosting.Internal.Host","Message":"The HostOptions.BackgroundServiceExceptionBehavior is configured to StopHost. A BackgroundService has thrown an unhandled exception, and the IHost instance is stopping. To avoid this behavior, configure this to Ignore; however the BackgroundService will not be restarted.","Exception":"System.Threading.Tasks.TaskCanceledException: A task was canceled.    at Microsoft.Diagnostics.Monitoring.WebApi.MetricsService.ExecuteAsync(CancellationToken stoppingToken) in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Metrics/MetricsService.cs:line 71    at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)","State":{"Message":"The HostOptions.BackgroundServiceExceptionBehavior is configured to StopHost. A BackgroundService has thrown an unhandled exception, and the IHost instance is stopping. To avoid this behavior, configure this to Ignore; however the BackgroundService will not be restarted.","{OriginalFormat}":"The HostOptions.BackgroundServiceExceptionBehavior is configured to StopHost. A BackgroundService has thrown an unhandled exception, and the IHost instance is stopping. To avoid this behavior, configure this to Ignore; however the BackgroundService will not be restarted."},"Scopes":[]}
2022-04-27T07:08:54.791215820Z {"Timestamp":"2022-04-27T07:08:54.7911326Z","EventId":0,"LogLevel":"Information","Category":"Microsoft.Hosting.Lifetime","Message":"Application is shutting down...","State":{"Message":"Application is shutting down...","{OriginalFormat}":"Application is shutting down..."},"Scopes":[]}
2022-04-27T07:08:54.799149138Z Unhandled exception: System.AggregateException: One or more errors occurred. (Address in use)
2022-04-27T07:08:54.799185038Z  ---> System.Net.Sockets.SocketException (98): Address in use
2022-04-27T07:08:54.799192638Z    at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
2022-04-27T07:08:54.799196138Z    at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
2022-04-27T07:08:54.799199238Z    at System.Net.Sockets.Socket.Bind(EndPoint localEP)
2022-04-27T07:08:54.799205038Z    at Microsoft.Diagnostics.NETCore.Client.IpcUnixDomainSocket.Bind(IpcUnixDomainSocketEndPoint localEP)
2022-04-27T07:08:54.799209538Z    at Microsoft.Diagnostics.NETCore.Client.IpcUnixDomainSocketServerTransport.CreateNewSocketServer()
2022-04-27T07:08:54.799214138Z    at Microsoft.Diagnostics.NETCore.Client.IpcUnixDomainSocketServerTransport..ctor(String path, Int32 backlog, IIpcServerTransportCallbackInternal transportCallback)
2022-04-27T07:08:54.799219138Z    at Microsoft.Diagnostics.NETCore.Client.IpcServerTransport.Create(String address, Int32 maxConnections, Boolean enableTcpIpProtocol, IIpcServerTransportCallbackInternal transportCallback)
2022-04-27T07:08:54.799224838Z    at Microsoft.Diagnostics.NETCore.Client.ReversedDiagnosticsServer.ListenAsync(Int32 maxConnections, CancellationToken token)
2022-04-27T07:08:54.799228737Z    at Microsoft.Diagnostics.NETCore.Client.ReversedDiagnosticsServer.DisposeAsync()
2022-04-27T07:08:54.799233737Z    --- End of inner exception stack trace ---
2022-04-27T07:08:54.799237237Z    at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
2022-04-27T07:08:54.799241337Z    at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
2022-04-27T07:08:54.799244837Z    at System.Threading.Tasks.Task.Wait()
2022-04-27T07:08:54.799248037Z    at Microsoft.Diagnostics.NETCore.Client.ReversedDiagnosticsServer.Start(Int32 maxConnections)
2022-04-27T07:08:54.799251337Z    at Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource.ExecuteAsync(CancellationToken stoppingToken) in /_/src/Tools/dotnet-monitor/EndpointInfo/ServerEndpointInfoSource.cs:line 87
2022-04-27T07:08:54.799254637Z    at Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource.ExecuteAsync(CancellationToken stoppingToken) in /_/src/Tools/dotnet-monitor/EndpointInfo/ServerEndpointInfoSource.cs:line 89
2022-04-27T07:08:54.799257837Z    at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
2022-04-27T07:08:54.799262537Z    at Microsoft.Diagnostics.Tools.Monitor.Commands.CollectCommandHandler.Invoke(CancellationToken token, String[] urls, String[] metricUrls, Boolean metrics, String diagnosticPort, Boolean noAuth, Boolean tempApiKey, Boolean noHttpEgress) in /_/src/Tools/dotnet-monitor/Commands/CollectCommandHandler.cs:line 35
2022-04-27T07:08:54.799280637Z    at Microsoft.Diagnostics.Tools.Monitor.Commands.CollectCommandHandler.Invoke(CancellationToken token, String[] urls, String[] metricUrls, Boolean metrics, String diagnosticPort, Boolean noAuth, Boolean tempApiKey, Boolean noHttpEgress) in /_/src/Tools/dotnet-monitor/Commands/CollectCommandHandler.cs:line 66
2022-04-27T07:08:54.799284137Z    at System.CommandLine.Invocation.CommandHandler.GetResultCodeAsync(Object value, InvocationContext context)
2022-04-27T07:08:54.799288237Z    at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
2022-04-27T07:08:54.799292037Z    at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
2022-04-27T07:08:54.799295337Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799298637Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseParseErrorReporting>b__21_0>d.MoveNext()
2022-04-27T07:08:54.799302137Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799305937Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseHelp>b__0>d.MoveNext()
2022-04-27T07:08:54.799309337Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799312637Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass25_0.<<UseVersionOption>b__0>d.MoveNext()
2022-04-27T07:08:54.799316137Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799319737Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass23_0.<<UseTypoCorrections>b__0>d.MoveNext()
2022-04-27T07:08:54.799323136Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799326636Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__22_0>d.MoveNext()
2022-04-27T07:08:54.799330136Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799333336Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseParseDirective>b__20_0>d.MoveNext()
2022-04-27T07:08:54.799336936Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799350136Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseDebugDirective>b__11_0>d.MoveNext()
2022-04-27T07:08:54.799354136Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799357436Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__10_0>d.MoveNext()
2022-04-27T07:08:54.799360836Z --- End of stack trace from previous location ---
2022-04-27T07:08:54.799364536Z    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass14_0.<<UseExceptionHandler>b__0>d.MoveNext()
2022-04-27T07:09:41.733591Z    Stream closed EOF for services/itemsservice-84b855fcd5-jjph8 (dotnet-monitor)
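For context on the error itself: `SocketException (98)` is Linux errno 98, `EADDRINUSE`, and the stack trace shows it is raised while binding the Unix domain socket at the configured diagnostic port path (`IpcUnixDomainSocket.Bind`). Binding a Unix domain socket creates a file at its path, closing the socket does not delete that file, and a later `bind()` to a path where the file still exists fails with `EADDRINUSE`. The Python sketch below reproduces that behavior outside of dotnet-monitor; it assumes Linux, `try_bind` is a hypothetical helper, and it does not show why the file was left behind in this particular deployment.

```python
import errno
import os
import socket
import tempfile


def try_bind(path: str) -> int:
    """Bind and listen on a Unix domain socket at `path`; return 0 or the errno."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)  # creates a socket file at `path`
        sock.listen(1)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()  # closes the socket but does NOT delete the file at `path`


with tempfile.TemporaryDirectory(dir="/tmp") as tmp:
    path = os.path.join(tmp, "port")
    first = try_bind(path)   # the path was free
    second = try_bind(path)  # fails: the file from the first bind is still there
    print(first, second == errno.EADDRINUSE, errno.EADDRINUSE)  # on Linux: 0 True 98
```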
@joelnotified added the bug (Something isn't working) label on May 1, 2022
@SebastianStehle

I have a similar issue: my process crashed when a gcdump was being collected. When the container was restarted, it failed with this exception. I had to remove dotnet-monitor.

@wiktork
Member

wiktork commented Jun 15, 2022

We were not able to reproduce this issue with rolling deployments, but there is a suggested workaround: use a unique socket address for each pod. This would look something like the sample below:

      containers:
        - name: itemsservice
          image: <redacted>
          ports:
            - containerPort: 80
            - containerPort: 9102
          env:
            - name: PODNAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: DOTNET_DiagnosticPorts
              value: /diag/port-$(PODNAME)
          # ... rest of the itemsservice container as before
        - name: dotnet-monitor
          # ... image, args, volumeMounts as before
          env:
            - name: PODNAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
              value: Listen
            - name: DOTNETMONITOR_DiagnosticPort__EndpointName
              value: /diag/port-$(PODNAME)
            # ... other DOTNETMONITOR_ variables as before
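The workaround relies on Kubernetes dependent environment variables: a `$(VAR_NAME)` reference in an env `value` is expanded using variables defined earlier in the same container's `env` list, an unresolvable reference is left unchanged, and `$$` escapes a literal `$`. That is why `PODNAME` is declared first in each container, and since both containers run in the same pod, both resolve to the same socket path. The Python sketch below imitates that expansion in simplified form; `expand` and `resolve_env` are hypothetical helpers, and the pod name is copied from the log above rather than read through `fieldRef`.

```python
def expand(value: str, defined: dict[str, str]) -> str:
    """Simplified imitation of Kubernetes $(VAR_NAME) expansion in env values."""
    out: list[str] = []
    i = 0
    while i < len(value):
        if value.startswith("$$", i):
            out.append("$")  # "$$" escapes a literal "$"
            i += 2
        elif value.startswith("$(", i):
            end = value.find(")", i + 2)
            if end == -1:
                out.append("$(")  # unterminated reference: copied literally
                i += 2
                continue
            name = value[i + 2:end]
            # Unknown names are left untouched, e.g. "$(PODNAME)" stays as-is.
            out.append(defined.get(name, value[i:end + 1]))
            i = end + 1
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def resolve_env(env: list[tuple[str, str]]) -> dict[str, str]:
    """Expand each entry using only the variables declared before it."""
    defined: dict[str, str] = {}
    for name, value in env:
        defined[name] = expand(value, defined)
    return defined


# Pod name copied from the log above; in the cluster it comes from fieldRef.
pod = "itemsservice-84b855fcd5-jjph8"
ordered = [("PODNAME", pod), ("DOTNET_DiagnosticPorts", "/diag/port-$(PODNAME)")]
print(resolve_env(ordered)["DOTNET_DiagnosticPorts"])
# /diag/port-itemsservice-84b855fcd5-jjph8
print(resolve_env(ordered[::-1])["DOTNET_DiagnosticPorts"])
# /diag/port-$(PODNAME)  (PODNAME was not defined yet)
```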

@joelnotified
Author

Thank you @wiktork! Will try that out 😊

@kewillford

We are hitting this issue as well, but we are attempting to run dotnet-monitor as a DaemonSet, so the suggested workaround will not work.

@bcarthic

@wiktork We are using /tmp/port.sock as the diagnostic port value and sharing the /tmp folder. If the folder only exists for the lifetime of the pod, how would adding the pod name help in this scenario?
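One relevant detail: an `emptyDir` volume survives container restarts and is deleted only when the pod itself is removed. If the shared directory (`/diag` in the original Deployment, `/tmp` here) is an `emptyDir` and the dotnet-monitor container crashes and restarts inside the same pod, the socket file it created can still be there, and the pod name does not change across a container restart, so a per-pod file name alone would not avoid that collision. A common defensive pattern for Unix domain socket servers is to remove a stale socket file before binding. The sketch below is hypothetical: `bind_unix_listener` is not a dotnet-monitor API, and the thread does not say how the later fix works. It treats the path as stale only if it is a socket that refuses connections, and refuses to touch anything else.

```python
import errno
import os
import socket
import stat
import tempfile


def _is_stale_socket(path: str) -> bool:
    """Return True if nothing is listening on the Unix socket at `path`."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        return True  # the file exists, but no process is accepting on it
    finally:
        probe.close()
    return False  # connect succeeded: a live listener owns this path


def bind_unix_listener(path: str, backlog: int = 1) -> socket.socket:
    """Hypothetical helper: listen on `path`, removing a stale socket file first."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        pass  # nothing at the path, so bind() can create it
    else:
        if not stat.S_ISSOCK(st.st_mode):
            raise FileExistsError(f"{path} exists and is not a socket")
        if not _is_stale_socket(path):
            raise OSError(errno.EADDRINUSE, "a live listener already owns this path", path)
        os.unlink(path)  # leftover from a process that exited without cleaning up

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)
        server.listen(backlog)
    except OSError:
        server.close()
        raise
    return server


with tempfile.TemporaryDirectory(dir="/tmp") as tmp:
    path = os.path.join(tmp, "port")
    old = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old.bind(path)
    old.close()  # simulates a crashed listener: the socket file stays behind
    server = bind_unix_listener(path)  # succeeds after removing the stale file
    print(server.getsockname() == path)  # True
    server.close()
```

The connect probe is what keeps this from silently stealing the path from a process that is still listening; a plain unconditional `unlink` would be simpler but unsafe if two listeners can share the directory.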

@bcarthic

@joelnotified, is this issue resolved after appending the pod name or making it unique?

@joelnotified
Author

> @joelnotified, is this issue resolved after appending the pod name or making it unique?

I can't tell for sure. I haven't run into the issue again, but I haven't tested it that much either. Sorry.

@jander-msft
Member

jander-msft commented Aug 9, 2022

The "Address in use" issue should be fixed in the dotnet-monitor images that were released today: 6.2.2 and 7.0.0 Preview 7.
