Memory leak in single InstanceContext, WmiChannels property #5171

Open
msfcolombo opened this issue Jun 2, 2023 · 8 comments

@msfcolombo

Describe the bug
The System.ServiceModel.InstanceContext type has an internal WmiChannels property, which holds a list of IChannel instances.
We observed increased memory usage, and upon analysis of a memory dump, we found about 50 GB of memory retained by that list. All observed channels are in the faulted state, and our application automatically recycles a channel when it detects that the channel has faulted. Browsing the code in this repository, it appears that the list only grows. Channels are added in this line of code, but I could not find any place where channels are removed.

To Reproduce

  1. Create a service that uses a single InstanceContext for the entire application, as documented here.
  2. Create some WCF endpoint that times out whenever it receives a call (make it sleep for 10s). This is required to put the client channel in faulted state.
  3. Create an infinite loop that creates a channel with a client timeout of 2s, calls the above endpoint, and ignores the timeout exception. Make sure your channel creation options trigger this line of code.

The bug should repro even if the faulted channel is disposed. Just make sure the reference is cleared (i.e. reuse the reference on each iteration of the loop).

Expected behavior
As the loop executes, memory usage should eventually stabilize. It must not grow forever.
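
The growth pattern described above can be modelled without WCF. The sketch below is only an illustration of the reported mechanism, not the WCF implementation: `FakeChannel`, `FakeContext`, and `LeakModel` are hypothetical names invented for this example. A long-lived context tracks every channel created against it, and the tracked count stays bounded only if cleanup removes the channel again.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a channel; not a WCF type.
sealed class FakeChannel
{
    private readonly FakeContext _owner;

    public FakeChannel(FakeContext owner)
    {
        _owner = owner;
        owner.Track(this); // registration happens on creation
    }

    // When untrack is false, cleanup forgets to remove the channel
    // from the owner's list (the situation described in this issue).
    public void Abort(bool untrack)
    {
        if (untrack) _owner.Untrack(this);
    }
}

// Hypothetical stand-in for a single long-lived context.
sealed class FakeContext
{
    private readonly List<FakeChannel> _channels = new List<FakeChannel>();

    public int TrackedCount => _channels.Count;
    public void Track(FakeChannel channel) => _channels.Add(channel);
    public void Untrack(FakeChannel channel) => _channels.Remove(channel);
}

static class LeakModel
{
    // Creates and discards `iterations` channels against one shared context
    // and returns how many the context still references afterwards.
    public static int Run(int iterations, bool untrackOnAbort)
    {
        var context = new FakeContext();
        for (int i = 0; i < iterations; i++)
        {
            var channel = new FakeChannel(context);
            channel.Abort(untrackOnAbort);
        }
        return context.TrackedCount;
    }

    static void Main()
    {
        Console.WriteLine(Run(1000, untrackOnAbort: false)); // prints 1000
        Console.WriteLine(Run(1000, untrackOnAbort: true));  // prints 0
    }
}
```

In the first call every discarded channel stays reachable through the context, so retained memory grows with the number of iterations; in the second call the list is empty after each iteration.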

@mconnew
Member

mconnew commented Jun 6, 2023

@HongGit, the fix is to remove the InstanceContext.WmiChannels property and its backing field, and to remove the if block which adds to it in CreateChannel<TChannel>(EndpointAddress address, Uri via). Code locations are here, here, and here.

@msfcolombo
Author

@mconnew Thanks for taking a look at this. Is there any chance the fix could be applied to NET472, or at least NET48? These Framework versions are also prone to the issue.

@mconnew
Member

mconnew commented Jun 9, 2023

There shouldn't be a problem on .NET Framework. The problem was caused by a partial port of a WMI feature to the Core client which missed the cleanup code. The fix will be to remove the code in Core, as we don't have WMI support there. Can you confirm you are definitely seeing the same memory leak on .NET Framework?

@msfcolombo
Author

The leak was found in NET472. Below are the types with the most retained bytes:

Type, Objects, Bytes, Minimum retained bytes
System.Runtime.Remoting.Proxies.__TransparentProxy, 60917, 3411352, 79312908863
System.ServiceModel.Channels.ServiceChannelProxy, 60917, 6822704, 79309497511
System.Collections.Generic.List, 60902, 2436080, 66748825392
System.ServiceModel.Channels.IChannel[], 60900, 3934784, 66748708376
System.ServiceModel.InstanceContext, 3, 600, 66748545328
System.Collections.Generic.SynchronizedCollection, 1, 32, 66748544656
System.ServiceModel.Channels.ServiceChannel, 63751, 17850280, 32547219072

Note that there are 3 System.ServiceModel.InstanceContext instances and only one System.Collections.Generic.SynchronizedCollection<IChannel>. That one collection belongs to one of the InstanceContext instances.

I've noticed that NET48 does contain the following cleanup code in ServiceChannel.cs, line 1552:

    if (this.WmiInstanceContext != null)
    {
        this.WmiInstanceContext.WmiChannels.Remove((IChannel)this.proxy);
    }

It is not clear whether these lines exist in NET472. One thing I know for sure is that our memory dump shows no reference for the WmiInstanceContext property in the ServiceChannel objects. Here's a sample ServiceChannel instance:

 {
  "@ref": "0x00000281593711f0",
  "@type": "System.ServiceModel.Channels.ServiceChannel",
  "state": 2,
  "aborted": false,
  "closeCalled": false,
  "onClosingCalled": false,
  "onClosedCalled": false,
  "onOpeningCalled": true,
  "onOpenedCalled": true,
  "raisedClosed": false,
  "raisedClosing": false,
  "raisedFaulted": false,
  "traceOpenAndClose": false,
  "allowInitializationUI": true,
  "allowOutputBatching": false,
  "activityCount": 1,
  "autoClose": true,
  "closeBinder": true,
  "closeFactory": false,
  "didInteractiveInitialization": true,
  "doneReceiving": false,
  "explicitlyOpened": false,
  "hasSession": false,
  "isPending": false,
  "isReplyChannel": false,
  "openBinder": true,
  "hasChannelStartedAutoClosing": false,
  "hasIncrementedBusyCount": false,
  "hasCleanedUpChannelCollections": false,
  "operationTimeout._ticks": 600000000,
  "mutex": {
    "@ref": "0x0000028159371308",
    "@type": "System.Object"
  },
  "eventSender": {
    "@ref": "0x0000028159371408",
    "@type": "System.Runtime.Remoting.Proxies.__TransparentProxy",
    "_rp": {
      "@ref": "0x0000028159371398",
      "@type": "System.ServiceModel.Channels.ServiceChannelProxy"
    },
    "_stubData": {
      "@ref": "0x0000027066cf6430",
      "@type": "System.IntPtr"
    }
  },
  "Closed": {
    "@ref": "0x0000028159371580",
    "@type": "System.EventHandler",
    "_target": {
      "@ref": "0x000002815d389308",
      "@type": "redacted"
    }
  },
  "autoDisplayUIManager": {
    "@ref": "0x00000281593716d0",
    "@type": "System.ServiceModel.Channels.ServiceChannel+CallOnceManager",
    "isFirst": false,
    "callOnce": {
      "@ref": "0x0000027066cf9108",
      "@type": "System.ServiceModel.Channels.ServiceChannel+CallDisplayUIOnce"
    },
    "channel": {
      "@ref": "0x00000281593711f0",
      "@type": "System.ServiceModel.Channels.ServiceChannel"
    }
  },
  "autoOpenManager": {
    "@ref": "0x00000281593716a0",
    "@type": "System.ServiceModel.Channels.ServiceChannel+CallOnceManager",
    "isFirst": false,
    "callOnce": {
      "@ref": "0x0000027066cf90a8",
      "@type": "System.ServiceModel.Channels.ServiceChannel+CallOpenOnce"
    },
    "channel": {
      "@ref": "0x00000281593711f0",
      "@type": "System.ServiceModel.Channels.ServiceChannel"
    }
  },
  "binder": {
    "@ref": "0x0000028159370f08",
    "@type": "System.ServiceModel.Dispatcher.RequestChannelBinder",
    "channel": {
      "@ref": "0x0000028159371068",
      "@type": "System.ServiceModel.Channels.SecurityChannelFactory+SecurityRequestChannel<IRequestChannel>"
    }
  },
  "clientRuntime": {
    "@ref": "0x0000028155fe5be0",
    "@type": "System.ServiceModel.Dispatcher.ClientRuntime",
    "maxFaultSize": 65536,
    "addTransactionFlowProperties": false,
    "useSynchronizationContext": true,
    "messageVersionNoneFaultsEnabled": false,
    "messageInspectors": {
      "@ref": "0x0000028155fe5db0",
      "@type": "System.ServiceModel.Dispatcher.ClientRuntime+ProxyBehaviorCollection<IClientMessageInspector>"
    },
    "operations": {
      "@ref": "0x0000028155fe5c90",
      "@type": "System.ServiceModel.Dispatcher.ClientRuntime+OperationCollection"
    },
    "compatOperations": {
      "@ref": "0x0000028155fe5cf8",
      "@type": "System.ServiceModel.Dispatcher.ClientRuntime+OperationCollectionWrapper"
    },
    "channelInitializers": {
      "@ref": "0x0000028155fe5d60",
      "@type": "System.ServiceModel.Dispatcher.ClientRuntime+ProxyBehaviorCollection<IChannelInitializer>"
    },
    "contractName": "redacted",
    "contractNamespace": "redacted",
    "contractProxyType": {
      "@ref": "0x0000026c2aaf6930",
      "@type": "System.RuntimeType"
    },
    "identityVerifier": {
      "@ref": "0x0000027066cf63a8",
      "@type": "System.ServiceModel.Security.IdentityVerifier+DefaultIdentityVerifier"
    },
    "interactiveChannelInitializers": {
      "@ref": "0x0000028155fe5e00",
      "@type": "System.ServiceModel.Dispatcher.ClientRuntime+ProxyBehaviorCollection<IInteractiveChannelInitializer>"
    },
    "operationSelector": {
      "@ref": "0x000002815c476618",
      "@type": "System.ServiceModel.Dispatcher.OperationSelectorBehavior+MethodInfoOperationSelector"
    },
    "runtime": {
      "@ref": "0x0000028159371440",
      "@type": "System.ServiceModel.Dispatcher.ImmutableClientRuntime"
    },
    "unhandled": {
      "@ref": "0x0000028155fe5e50",
      "@type": "System.ServiceModel.Dispatcher.ClientOperation"
    },
    "shared": {
      "@ref": "0x0000028155fe5c78",
      "@type": "System.ServiceModel.Dispatcher.SharedRuntimeState"
    }
  },
  "factory": {
    "@ref": "0x0000028159349f20",
    "@type": "System.ServiceModel.Channels.ServiceChannelFactory+ServiceChannelFactoryOverRequest",
    "state": 2,
    "aborted": false,
    "closeCalled": false,
    "onClosingCalled": false,
    "onClosedCalled": false,
    "onOpeningCalled": true,
    "onOpenedCalled": true,
    "raisedClosed": false,
    "raisedClosing": false,
    "raisedFaulted": false,
    "traceOpenAndClose": false,
    "closeTimeout._ticks": 600000000,
    "openTimeout._ticks": 600000000,
    "receiveTimeout._ticks": 6000000000,
    "sendTimeout._ticks": 600000000,
    "mutex": {
      "@ref": "0x0000028159798c28",
      "@type": "System.Object"
    },
    "eventSender": {
      "@ref": "0x0000028159349f20",
      "@type": "System.ServiceModel.Channels.ServiceChannelFactory+ServiceChannelFactoryOverRequest"
    },
    "bindingName": "WS2007FederationHttpBinding",
    "channelsList": {
      "@ref": "0x0000028159798c40",
      "@type": "System.Collections.Generic.List<IChannel>"
    },
    "clientRuntime": {
      "@ref": "0x0000028155fe5be0",
      "@type": "System.ServiceModel.Dispatcher.ClientRuntime"
    },
    "requestReplyCorrelator": {
      "@ref": "0x0000028159798b60",
      "@type": "System.ServiceModel.Channels.RequestReplyCorrelator"
    },
    "timeouts": {
      "@ref": "0x0000028159798c68",
      "@type": "System.ServiceModel.Channels.ServiceChannelFactory+DefaultCommunicationTimeouts"
    },
    "messageVersion": {
      "@ref": "0x0000026ea6ea3230",
      "@type": "System.ServiceModel.Channels.MessageVersion"
    },
    "innerChannelFactory": {
      "@ref": "0x0000028159798350",
      "@type": "System.ServiceModel.Channels.SecurityChannelFactory<IRequestChannel>"
    }
  },
  "localAddress": {
    "@ref": "0x0000027066cf5770",
    "@type": "System.ServiceModel.EndpointAddress",
    "extensionSection": -1,
    "metadataSection": -1,
    "pspSection": -1,
    "isAnonymous": true,
    "isNone": false,
    "headers": {
      "@ref": "0x0000026ce788f170",
      "@type": "System.ServiceModel.Channels.AddressHeaderCollection"
    },
    "uri": {
      "@ref": "0x0000026f66d3de48",
      "@type": "System.Uri"
    }
  },
  "messageVersion": {
    "@ref": "0x0000026ea6ea3230",
    "@type": "System.ServiceModel.Channels.MessageVersion",
    "envelope": {
      "@ref": "0x0000026ea6ea1e20",
      "@type": "System.ServiceModel.EnvelopeVersion"
    },
    "addressing": {
      "@ref": "0x0000026ea6ea2db8",
      "@type": "System.ServiceModel.Channels.AddressingVersion"
    }
  },
  "proxy": {
    "@ref": "0x0000028159371408",
    "@type": "System.Runtime.Remoting.Proxies.__TransparentProxy",
    "@comment": "written above"
  }
}

My conclusion is that NET472 does not have ServiceChannel.WmiInstanceContext, and therefore it cannot perform the cleanup the way NET48 does.

@mconnew
Member

mconnew commented Jun 9, 2023

I think your problem on .NET Framework is that you aren't calling Abort on the channel after it has faulted. This results in the ChannelFactory still holding a reference, which will look like a leak. When you close or abort a channel, it removes the tracking from the factory. This is separate from the issue you are seeing with the WCF Client packages.

@msfcolombo
Author

msfcolombo commented Jun 12, 2023

@mconnew

It might be noteworthy that the memory dump says explicitlyOpened: false. We are certainly calling Abort() for the channels we explicitly create. This is the code:

    private static void CloseCommunicationObject(ICommunicationObject communicationObject)
    {
      switch (communicationObject.State)
      {
        case CommunicationState.Opened:
          try
          {
            communicationObject.Close();
            break;
          }
          catch (CommunicationException ex)
          {
            communicationObject.Abort();
            break;
          }
          catch (TimeoutException ex)
          {
            communicationObject.Abort();
            break;
          }
        case CommunicationState.Faulted:
          communicationObject.Abort();
          break;
        default:
          communicationObject.Abort();
          break;
      }
    }

Note that if Close() does not throw, or if it throws something other than CommunicationException or TimeoutException, then we don't call Abort(). Would that be a problem?

We are NOT calling Abort() for inbound channels, or otherwise "non-explicitly" opened channels. I'm not sure if that makes sense, though. I'm just saying that because the issue is related to using a single InstanceContext.
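
For the Close() question above, one way to rule that path out is a variant that aborts whenever Close() throws, whatever the exception type. This is only a sketch: `ChannelCleanup` and `CloseOrAbort` are hypothetical names, it relies solely on the public `ICommunicationObject` members (`State`, `Close()`, `Abort()`), and whether this path has anything to do with the retained channels is not established in this thread.

```csharp
using System;
using System.ServiceModel;

static class ChannelCleanup
{
    // Closes an opened object gracefully; aborts it in every other case,
    // including when Close() throws an exception of any type.
    public static void CloseOrAbort(ICommunicationObject communicationObject)
    {
        if (communicationObject == null) return;

        if (communicationObject.State != CommunicationState.Opened)
        {
            // Faulted, Created, Opening, Closing, Closed: same as the
            // Faulted and default branches of the original switch.
            communicationObject.Abort();
            return;
        }

        try
        {
            communicationObject.Close();
        }
        catch (Exception ex)
        {
            communicationObject.Abort();

            // The two expected failure types are swallowed, as before.
            // Anything else is rethrown after the abort so it is not hidden.
            if (!(ex is CommunicationException) && !(ex is TimeoutException))
            {
                throw;
            }
        }
    }
}
```

Compared with the switch above, the only behavioural difference is that an unexpected exception from Close() now also triggers Abort() before it propagates.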

@HongGit
Contributor

HongGit commented Oct 29, 2024

@afifi-ins can you please check whether this repros on .NET Framework?

@mconnew
Member

mconnew commented Nov 21, 2024

The code around WmiInstanceContext hasn't been changed since at least .NET 4.5, if not earlier. We've tried to reproduce what you're reporting on .NET Framework and haven't been able to do so. If there were a simple leak in a common scenario here, it would have been reported a long time ago, so there must be something unusual in what you are doing that is causing this. The code which populates ServiceChannel.WmiInstanceContext should only be reachable if you are instantiating new channel instances from within a callback operation. Can you provide a minimal repro app for .NET Framework which exhibits the leak you are seeing? My suspicion is that you aren't cleaning up your channels correctly and that leftover timers are holding on to a reference (e.g. the receive timeout will close the channel if nothing has been received for a while); when they fire, the reference will be gone. Properly cleaning up your channels will clean up this timer reference. I suspect the solution will be a change in your channel lifetime management. But without a concrete repro app demonstrating the issue, there's nothing we can do to investigate further.
