Possible memory leak #1110

waeljammal · 2024-05-07T09:54:36Z

Hi,

We are using clustering with Consul, and there seems to be a memory leak in gossip. I started new instances, left them idle for a short while (10 minutes), and saw the following in Parca.

It keeps accumulating entries in ConcurrentMap and never releases them. I've left it running for up to an hour with the same result: memory just keeps increasing.

We do not send anything over gossip and we do not subscribe to anything on the cluster gossip, so this is purely internal to clustering.

It does seem to flatten out eventually, but only after creating more than 50K objects. I'm not sure what is being allocated here, as I have not looked into it in depth, but the concurrent map keeps growing for some time and eats up a decent chunk of memory. I did have a quick look in a debugger, but all I could see were 32 entries, each containing 0 items and an RW mutex, so I could not figure out where these in-use allocations are going.
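For anyone reproducing the debugger observation: 32 entries that each hold a map and an RW mutex look like a sharded concurrent map. The sketch below is a hypothetical stand-in with invented type names (`shard`, `shardedMap`), not the library's actual definition. It only shows how the live entry count can be summed across shards to confirm that the map really is empty while memory stays in use.

```go
package main

import (
	"fmt"
	"sync"
)

// shard mirrors what the debugger showed: a plain map guarded by an RWMutex.
// Hypothetical type for illustration only.
type shard struct {
	sync.RWMutex
	items map[string]any
}

// shardedMap is a fixed-size set of shards.
type shardedMap []*shard

// newShardedMap allocates n empty shards.
func newShardedMap(n int) shardedMap {
	m := make(shardedMap, n)
	for i := range m {
		m[i] = &shard{items: make(map[string]any)}
	}
	return m
}

// liveEntries sums len() over all shards, taking each read lock in turn.
func (m shardedMap) liveEntries() int {
	total := 0
	for _, s := range m {
		s.RLock()
		total += len(s.items)
		s.RUnlock()
	}
	return total
}

func main() {
	m := newShardedMap(32) // 32 matches the number of entries seen in the debugger

	m[0].Lock()
	m[0].items["a"] = 1
	m[0].Unlock()

	fmt.Println(len(m), m.liveEntries()) // prints "32 1"
}
```

If `liveEntries` reports 0 while the profiler still attributes in-use memory to the map, the memory is held by the shards' internal storage rather than by live entries.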

We are initializing the cluster like so, basically defaults:

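	// Remote configuration: bind locally and advertise the externally reachable host and port.
	config := remote.Configure(a.opts.Config.BindAddress, a.opts.Config.BindPort)
	config.AdvertisedHost = fmt.Sprintf("%s:%d", a.opts.Config.AdvertiseHost, a.opts.Config.BindPort)

	// Consul cluster provider pointing at the Consul address.
	if provider, err = consul.NewWithConfig(&api.Config{
		Address: address,
	}); err != nil {
		return err
	}

	// Cluster configuration is otherwise default; only the request timeout is set explicitly.
	clusterConfig := cluster.Configure(a.opts.ClusterName, provider, lookup, config, cluster.WithKinds(a.kinds...))
	clusterConfig.RequestTimeoutTime = time.Second * 30

	// Start this node as a cluster member.
	c := cluster.New(a.as, clusterConfig)
	a.cluster = c
	c.StartMember()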


rogeralsing · 2024-05-07T15:20:10Z

Judging by the screenshots, it seems the Future processes are not being cleared out of the local ProcessRegistry.

One thing that caught my eye is this line:
clusterConfig.RequestTimeoutTime = time.Second * 30

Does it look the same if you set that to, say, 5 seconds? Does it flatten out earlier then?

One possible issue could be that futures are not cleared until the timeout expires, even if they completed successfully.
I'm not saying this is the case, but if we have such a bug, it would likely manifest this way.

Another possibility is that the ConcurrentMap keeps its already allocated size even when entries are removed.

We will have to look deeper into all this.
Any more data from your side would be much appreciated.

lrweck · 2024-05-10T02:03:01Z

I've seen the same behaviour. From what I could gather, it is the second option (ConcurrentMap keeps its already allocated size and does not shrink).

waeljammal · 2024-05-21T12:41:01Z

Hi, sorry for the late response, I've been away. I'll give your recommendation a try. It seems lrweck thinks this has something to do with ConcurrentMap, but I'll give it a shot either way and report back.

waeljammal · 2024-05-22T08:22:46Z

It still happens after reducing RequestTimeoutTime to 5 seconds; memory usage keeps going up, same as before.

lrweck · 2024-07-31T19:09:01Z

@rogeralsing have you had the time to check if the memory increase is indeed from ConcurrentMap?

rogeralsing added the investigate label May 7, 2024
