Idle pool connections use excessive memory #1110

Closed
atombender opened this issue Nov 11, 2021 · 6 comments

@atombender
Contributor

atombender commented Nov 11, 2021

We are currently investigating excessive memory usage in our application. Some profiling leads us to pgx.Conn, which seems to hold onto memory even after the connection is released back to the pool.

Here is a repo with a reproducible case.

Some debugging shows that extendedQueryBuilder keeps slices around that only rarely get shrunk:

	if eqb.resetCount%128 == 0 {
		if cap(eqb.paramValues) > 64 {
			eqb.paramValues = make([][]byte, 0, cap(eqb.paramValues)/2)
		}

		if cap(eqb.paramValueBytes) > 256 {
			eqb.paramValueBytes = make([]byte, 0, cap(eqb.paramValueBytes)/2)
		}

		if cap(eqb.paramFormats) > 64 {
			eqb.paramFormats = make([]int16, 0, cap(eqb.paramFormats)/2)
		}
		if cap(eqb.resultFormats) > 64 {
			eqb.resultFormats = make([]int16, 0, cap(eqb.resultFormats)/2)
		}
	}

We have a process that pumps ~500MB of data through a single connection. The connection is then released, but with the logic above the buffers are only halved every 128 resets, so it would take well over a thousand subsequent Query() calls (each of which calls Reset()) for the slices to shrink back down.

From what I can tell, it is actually *pgx.Conn (not *pgconn.PgConn) that is pooled, so when a connection is released to the pool, it is stored with the query builder as-is, together with things like the map of prepared statements. I don't see a mechanism whereby the pool "thins" a connection once it has been released.

We're not seeing just one connection holding onto memory, either. Even with just one concurrent connection, I bet it's not the same *pgx.Conn that ends up being acquired every time. So we end up with several idle connections in the pool that all hold these big slices.

Testing appears to confirm this. The test case above inserts a bunch of data, and the Go heap grows to around 5GB on my machine and is not reclaimed until the pool is closed. I also tried modifying extendedQueryBuilder to clear its slices on Reset(), and that also fixes it.
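
Condensed, the repro is essentially the following (a sketch, not the exact repo code; the DSN and the blobs table are placeholders):

package main

import (
	"context"
	"fmt"
	"runtime"

	"github.com/jackc/pgx/v4/pgxpool"
)

func main() {
	ctx := context.Background()

	pool, err := pgxpool.Connect(ctx, "postgres://localhost:5432/pgx_test") // placeholder DSN
	if err != nil {
		panic(err)
	}
	defer pool.Close()

	// Push large parameters through one pooled connection; the bytes end up in
	// that connection's extendedQueryBuilder buffers.
	payload := make([]byte, 100<<20) // 100 MB per statement
	conn, err := pool.Acquire(ctx)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 5; i++ {
		if _, err := conn.Exec(ctx, "INSERT INTO blobs (data) VALUES ($1)", payload); err != nil {
			panic(err)
		}
	}
	conn.Release() // back to the pool, query builder buffers and all

	payload = nil
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("HeapInuse after release: %d MB\n", m.HeapInuse>>20) // stays large until pool.Close()
}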

If the above is correct, I suggest two things:

  • extendedQueryBuilder should be much more conservative about how much memory it keeps around. Even when using a single connection, that connection might be used for many different things: For example, we have some gRPC endpoints that stream data in and/or out, and those streams might live for hours at a time, using just a single connection.
  • It's important that idle connections don't pin memory. When a connection is released, it should thin any large structures it is holding, such as the query builder and statement cache. (An interim, application-level workaround is sketched below.)
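
As an interim workaround at the application level (just a sketch, assuming you can tolerate the cost of reconnecting), the pool's AfterRelease hook can be used to destroy a connection rather than pool it after a known-large job:

package pgxutil

import (
	"context"

	"github.com/jackc/pgx/v4"
	"github.com/jackc/pgx/v4/pgxpool"
)

// largeJobDone is a hypothetical application-level signal that this connection
// just finished work with unusually large parameters.
func largeJobDone(conn *pgx.Conn) bool { return false }

func newPool(ctx context.Context) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig("postgres://localhost:5432/pgx_test") // placeholder DSN
	if err != nil {
		return nil, err
	}
	// AfterRelease runs after Release() but before the connection re-enters the
	// pool. Returning false destroys the connection, and with it whatever
	// buffers it has accumulated.
	cfg.AfterRelease = func(conn *pgx.Conn) bool {
		return !largeJobDone(conn)
	}
	return pgxpool.ConnectConfig(ctx, cfg)
}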

Thoughts?


Addendum: In my test case, the slices actually stick around even after all idle connections are destroyed. Only closing the pool appears to reclaim the memory. Not sure why. Edit: This is why.

@jackc
Owner

jackc commented Nov 13, 2021

The original idea was to prevent multiple large queries in a row from having to each allocate new buffers. But I see how this strategy could cause large amounts of memory to be pinned.

Currently, Reset is called before each query. If it were changed to run after the query, and if it always freed large buffers instead of only doing so every 128 calls, that should resolve this issue. This will cause large queries to allocate more often, but the allocations won't be pinned -- I guess that's a good trade. sync.Pool could probably be used to get the best of both worlds, but since this problem was caused by being too clever while trying to reduce allocations, I'm inclined to prefer the simple solution.
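
For concreteness, something along these lines (a sketch of the direction only; the struct is abbreviated and the field names mirror the snippet quoted above):

package pgx

// Abbreviated for the sketch; the real extendedQueryBuilder has more fields.
type extendedQueryBuilder struct {
	paramValues     [][]byte
	paramValueBytes []byte
	paramFormats    []int16
	resultFormats   []int16
}

// Reset truncates the buffers and, on every call rather than every 128th,
// reallocates any buffer that has grown beyond its baseline capacity.
func (eqb *extendedQueryBuilder) Reset() {
	eqb.paramValues = eqb.paramValues[0:0]
	eqb.paramValueBytes = eqb.paramValueBytes[0:0]
	eqb.paramFormats = eqb.paramFormats[0:0]
	eqb.resultFormats = eqb.resultFormats[0:0]

	if cap(eqb.paramValues) > 64 {
		eqb.paramValues = make([][]byte, 0, 64)
	}
	if cap(eqb.paramValueBytes) > 256 {
		eqb.paramValueBytes = make([]byte, 0, 256)
	}
	if cap(eqb.paramFormats) > 64 {
		eqb.paramFormats = make([]int16, 0, 64)
	}
	if cap(eqb.resultFormats) > 64 {
		eqb.resultFormats = make([]int16, 0, 64)
	}
}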

@atombender
Contributor Author

Right. I think there are at least two problematic scenarios:

  • One or more large queries followed by a release of the connection back to the pool: The slices follow the connection and will remain there.
  • A long-running connection that performs at least one large query but subsequently only submits small queries. This is mitigated by the counter in Reset(), but since it's count-based rather than time-based, it can take a long time for the memory to be freed.

I think that thinning the connection struct on release would be a good start.

Less sure about the second point. Pooling the query builders with sync.Pool would be a bad idea the way the slice code is written today; for example, you wouldn't want a 1GB slice to end up back in the sync.Pool, at least not without some kind of expiration time.

If the slices always had a fixed size (kind of like "pages"), it would be different, but that would be more complicated, since the query builder would have to stitch together multiple slices.
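
To illustrate what I mean by pages (a rough sketch, not a proposal for pgx's actual internals): fixed-size chunks come from a sync.Pool, so even a huge query only ever returns small, reusable pieces rather than one giant pinned slice.

package pagedbuf

import "sync"

const pageSize = 64 << 10 // 64 KiB pages; size picked arbitrarily for the sketch

var pagePool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, pageSize) },
}

// pagedBuffer stitches fixed-size pages together. A reader would have to walk
// the pages in order, which is the extra complexity mentioned above.
type pagedBuffer struct {
	pages [][]byte
}

// Write appends p across however many pages are needed.
func (b *pagedBuffer) Write(p []byte) {
	for len(p) > 0 {
		if np := len(b.pages); np == 0 || len(b.pages[np-1]) == pageSize {
			b.pages = append(b.pages, pagePool.Get().([]byte))
		}
		last := b.pages[len(b.pages)-1]
		n := copy(last[len(last):pageSize], p)
		b.pages[len(b.pages)-1] = last[:len(last)+n]
		p = p[n:]
	}
}

// Release hands every page back to the pool; nothing stays pinned to the buffer.
func (b *pagedBuffer) Release() {
	for _, pg := range b.pages {
		pagePool.Put(pg[:0])
	}
	b.pages = nil
}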

@jackc
Owner

jackc commented Nov 13, 2021

I think this should be resolved in a457da8.

Reset now releases excess memory on every call, and Reset is now also called after query operations. Since it was already called before each operation, this means Reset runs twice per query. I considered removing the before-operation Reset, but given that Reset is very cheap and that missing a required cleanup path would be fatal, I left the redundancy in.

Regarding sync.Pool, IIUC sync.Pool only holds a weak reference to its elements, i.e. the elements it contains can be GC'ed. I believe it would be safe to use in this case. But in the spirit of not getting too clever, I'd still rather not do it unless it proves necessary.
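
For reference, that property is just standard sync.Pool behavior: the runtime is allowed to drop pooled objects during GC, so something like the following would not pin buffers permanently the way a per-connection field does.

package bufpool

import "sync"

var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 256) },
}

// withBuffer borrows a buffer for the duration of f and returns it afterwards.
// If the GC runs while the buffer is sitting in the pool, the runtime is free
// to discard it, so even a very large buffer is only held until a later collection.
func withBuffer(f func(buf []byte) []byte) {
	buf := bufPool.Get().([]byte)
	buf = f(buf[:0])
	bufPool.Put(buf[:0]) // keep capacity for reuse, drop length
}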

@atombender
Contributor Author

That works for me, thanks! I didn't think you'd be so receptive to giving up slice reuse. This change will do an unnecessary allocation if the connection is never used again afterwards, and I'd prefer idle connections not to take up any heap at all. But it's pretty small, and connections are typically pooled and reused, so it's not a big deal unless you have hundreds of pooled connections.

@jackc
Owner

jackc commented Nov 15, 2021

That works for me, thanks! I didn't think you'd be so receptive to giving up slice reuse.

Well, it only gives up reuse for queries that have more than 64 arguments or more than 256 bytes of argument data. I believe the vast majority of queries use less than that. For the rarer large queries ... well, the Go GC is a lot better than it used to be. So I am optimistic that overall this change will be a win. If it does prove to be a problem for certain workloads, we can revisit sync.Pool or other optimizations.

@jackc jackc closed this as completed Nov 15, 2021
@atombender
Contributor Author

Thanks. I've confirmed locally that this patch fixes the issue.
