
Support to use heap for pool for large systems #7871

Closed
wants to merge 1 commit into from

Conversation

yufengwangca
Contributor

Problem

What is being fixed?

Large systems (e.g. Linux) do not need actual 'pools' of data and would rather use the heap. We could always preallocate large amounts (e.g. say a Linux controller can connect to 100 devices at a time) and virtual memory will ensure we do not actually use that much physical RAM.

Pools always need to be over-provisioned for the worst case; reserving space in the data section for each pool wastes RAM, and the waste adds up as we add more pools.

This is the initial change for pool management optimization in chip. I would like to collect feedback and make sure this is the right direction before scaling it further.

Change overview

Add a configuration option to use the heap for pools on large systems.

Testing

How was this tested? (at least one bullet point required)

  1. Ran $ ./TestSystemObject and confirmed all tests pass.
  2. Ran the Echo ping test and confirmed it succeeds with this change.

#if CHIP_SYSTEM_CONFIG_POOL_USE_HEAP
unsigned int oldCount = __sync_fetch_and_add(&this->mRefCount, 1);

if (oldCount == 0)
Contributor

I see how the atomic operations ensure correctness for mRefCount. However, how do we ensure that the pool is actually initialized after the second call to Init returns? I don't think we actually can.

It seems like this is unsafe unless all of the pools are carefully initialized at the start of the program before anything actually uses them. That seems to somewhat defeat the purpose of using atomics for mRefCount, or having the reference counting scheme at all.

Contributor

Given that we're moving to purely external synchronization, I would just leave it out unless there's an actual race (and I don't see how there can be; I think we'll have bigger problems if somebody tries to initialize the sdk twice), but if necessary, prefer std::atomic to gcc intrinsics.
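For illustration, the snippet above could use std::atomic instead of the GCC __sync_* intrinsics. This is a hypothetical sketch, not the SDK code; PoolRefCount and the Init contract here are stand-ins for the real class, and the default seq_cst ordering of fetch_add matches the full-barrier semantics of the intrinsic:

```cpp
#include <atomic>

// Hypothetical stand-in for the class holding mRefCount in the snippet
// above; the real SDK type and Init signature differ.
struct PoolRefCount
{
    std::atomic<unsigned int> mRefCount{0};

    // Returns true only for the call that observed a count of zero,
    // i.e. the one that should perform the one-time initialization.
    bool Init()
    {
        unsigned int oldCount = mRefCount.fetch_add(1); // seq_cst, like __sync_fetch_and_add
        return oldCount == 0;
    }
};
```

Note this still has the initialization-visibility problem described above: the second caller may see oldCount != 0 before the first caller has finished initializing the pool, so atomics on the counter alone do not make concurrent Init safe.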

@msandstedt
Contributor

I agree that unconstrained platforms do not share the same concerns that motivated creation of the object pools. There aren't the same risks around fragmentation that can occur with dynamic allocation of a large number of small objects, and particularly objects that may have a long lifetime.

However, moving the pools from the bss segment to the heap doesn't seem functionally different. My preference would instead be that when System::TryCreate is called, this is just a thin wrapper for malloc. This would allow unconstrained platforms to leverage the fact they have abundant memory and not exhaust the sPools at runtime simply because we developers did not anticipate a particular use case.

Unfortunately, I don't know how this would be easily achieved. One current use of the sPools is for layers to walk all objects in the pool to perform particular operations. Apparently objects do need to be known to the layer that allocated them. I think to refactor in the way I'm suggesting, we would need to organize the objects into some type of accessible, but expandable structure like a linked list.

Alternatively, we would need to remove the use cases where layers require immediate access to all objects in the pool. This seems to be the biggest user of that pattern:

void InetLayer::HandleSelectResult(int selectRes, fd_set * readfds, fd_set * writefds, fd_set * exceptfds)

Does anybody know why we need to walk every single pool object in this function? That seems fundamentally unscalable. I admit I haven't looked at the inet code very closely. But if we can remove this type of behavior, then a refactor where System::TryCreate becomes a thin wrapper for malloc is very straightforward.

@kpschoedel
Contributor

Alternatively, we would need to remove the use cases where layers require immediate access to all objects in the pool. This seems to be the biggest user of that pattern:

void InetLayer::HandleSelectResult(int selectRes, fd_set * readfds, fd_set * writefds, fd_set * exceptfds)

That case is going away (#5556 / #6561).

Contributor

@bzbarsky-apple bzbarsky-apple left a comment

I'm trying to understand the actual goal here.

So far what I am seeing is that we still have a pool size template argument; it's just that instead of allocating the pool memory inline in the object we are heap-allocating that memory. What is the benefit of doing this?

I thought the idea we had been discussing was to allow heap-allocation of the individual objects. This PR does not really seem to be doing that...

@yufengwangca
Contributor Author

I'm trying to understand the actual goal here.

So far what I am seeing is that we still have a pool size template argument; it's just that instead of allocating the pool memory inline in the object we are heap-allocating that memory. What is the benefit of doing this?

I thought the idea we had been discussing was to allow heap-allocation of the individual objects. This PR does not really seem to be doing that...

The goal of this PR is to add a flag that allows reserving pool space from the heap instead of global static memory on big systems. The pool still uses one big chunk of contiguous memory consisting of fixed-size blocks. Pools always need to be over-provisioned for the worst case; virtual memory will ensure we do not actually use that much physical RAM until we access the data. It also allows the system to return the memory reserved by the pool back to the OS after use.

Could you explain more about heap-allocation of the individual objects? Then we would have extra metadata to track those individual, discrete objects in memory, right? How do we handle memory fragmentation if the pool memory is not contiguous?

@msandstedt
Contributor

I'm trying to understand the actual goal here.
So far what I am seeing is that we still have a pool size template argument; it's just that instead of allocating the pool memory inline in the object we are heap-allocating that memory. What is the benefit of doing this?
I thought the idea we had been discussing was to allow heap-allocation of the individual objects. This PR does not really seem to be doing that...

The goal of this PR is to add a flag that allows reserving pool space from the heap instead of global static memory on big systems. The pool still uses one big chunk of contiguous memory consisting of fixed-size blocks. Pools always need to be over-provisioned for the worst case; virtual memory will ensure we do not actually use that much physical RAM until we access the data. It also allows the system to return the memory reserved by the pool back to the OS after use.

OK, this wasn't obvious to me. But can't the kernel map the bss to the zero virtual page? I think this copy-on-write virtual memory optimization is possible regardless of where we allocate the pools.

Could you explain more about heap-allocation of the individual objects? Then we would have extra metadata to track those individual, discrete objects in memory, right? How do we handle memory fragmentation if the pool memory is not contiguous?

I believe @bzbarsky-apple is suggesting the same thing I was: make System::TryCreate a shallow wrapper for malloc such that each individual object is malloc'ed.

The only remaining question is whether the layers need to track objects themselves. @kpschoedel's comment implies they may not. If not, can't we just map System::TryCreate to malloc and map System::Release to free? It would be really nice if we could.
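The "thin wrapper for malloc" idea could look roughly like the sketch below. This is illustrative only: TryCreate and Release here are free-function stand-ins for the real System::Object entry points, whose actual signatures differ, and ExampleObject is a made-up type.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical example type; not an SDK class.
struct ExampleObject
{
    int value = 42;
};

// Sketch of TryCreate as a shallow malloc wrapper: allocate raw memory,
// then construct the object in place with placement new.
template <typename T>
T * TryCreate()
{
    void * mem = std::malloc(sizeof(T));
    return (mem != nullptr) ? new (mem) T() : nullptr;
}

// Sketch of Release as the matching free wrapper: run the destructor,
// then hand the memory back to the allocator.
template <typename T>
void Release(T * obj)
{
    if (obj != nullptr)
    {
        obj->~T();
        std::free(obj);
    }
}
```

With this shape, the only pool-level state that disappears is the ability to walk all live objects, which is exactly the open question in the comment above.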

@mspang
Contributor

mspang commented Jun 25, 2021

I'm trying to understand the actual goal here.
So far what I am seeing is that we still have a pool size template argument; it's just that instead of allocating the pool memory inline in the object we are heap-allocating that memory. What is the benefit of doing this?
I thought the idea we had been discussing was to allow heap-allocation of the individual objects. This PR does not really seem to be doing that...

That's certainly what I've been pushing for.

The goal of this PR is to add a flag that allows reserving pool space from the heap instead of global static memory on big systems. The pool still uses one big chunk of contiguous memory consisting of fixed-size blocks. Pools always need to be over-provisioned for the worst case; virtual memory will ensure we do not actually use that much physical RAM until we access the data. It also allows the system to return the memory reserved by the pool back to the OS after use.

Virtual memory works the same whether it's a large static allocation or a large dynamic one (but you should use calloc rather than malloc + memset if you want lazy allocation).
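The calloc point above can be illustrated with two hypothetical allocation helpers (the function names and sizes are made up for this sketch). calloc can return pages the kernel backs with the shared zero page until first write, while malloc followed by memset dirties every page immediately and commits physical RAM up front:

```cpp
#include <cstdlib>
#include <cstring>
#include <cstddef>

// Lazy: calloc returns zero-filled memory; on typical virtual-memory
// systems physical pages are only committed when a block is first written.
void * AllocatePoolLazy(std::size_t blockCount, std::size_t blockSize)
{
    return std::calloc(blockCount, blockSize);
}

// Eager: the memset writes every page, so physical memory for the whole
// pool is committed immediately, defeating the lazy-commit optimization.
void * AllocatePoolEager(std::size_t blockCount, std::size_t blockSize)
{
    void * p = std::malloc(blockCount * blockSize);
    if (p != nullptr)
    {
        std::memset(p, 0, blockCount * blockSize);
    }
    return p;
}
```

Both return zeroed memory; they differ only in when physical pages get committed.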

Could you explain more about heap-allocation of the individual objects? Then we would have extra metadata to track those individual, discrete objects in memory, right? How do we handle memory fragmentation if the pool memory is not contiguous?

If objects are relocatable (which requires not storing pointers to pool-allocated objects) and we really did want contiguous storage so we can iterate over the objects with good locality, it's an option to allow use of std::vector... But we'd have to store indexes instead of pointers.

@bzbarsky-apple
Contributor

reserve pool space from heap instead of global static memory for big systems.

Why does it matter? As @msandstedt says, there's not much of a difference here in that .bss likewise involves just a virtual memory reservation on these systems until you actually write nonzero data in.

It also allow the system return the memory reserved by pool back to system after usage.

That is true, but in practice do we deallocate these pools short of shutting down the entire Matter stack?

Could you explain more about heap-allocation of the individual objects?

Sure. The basic problem we are running into, as I understood it, is that picking the right size for some of these pools for some of these "large" systems is very hard because we have no idea up front what the workload will look like. Maybe we transiently need 200 exchanges when we boot up to send subscribe requests to everything on the fabric, but then we'll never need that many again. That sort of thing.

One plausible solution to that problem is to just not use a fixed-size pool at all and instead heap-allocate the objects, so the only constraint on how many we can have is the actual memory of the device. On small devices where we would run up against these limits a lot this is not a great strategy, but on the "large" systems it seems like it should be fine.

Then we will have extra meta data to track those individual and discrete objects in memory, right?

Yes. Well, malloc will. We only need this sort of metadata if we need to find all the objects. That is the only part of the whole thing that seems complicated and annoying...

how we handle the memory fragmentation if the pool memory is not continuous?

By just accepting that some amount of loss to fragmentation is fine. In practice, on "large" systems:

  1. Virtual memory means that physical fragmentation is really only a problem for large allocations that span many pages: if every physical page has something on it already, then you might have an issue for such an allocation. But I don't expect us to make such large allocations in Matter.
  2. Virtual fragmentation is almost entirely a non-issue on 64-bit systems due to the size of the address space. On 32-bit systems it's possible to end up with significant fragmentation, but again in practice that is usually a problem if you have to make a large virtual allocation. Which, again, we don't plan to do.

So in practice, fragmentation on "large" systems comes down to a percentage tax on your memory capacity. And while losing 10% or so of a constrained device's memory is a significant problem, the same loss on a "large" device is a much smaller issue, no?

@stale

stale bot commented Jul 2, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale Stale issue or PR label Jul 2, 2021
@mspang
Contributor

mspang commented Jul 5, 2021

Actually I think even with std::vector<PoolObj> we would still be overcomplicating things.

How about just switching to std::vector<std::unique_ptr<PoolObj>> on sufficiently capable systems?

  • Sizing is dynamic
  • No need to invalidate pointers
  • Doesn't require immediate redesign of code that assumes we can iterate over the pool
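The suggestion above could be sketched as follows. PoolObj, HeapPool, and the inUse flag are hypothetical illustration, not SDK code; the point is that the unique_ptr indirection keeps each object's address stable even when the vector reallocates, while iteration over the pool still works:

```cpp
#include <memory>
#include <vector>

// Hypothetical pooled object; a real one would carry actual state.
struct PoolObj
{
    bool inUse = false;
};

class HeapPool
{
public:
    PoolObj * TryCreate()
    {
        // Reuse a released slot if one exists.
        for (auto & slot : mObjects)
        {
            if (!slot->inUse)
            {
                slot->inUse = true;
                return slot.get();
            }
        }
        // Otherwise grow dynamically. The unique_ptr indirection means the
        // object's address survives vector reallocation, so no pointers
        // held by callers are invalidated.
        mObjects.push_back(std::make_unique<PoolObj>());
        mObjects.back()->inUse = true;
        return mObjects.back().get();
    }

    void Release(PoolObj * obj) { obj->inUse = false; }

    // Code that assumed it could iterate over the pool still can.
    template <typename F>
    void ForEachActive(F func)
    {
        for (auto & slot : mObjects)
        {
            if (slot->inUse)
            {
                func(*slot);
            }
        }
    }

private:
    std::vector<std::unique_ptr<PoolObj>> mObjects;
};
```

Released slots are recycled before the vector grows, which bounds the container's size at the historical peak of live objects.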

@stale stale bot removed the stale Stale issue or PR label Jul 5, 2021
@yufengwangca
Contributor Author

Actually I think even with std::vector<PoolObj> we would still be overcomplicating things.

How about just switching to std::vector<std::unique_ptr<PoolObj>> on sufficiently capable systems?

  • Sizing is dynamic
  • No need to invalidate pointers
  • Doesn't require immediate redesign of code that assumes we can iterate over the pool

I think this is a good idea; I will pilot this solution, thanks.

@yufengwangca
Contributor Author

I believe @bzbarsky-apple is suggesting the same thing I was: make System::TryCreate a shallow wrapper for malloc such that each individual object is malloc'ed.

The only remaining question is whether the layers need to track objects themselves. @kpschoedel's comment implies they may not. If not, can't we just map System::TryCreate to malloc and map System::Release to free? It would be really nice if we could.

We actually have two pool systems in our codebase: the memory pool class BitMapObjectPool defined in support/Pool.h, and the memory pool class SystemObject defined in system/SystemObject.h. We plan to eventually merge these two systems.

Even though we might not need immediate access to all objects in the pool for SystemObject with the latest update, BitMapObjectPool provides an interface to iterate over all items within the pool.

@yufengwangca
Contributor Author

This draft PR was only for feedback collection. Based on the initial feedback, I need to reset the direction based on what we really want to achieve for pool optimization.

Closing this draft PR; I will post a new PR for pool optimization.

Successfully merging this pull request may close these issues.

Allow more control over Pool sizes
5 participants