Support to use heap for pool for large systems #7871
Conversation
```cpp
#if CHIP_SYSTEM_CONFIG_POOL_USE_HEAP
    unsigned int oldCount = __sync_fetch_and_add(&this->mRefCount, 1);

    if (oldCount == 0)
```
I see how the atomic operations ensure correctness for mRefCount. However, how do we ensure that the pool is actually initialized after the second call to Init returns? I don't think we actually can.

It seems like this is unsafe unless all of the pools are carefully initialized at the start of the program, before anything actually uses them. That seems to somewhat defeat the purpose of using atomics for mRefCount, or of having the reference counting scheme at all.
Given that we're moving to purely external synchronization, I would just leave it out unless there's an actual race (and I don't see how there can be; I think we'll have bigger problems if somebody tries to initialize the SDK twice). But if it is necessary, prefer std::atomic to the GCC intrinsics.
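For illustration, a minimal sketch of what the std::atomic version might look like. Only the mRefCount name and the fetch-and-add come from the diff above; the surrounding class and method names are assumptions:

```cpp
#include <atomic>

// Sketch only: class and method names are hypothetical; just the
// mRefCount increment mirrors the diff above.
class PoolRefCounter
{
public:
    void Retain()
    {
        // fetch_add returns the value held *before* the increment,
        // matching the __sync_fetch_and_add semantics.
        unsigned int oldCount = mRefCount.fetch_add(1, std::memory_order_relaxed);
        if (oldCount == 0)
        {
            // First reference: one-time pool initialization would go here,
            // which is exactly the step the race concern above is about.
        }
    }

private:
    std::atomic<unsigned int> mRefCount{0};
};
```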
I agree that unconstrained platforms do not share the same concerns that motivated creation of the object pools. There aren't the same risks around fragmentation that can occur with dynamic allocation of a large number of small objects, particularly objects that may have a long lifetime. However, moving the pools from the bss segment to the heap doesn't seem functionally different. My preference would instead be that on these platforms, objects are individually heap-allocated. Unfortunately, I don't know how this would be easily achieved.

One current use of the sPools is for layers to walk all objects in the pool to perform particular operations. Apparently objects do need to be known to the layer that allocated them. I think to refactor in the way I'm suggesting, we would need to organize the objects into some type of accessible but expandable structure, like a linked list. Alternatively, we would need to remove the use cases where layers require immediate access to all objects in the pool. This seems to be the biggest user of that pattern:
Does anybody know why we need to walk every single pool object in this function? That seems fundamentally unscalable. I admit I haven't looked at the inet code very closely. But if we can remove this type of behavior, then a refactor where objects are individually heap-allocated becomes much more tractable.
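For reference, a minimal sketch (all names hypothetical) of the "accessible but expandable structure like a linked list" mentioned in the earlier comment: an intrusive singly linked list threaded through heap-allocated objects, so a layer can still walk everything it allocated without a contiguous pool. A real version would also need external synchronization and a removal path:

```cpp
// Hypothetical sketch: objects embed a link so the owning layer can
// still iterate over everything it has allocated.
struct TrackedObject
{
    TrackedObject * mNext = nullptr;
};

class ObjectList
{
public:
    void Add(TrackedObject * obj)
    {
        obj->mNext = mHead;
        mHead      = obj;
    }

    template <typename Visitor>
    void ForEach(Visitor visit)
    {
        for (TrackedObject * p = mHead; p != nullptr; p = p->mNext)
        {
            visit(p);
        }
    }

private:
    TrackedObject * mHead = nullptr;
};
```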
I'm trying to understand the actual goal here.
So far what I am seeing is that we still have a pool size template argument; it's just that instead of allocating the pool memory inline in the object we are heap-allocating that memory. What is the benefit of doing this?
I thought the idea we had been discussing was to allow heap-allocation of the individual objects. This PR does not really seem to be doing that...
The goal of this PR is to use a flag to allow reserving pool space from the heap instead of from global static memory on big systems. The pool still uses one big chunk of contiguous memory consisting of fixed-size blocks. Pools always need to be over-provisioned for the worst case; virtual memory will ensure we do not really use that much physical RAM until we actually access the data. It also allows the system to return the memory reserved by the pool back to the OS after use.

Could you explain more about heap-allocation of the individual objects? Then we would have extra metadata to track those individual, discrete objects in memory, right? How would we handle memory fragmentation if the pool memory is not contiguous?
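To make that shape concrete, an illustrative sketch (not the PR's actual code) of a pool whose backing store moves from a static array to one contiguous heap chunk under the flag; objects would still be placement-constructed into fixed-size blocks:

```cpp
#include <cstdlib>

template <typename T, unsigned int N>
class ObjectPoolStorage // hypothetical name
{
public:
#if CHIP_SYSTEM_CONFIG_POOL_USE_HEAP
    // One contiguous heap chunk of N fixed-size blocks. calloc lets the
    // kernel back untouched pages with the shared zero page, and free()
    // returns the reservation to the system after use.
    ObjectPoolStorage() : mBlocks(static_cast<T *>(std::calloc(N, sizeof(T)))) {}
    ~ObjectPoolStorage() { std::free(mBlocks); }

    T * Block(unsigned int i) { return &mBlocks[i]; }

private:
    T * mBlocks;
#else
    // Constrained build: the same blocks live in static storage (.bss).
    T * Block(unsigned int i) { return reinterpret_cast<T *>(&mBytes[i * sizeof(T)]); }

private:
    alignas(T) unsigned char mBytes[N * sizeof(T)];
#endif
};
```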
OK, this wasn't obvious to me. But can't the kernel map the bss to the zero virtual page? I think this copy-on-write virtual memory optimization is possible regardless of where we allocate the pools.
I believe @bzbarsky-apple is suggesting the same thing I was: make the pools heap-allocate individual objects. The only remaining question is whether the layers need to track objects themselves. @kpschoedel's comment implies they may not. If not, can't we just map object allocation and release directly onto new and delete?
That's certainly what I've been pushing for.
Virtual memory works the same whether it's a large static allocation or a large dynamic one (but you should use calloc rather than malloc + memset if you want lazy allocation).
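A small sketch of that point (the size here is made up):

```cpp
#include <cstdlib>
#include <cstring>

void DemoLazyAllocation()
{
    constexpr std::size_t kPoolBytes = 64 * 1024 * 1024; // illustrative size

    // malloc + memset writes every page, so the kernel must commit
    // physical memory for the whole pool up front.
    void * eager = std::malloc(kPoolBytes);
    std::memset(eager, 0, kPoolBytes);

    // calloc can return pages backed by the shared zero page; physical
    // RAM is consumed only when a page is first written.
    void * lazy = std::calloc(1, kPoolBytes);

    std::free(eager);
    std::free(lazy);
}
```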
If objects are relocatable (which requires not storing pointers to pool-allocated objects) and we really did want contiguous storage so we can iterate over the objects with good locality, it's an option to allow use of std::vector... but we'd have to store indexes instead of pointers.
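A sketch of that index-based variant (names are hypothetical); the key point is that callers never hold raw pointers, since vector growth relocates objects:

```cpp
#include <cstdint>
#include <vector>

struct PooledObject
{
    // ... object state ...
};

class VectorPool // hypothetical name
{
public:
    // Hand out indexes, not pointers: emplace_back may reallocate the
    // backing store and relocate every object.
    std::uint32_t Allocate()
    {
        mObjects.emplace_back();
        return static_cast<std::uint32_t>(mObjects.size() - 1);
    }

    // Resolve an index to the object's current location on each use.
    PooledObject & Get(std::uint32_t index) { return mObjects[index]; }

private:
    std::vector<PooledObject> mObjects; // contiguous, cache-friendly iteration
};
```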
Why does it matter? As @msandstedt says, there's not much of a difference here in that respect.
That is true, but in practice do we deallocate these pools short of shutting down the entire Matter stack?
Sure. The basic problem we are running into, as I understood it, is that picking the right size for some of these pools on some of these "large" systems is very hard, because we have no idea up front what the workload will look like. Maybe we transiently need 200 exchanges when we boot up, to send subscribe requests to everything on the fabric, but then we'll never need that many again. That sort of thing.

One plausible solution to that problem is to just not use a fixed-size pool at all and instead heap-allocate the objects, so the only constraint on how many we can have is the actual memory of the device. On small devices, where we would run up against these limits a lot, this is not a great strategy, but on the "large" systems it seems like it should be fine.
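A hedged sketch of that alternative (names assumed, not the project's actual API): on "large" builds, allocation degenerates to plain new/delete, so the only cap is the device's actual memory:

```cpp
#include <new>
#include <utility>

// Hypothetical allocator for "large" systems: no fixed-size pool at all;
// a constrained build would substitute the pooled implementation.
template <typename T>
class HeapObjectAllocator
{
public:
    template <typename... Args>
    T * New(Args &&... args)
    {
        // Fails gracefully (returns nullptr) when the device is truly
        // out of memory, rather than capping at a compile-time pool size.
        return new (std::nothrow) T(std::forward<Args>(args)...);
    }

    void Release(T * object) { delete object; }
};
```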
Yes. Well,
By just accepting that some amount of loss to fragmentation is fine. In practice, on "large" systems:
So in practice, fragmentation on "large" systems comes down to a tax of some percentage on your memory capacity. And while losing 10% or so of a constrained device's memory is a significant problem, the same loss on a "large" device is a much smaller issue, no?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Actually I think even with the latest changes we might not need immediate access to all objects in the pool. How about just switching to heap-allocating the individual objects?
I think this is a good idea; I will pilot this solution. Thanks.
We actually have two pool systems in our codebase: the memory pool class BitMapObjectPool defined in support/Pool.h, and the memory pool class SystemObject defined in system/SystemObject.h; we plan to eventually merge these two systems. Even though, with the latest update, we might not need immediate access to all objects in the pool for SystemObject, BitMapObjectPool provides an interface to iterate over all items within the pool.
This draft PR is only for feedback collection. Based on the initial feedback, I need to reset the direction toward what we really want to achieve for pool optimization. I will close this draft PR of discussion and post a new PR for pool optimization.
Problem
What is being fixed?
- Large systems (e.g. Linux) do not need actual 'pools' of data and would rather use the heap. We could always preallocate large amounts (e.g., saying a Linux controller can connect to 100 devices at a time), and virtual memory will ensure we do not really use that much physical RAM.
- Pools always need to be over-provisioned for the worst case; reserving space in the data section for each pool wastes RAM, and this waste adds up as we add more pools.
This is the initial change for pool management optimization in chip. I would like to collect feedback and make sure this is the right direction before scaling it further.
Change overview
Add a configuration option to support using the heap for pools on large systems.
Testing
How was this tested? (at least one bullet point required)