[FEA] Add host memory allocation APIs #8879
Labels
reliability
Features to improve reliability or bugs that severely impact the reliability of the plugin
task
Work required that improves the product but is not user facing
Is your feature request related to a problem? Please describe.
In order to be able to limit host memory usage we need some new APIs to allow us to allocate host memory, but with some limits in place.
I am only going to outline the APIs in pseudo code here. The exact details will likely be worked out as we use them.
In the first go at this, these APIs would have no way to spill. We should think about what it would take to let spill work. But the goal is mostly to put these APIs in place. I expect these APIs to evolve a bit over time, but hopefully they are close to what the final APIs will look like.
Internally, when blocking threads to wait for memory to free up in the reservation, we need a priority system. The priority should be based on the same algorithm that the device retry framework uses, except that holding the GPU semaphore gives a task a higher priority than not holding it. To try to avoid deadlocks, the priority is going to be a hard priority: all allocation requests/reservations will be handled in priority order and then FIFO order. This means that if there is a blocked allocation, all non-blocking allocations with the same or lower priority will fail, even if there is memory to satisfy them. This also means that when a task grabs or drops the GPU semaphore, the priority queue in the memory allocation APIs needs to adjust to that change.
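To make the hard-priority rule concrete, here is a minimal Python sketch of the ordering only. It is not the plugin's implementation: `HostAllocQueue` and all of its methods are hypothetical names, the integer `task_priority` stands in for whatever the device retry framework computes, and the assumption that holding the GPU semaphore outranks any task priority is mine.

```python
import itertools


class HostAllocQueue:
    """Illustrative hard-priority bookkeeping for host allocation requests."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._seq = itertools.count()  # arrival order, used for the FIFO tie-break
        self._blocked = []             # requests currently waiting for memory

    @staticmethod
    def _rank(holds_gpu_semaphore, task_priority):
        # Larger tuple = more important. Assumption: the semaphore dominates.
        return (1 if holds_gpu_semaphore else 0, task_priority)

    def enqueue_blocked(self, task_id, nbytes, task_priority, holds_gpu_semaphore):
        self._blocked.append({
            "task_id": task_id, "nbytes": nbytes, "task_priority": task_priority,
            "holds": holds_gpu_semaphore, "seq": next(self._seq)})

    def on_semaphore_change(self, task_id, holds_gpu_semaphore):
        # A task grabbed or dropped the GPU semaphore: re-rank its waiting requests.
        for req in self._blocked:
            if req["task_id"] == task_id:
                req["holds"] = holds_gpu_semaphore

    def service_order(self):
        # Priority order first, then FIFO among equal priorities.
        def key(req):
            holds, prio = self._rank(req["holds"], req["task_priority"])
            return (-holds, -prio, req["seq"])
        return [r["task_id"] for r in sorted(self._blocked, key=key)]

    def try_alloc(self, nbytes, task_priority, holds_gpu_semaphore):
        # Non-blocking request: it fails if any blocked request has the same
        # or a higher priority, even when enough memory is available.
        rank = self._rank(holds_gpu_semaphore, task_priority)
        for req in self._blocked:
            if self._rank(req["holds"], req["task_priority"]) >= rank:
                return False
        if self.used_bytes + nbytes > self.limit_bytes:
            return False
        self.used_bytes += nbytes
        return True
```

The ranks are recomputed on every call instead of being stored in a heap, so a semaphore change needs no explicit re-heapify. A real implementation would likely want something cheaper than a full sort.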
If the allocator does not have a limit, then we just call into the normal memory buffer allocation APIs and return the buffers they produce. If there is a limit, we will need a way to track when an allocation is freed so we can wake up pending tasks. This is where a lot of the testing will happen. Similar things will happen when we make a buffer spillable, but that will come in a later PR.
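One possible shape for the "track frees and wake up waiters" part is sketched below in Python. `LimitedHostAllocator` and `HostBuffer` are hypothetical names, a `bytearray` stands in for a real host buffer, and the sketch deliberately ignores priority ordering and spilling so that only the limit accounting is visible.

```python
import threading


class HostBuffer:
    """Stand-in for a host memory buffer that reports back when it is closed."""

    def __init__(self, nbytes, on_close):
        self.nbytes = nbytes
        self.data = bytearray(nbytes)
        self._on_close = on_close
        self._closed = False

    def close(self):
        if not self._closed:  # make double-close harmless
            self._closed = True
            self.data = None
            self._on_close(self.nbytes)


class LimitedHostAllocator:
    def __init__(self, limit_bytes=None):
        self._limit = limit_bytes  # None means "no limit"
        self._used = 0
        self._cond = threading.Condition()

    @property
    def used_bytes(self):
        with self._cond:
            return self._used

    def alloc(self, nbytes, timeout=None):
        if self._limit is None:
            # No limit: plain allocation, nothing to track on free.
            return HostBuffer(nbytes, on_close=lambda n: None)
        if nbytes > self._limit:
            raise MemoryError("request is larger than the host memory limit")
        with self._cond:
            # Block until enough memory has been freed, or until the timeout.
            fits = self._cond.wait_for(
                lambda: self._used + nbytes <= self._limit, timeout)
            if not fits:
                raise MemoryError("timed out waiting for host memory")
            self._used += nbytes
        return HostBuffer(nbytes, on_close=self._release)

    def _release(self, nbytes):
        with self._cond:
            self._used -= nbytes
            self._cond.notify_all()  # wake every pending request to re-check
```

Waking all waiters and letting each re-check its predicate is the simplest correct choice here. Combining this with the hard-priority ordering would mean waking only the request at the head of the queue.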
Note that when there is a limit, memory from the pinned pool is considered to be free. If we get it, that is great, and it will not count against the limit. For a reservation, however, pinned memory that it gets should count against the reservation limit. We might need a good way in the callback to indicate this.
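The accounting difference can be sketched as follows, again in Python with hypothetical names throughout (`PinnedPool`, `Reservation`, `Allocation`, `HostAccounting`). The issue does not define how non-pinned memory taken through a reservation relates to the global limit, so this sketch simply charges both; treat that as an assumption. The `pinned` flag on the allocation is one way the free callback could learn which counter to credit.

```python
from dataclasses import dataclass
from typing import Optional


class PinnedPool:
    """Hypothetical stand-in for a fixed-size pinned host memory pool."""

    def __init__(self, capacity_bytes: int):
        self.free_bytes = capacity_bytes

    def try_take(self, nbytes: int) -> bool:
        if nbytes <= self.free_bytes:
            self.free_bytes -= nbytes
            return True
        return False

    def give_back(self, nbytes: int) -> None:
        self.free_bytes += nbytes


@dataclass
class Reservation:
    """Hypothetical reservation: a budget consumed by every allocation made through it."""
    limit_bytes: int
    used_bytes: int = 0


@dataclass
class Allocation:
    nbytes: int
    pinned: bool  # tells the free callback which counter to credit
    reservation: Optional[Reservation] = None


class HostAccounting:
    def __init__(self, limit_bytes: int, pool: PinnedPool):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0  # non-pinned bytes counted against the limit
        self.pool = pool

    def try_alloc(self, nbytes: int,
                  reservation: Optional[Reservation] = None) -> Optional[Allocation]:
        # A reservation is charged for pinned and non-pinned memory alike.
        if reservation is not None and \
                reservation.used_bytes + nbytes > reservation.limit_bytes:
            return None
        pinned = self.pool.try_take(nbytes)
        if not pinned:
            # Only non-pinned memory counts against the global limit.
            if self.used_bytes + nbytes > self.limit_bytes:
                return None
            self.used_bytes += nbytes
        if reservation is not None:
            reservation.used_bytes += nbytes
        return Allocation(nbytes, pinned, reservation)

    def on_free(self, alloc: Allocation) -> None:
        if alloc.pinned:
            self.pool.give_back(alloc.nbytes)
        else:
            self.used_bytes -= alloc.nbytes
        if alloc.reservation is not None:
            alloc.reservation.used_bytes -= alloc.nbytes
```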