WIP: Proposal to base VkDeviceMemory on MTLHeap always #2309
Incomplete draft since it requires reworking how VkImage and VkBuffer handle the memory, but should be enough to provide an overview of the proposed changes.
@@ -70,11 +68,11 @@
// Coherent memory does not require flushing by app, so we must flush now
// to support Metal textures that actually reside in non-coherent memory.
if (mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)) {
What's the reasoning behind this? Wouldn't we want to map MTLStorageModeShared to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT|VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and MTLStorageModeManaged to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT? Later, in the flushToDevice call, we check that the storage is MTLStorageModeManaged, so wouldn't it be better to check for that here and assume that any mappable memory is host visible? (I believe validation would complain if it's not.) The map call is in a similar situation.
On macOS discrete GPUs, Metal does not always support MTLStorageModeShared for textures. Search the MoltenVK code base for references to MTLStorageModeManaged to see when we ignore VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and force the use of MTLStorageModeManaged.
Metal has relaxed this restriction in later versions (see the MVKPhysicalDeviceMetalFeatures::sharedLinearTextures feature), so we might be able to rework the overall logic a bit, but that should already be taken into consideration wherever we force MTLStorageModeManaged.
This issue might be problematic for an OS that doesn't support sharedLinearTextures. For macOS, that means 10.15.6. Perhaps we can make that our minimum for MoltenVK after this (although I would like to be able to support the 5 year window that Apple seems to cover on Xcode). Or perhaps, if sharedLinearTextures is disabled, we could just error out and report that an image can't be bound to a shared-memory device memory when an attempt is made to do that.
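A minimal sketch of the selection logic discussed above, for a linear texture on a macOS discrete GPU. The enum, the flag constants, and chooseTextureStorageMode() are hypothetical stand-ins so that the snippet builds without Metal or Vulkan headers; only the flag values are taken from the Vulkan headers. It is not MoltenVK's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Flag values as defined in the Vulkan headers.
constexpr uint32_t kHostVisible  = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Stand-in for MTLStorageMode (assumption: only these three matter here).
enum class StorageMode { Shared, Managed, Private };

// Hypothetical helper. sharedLinearTextures mirrors
// MVKPhysicalDeviceMetalFeatures::sharedLinearTextures.
// Returns false when coherent memory is requested but shared linear
// textures are unavailable, which models the "error out" option.
bool chooseTextureStorageMode(uint32_t memProps, bool sharedLinearTextures, StorageMode& out) {
    if (!(memProps & kHostVisible))  { out = StorageMode::Private; return true; }
    if (!(memProps & kHostCoherent)) { out = StorageMode::Managed; return true; }
    if (sharedLinearTextures)        { out = StorageMode::Shared;  return true; }
    return false;  // coherent requested, but only Managed would work
}
```

The alternative mentioned in the thread, silently forcing Managed and flushing on behalf of the app, would replace the final `return false` with a Managed fallback.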
To get around the limitation of private-only MTLHeaps on non-Apple GPUs, I'm wondering if we could allocate a temporary managed MTLBuffer when a map call occurs (and remove it on unmap), and do a BLIT from it to the aliased private MTLBuffer on flush?
In this case, we would have to indicate that there are no host-coherent memory options, so that the app would have to flush when it makes changes to the buffer content.
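The map/flush/unmap flow proposed here can be modelled on the CPU as follows. This is only a sketch: StagedMemory is a hypothetical class, a std::vector stands in for the private MTLBuffer aliased onto the heap, a second vector stands in for the temporary managed MTLBuffer, and memcpy stands in for the BLIT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU-only model of the proposed staging scheme.
class StagedMemory {
public:
    explicit StagedMemory(size_t size) : _private(size, 0) {}

    // Creates the temporary staging copy and returns its pointer to the app.
    void* map() {
        assert(!_mapped && "already mapped");
        _staging = _private;  // in Metal: blit private -> staging
        _mapped = true;
        return _staging.data();
    }

    // Copies [offset, offset + size) from staging to the private storage.
    void flush(size_t offset, size_t size) {
        assert(_mapped && offset + size <= _staging.size());
        std::memcpy(_private.data() + offset, _staging.data() + offset, size);  // in Metal: blit
    }

    // Drops the staging copy; unflushed writes are lost.
    void unmap() {
        _staging.clear();
        _mapped = false;
    }

    // What the "GPU" would currently see at byte i.
    uint8_t deviceByte(size_t i) const { return _private[i]; }

private:
    std::vector<uint8_t> _private;
    std::vector<uint8_t> _staging;
    bool _mapped = false;
};
```

Because writes only reach the private storage on flush(), this model is non-coherent by construction, which is why the comment says no host-coherent memory type could be advertised.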
I've been thinking about the issue for some time now. I'll dump my ideas here so I don't forget about them and we can decide.
First off, let's understand what Vulkan requires in terms of host visible coherent memory (which is our main issue due to non-Apple-silicon hardware). The spec states in https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-device-properties the following:
There must be at least one memory type with both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bits set in its propertyFlags.
and
For any memory allocated with both the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and the VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD, host or device accesses also perform automatic memory domain transfer operations, such that writes are always automatically available and visible to both host and device memory domains.
So we always need to provide a host visible coherent memory type, and memory writes on either the host or the device must become visible on both without any flushing.
The other restriction Vulkan imposes in https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#resources-association is the following:
If buffer is a VkBuffer not created with the VK_BUFFER_CREATE_SPARSE_BINDING_BIT or VK_BUFFER_CREATE_PROTECTED_BIT bits set, or if image is a linear image that was not created with the VK_IMAGE_CREATE_PROTECTED_BIT bit set, then the memoryTypeBits member always contains at least one bit set corresponding to a VkMemoryType with a propertyFlags that has both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT bit and the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bit set. In other words, mappable coherent memory can always be attached to these objects.
So any buffer or linear texture (unless created with certain flags) must have at least one memory type that is host visible and coherent. We can work around this by limiting what we expose through vkGetPhysicalDeviceImageFormatProperties; more on this later.
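As an illustration of the two spec requirements quoted above, the check below tests whether a resource's memoryTypeBits selects at least one host visible, coherent memory type. The function name and the plain-vector representation of the memory types are assumptions made for this sketch; the flag values are those from the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flag values as defined in the Vulkan headers.
constexpr uint32_t kDeviceLocal  = 0x1;  // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible  = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Hypothetical helper: typeFlags[i] holds the propertyFlags of memory type i,
// and memoryTypeBits is the mask reported for a buffer or linear image.
bool hasHostVisibleCoherentType(const std::vector<uint32_t>& typeFlags, uint32_t memoryTypeBits) {
    constexpr uint32_t required = kHostVisible | kHostCoherent;
    for (size_t i = 0; i < typeFlags.size() && i < 32; ++i) {
        const bool allowed = ((memoryTypeBits >> i) & 1u) != 0;
        if (allowed && (typeFlags[i] & required) == required) return true;
    }
    return false;
}
```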
Now, Metal texel buffers (which are MTLBuffers anyway) have a size limit of 256MB according to https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf, and are also limited by MTLDevice.maxBufferLength.
I do believe the way forward can be allocating private-only MTLHeaps, providing a fake memory on map, and flushing it to the MTLHeap. However, there are a few caveats that need addressing due to the previously mentioned points:
- Host coherent memory requires that any read or write on the host sees the same value as the GPU-side memory. Why is this an issue? We cannot rely on the map/unmap or flush/invalidate Vulkan calls to copy CPU memory to GPU memory and vice versa. The user may map a VkDeviceMemory object, modify its contents with GPU commands, and read values from the mapped memory. I believe this case is not correctly handled in MoltenVK as of now either.
- If we decide to use an MTLBuffer, we are limited to either 256MB or MTLDevice.maxBufferLength. What happens if the mapped memory is bigger? How do we provide contiguous memory for the user? Unsure how MoltenVK handles this at the moment.
Potential solutions to point 1 (I'll use GFXR as a baseline, since they've already addressed this issue and it aligns with what I had in mind):
- https://github.com/LunarG/gfxreconstruct/blob/dev/USAGE_desktop_Vulkan.md#3-page_guard. In short, the idea is to trap the memory we provide to the user by changing the protection level and install a signal handler to catch accesses to that memory. Then we do our thing and flush/read to/from GPU in the handler.
- https://github.com/LunarG/gfxreconstruct/blob/dev/USAGE_desktop_Vulkan.md#4-userfaultfd. Unsure if this one is doable in macOS, I believe it should be, but need to put more time into it to correctly understand the viability.
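To make the page-guard idea concrete, here is a minimal POSIX sketch: the mapped region starts read-only, the first write faults, and the handler records the write and unprotects the region so the write can be retried. All names are hypothetical, only one region is tracked, and calling mprotect from a signal handler is a common practice rather than something POSIX guarantees to be async-signal-safe. This is not how GFXR or MoltenVK implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical bookkeeping for a single guarded mapping.
struct GuardedRegion {
    uint8_t* base = nullptr;
    size_t size = 0;
    volatile sig_atomic_t dirty = 0;  // set by the handler on the first write
};
static GuardedRegion g_region;

// If the fault lies inside our region, mark it dirty and unprotect it so the
// faulting write is retried. A real implementation would track single pages
// and schedule a copy to the GPU.
static void onFault(int sig, siginfo_t* info, void*) {
    uint8_t* addr = static_cast<uint8_t*>(info->si_addr);
    if (g_region.base && addr >= g_region.base && addr < g_region.base + g_region.size) {
        g_region.dirty = 1;
        mprotect(g_region.base, g_region.size, PROT_READ | PROT_WRITE);
        return;
    }
    signal(sig, SIG_DFL);  // not ours: fall back to the default action
    raise(sig);
}

// Allocates a read-only region and installs the handler.
// Returns the base pointer, or nullptr on failure.
uint8_t* createGuardedRegion(size_t size) {
    assert(size > 0);
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    const size_t rounded = (size + page - 1) / page * page;
    void* mem = mmap(nullptr, rounded, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED) return nullptr;

    struct sigaction sa = {};
    sa.sa_sigaction = onFault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    // Protection faults typically arrive as SIGBUS on macOS and SIGSEGV on Linux.
    if (sigaction(SIGSEGV, &sa, nullptr) != 0 || sigaction(SIGBUS, &sa, nullptr) != 0) {
        munmap(mem, rounded);
        return nullptr;
    }
    g_region.base = static_cast<uint8_t*>(mem);
    g_region.size = rounded;
    g_region.dirty = 0;
    return g_region.base;
}
```

After copying dirty data to the GPU, an implementation along these lines would re-protect the region with mprotect(..., PROT_READ) so that the next write is trapped again. Trapping reads as well (PROT_NONE) would be needed to pull GPU-side changes back before the host reads them.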
Potential solutions to point 2 (when I say "the limitation" for this point, I'm referring to the smaller of 256MB and MTLDevice.maxBufferLength):
- Expose a separate heap, sized to the limitation, that is the only one from which the user is allowed to allocate host visible coherent memory. The big downside to this approach is how little memory it might have.
- Limit linear texture sizes via vkGetPhysicalDeviceImageFormatProperties to not exceed the limitation. However, this does not prevent users from allocating a VkDeviceMemory bigger than the limitation.
- Allocate host memory and provide that to the user. We also maintain a copy of this memory in one or more MTLBuffers and do the flush/read to GPU using them. The big downside to this is the amount of memory we are using (3 times for the mapped region: one for the CPU, one for the buffers, one for the heap).
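For the last option, the host allocation would have to be covered by several buffers whenever it exceeds the limitation. A sketch of that split, with hypothetical names (Chunk, splitIntoChunks) and with maxBufferLength standing in for the smaller of 256MB and MTLDevice.maxBufferLength:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One staging buffer's slice of the mapped region.
struct Chunk {
    uint64_t offset;
    uint64_t length;
};

// Hypothetical helper: covers [0, regionSize) with consecutive chunks, each
// at most maxBufferLength bytes long.
std::vector<Chunk> splitIntoChunks(uint64_t regionSize, uint64_t maxBufferLength) {
    assert(maxBufferLength > 0);
    std::vector<Chunk> chunks;
    for (uint64_t off = 0; off < regionSize; off += maxBufferLength) {
        chunks.push_back({off, std::min(maxBufferLength, regionSize - off)});
    }
    return chunks;
}
```

The user still sees one contiguous host allocation; only the flush/read path needs to know which chunk a given offset falls into.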
All of this is assuming we cannot create buffers with MTLStorageModeShared on non-Apple-silicon devices. I don't have a device with such a setup, so I cannot verify this. I'm basing this on MoltenVK's approach for those devices. If we had MTLStorageModeShared, then a couple of things could be simplified, since we could just allocate buffers instead of heaps. But I believe at least point 2 would still require addressing.
Any thoughts, comments or concerns are welcome!
Most of this is managing syncing non-Apple GPU memory. I think there will always be situations where someone opens a memory map, leaves it open, and constantly makes changes to it, so we may always have to accept that we will have incomplete solutions.
There must be at least one memory type with both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bits set in its propertyFlags.
It might be possible to modify this within the spec with something like "If the VK_KHR_portability_subset extension is not enabled, ...", allowing us to back out of certain standard Vulkan requirements, and then modify CTS to not fail when this happens.
MVKSmallVector<MVKBuffer*, 4> _buffers;
MVKSmallVector<MVKImageMemoryBinding*, 4> _imageMemoryBindings;
std::mutex _rezLock;
I've removed these, since my understanding is that they are used to read/write the resources that use this device memory object. Allocated resources will now be placed in heaps, and a buffer that spans the whole heap is provided to accomplish this. The only exception is imported resources, which are handled slightly differently. The explanation is in code comments, but in short: textures are required to be dedicated for export/import by VK_EXT_metal_objects, so we can create buffers when importing to account for host mapping, since Metal does not provide anything; and buffers will just not have a heap if they are not backed by one at import.
@@ -44,14 +45,11 @@
return reportError(VK_ERROR_MEMORY_MAP_FAILED, "Memory is already mapped. Call vkUnmapMemory() first.");
}
While I am not against these checks, it feels like they are redundant, since VVL should catch those with VUID-vkMapMemory-memory-00678 and VUID-vkMapMemory-memory-00682.
I agree. These are holdovers from when MoltenVK was not as integrated with the SDK and validation environment. These can all go.
Thanks for submitting this. I've been thinking about moving to MTLHeaps only for a while now.
This will also help with Metal Argument resource usage efficiencies. But I'll take care of that later once this is pulled in.
This makes MVKConfiguration::useMTLHeap obsolete, so that needs to be handled and documented. But again, I can take care of that too.
/** Resturn the Metal cache mode from the Metal resource options */
MTLCPUCacheMode mvkMTLCPUCacheMode(MTLResourceOptions options);
These are for internal use and don't need to be in mvk_datatypes.h. I suggest you move them (along with mvkMTLResourceOptions()) to a support functions section at the bottom of MVKDeviceMemory.h/mm. See other support function areas at the bottom of other files for reference.
The ancient original intention of mvk_datatypes.h was to provide apps access to convenience functions for mapping Vulkan to Metal types. TBH...I'm sure no-one ever uses it, and eventually I'd like to remove it and just use internal non-public files instead.
Also, Returns is misspelled in the comment.
MVK_PUBLIC_SYMBOL MTLCPUCacheMode mvkMTLCPUCacheMode(MTLResourceOptions options) {
return static_cast<MTLCPUCacheMode>((options & MTLResourceCPUCacheModeMask) >> MTLResourceCPUCacheModeShift);
}
As above, I suggest you just add these as support functions at the bottom of MVKDeviceMemory.h/mm.
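For readers unfamiliar with how MTLResourceOptions packs its fields, the sketch below reproduces the mask-and-shift extraction used by mvkMTLCPUCacheMode() with stand-in constants, so it builds without Metal. The constant values mirror the Metal headers (CPU cache mode in bits 0 to 3, storage mode in bits 4 to 7); the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the Metal definitions (assumption: values as in the Metal headers).
using ResourceOptions = uint64_t;
constexpr ResourceOptions kCPUCacheModeShift = 0;  // MTLResourceCPUCacheModeShift
constexpr ResourceOptions kCPUCacheModeMask  = 0xfULL << kCPUCacheModeShift;
constexpr ResourceOptions kStorageModeShift  = 4;  // MTLResourceStorageModeShift
constexpr ResourceOptions kStorageModeMask   = 0xfULL << kStorageModeShift;

enum class CPUCacheMode : uint64_t { DefaultCache = 0, WriteCombined = 1 };
enum class StorageMode  : uint64_t { Shared = 0, Managed = 1, Private = 2 };

// Packs both fields into one options value.
constexpr ResourceOptions makeOptions(CPUCacheMode cache, StorageMode storage) {
    return (static_cast<uint64_t>(cache) << kCPUCacheModeShift) |
           (static_cast<uint64_t>(storage) << kStorageModeShift);
}

// Same extraction pattern as mvkMTLCPUCacheMode() in the diff above.
constexpr CPUCacheMode cpuCacheMode(ResourceOptions options) {
    return static_cast<CPUCacheMode>((options & kCPUCacheModeMask) >> kCPUCacheModeShift);
}

constexpr StorageMode storageMode(ResourceOptions options) {
    return static_cast<StorageMode>((options & kStorageModeMask) >> kStorageModeShift);
}
```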
enum class DedicatedResourceType : uint8_t {
NONE = 0,
BUFFER,
IMAGE
Minor nit. MoltenVK traditional style is to avoid all-caps for names (except in some old macro definitions). So it would be more consistent to call these None, Buffer, and Image.
// Having no buffer and texture being host accessible means we allocated memory for the mapping
if (!_mtlBuffer && mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
free(_map);
_map = nullptr;
Where is _map set if there is no _mtlBuffer?
And if we're going to set it to nullptr, we should move that part below, so it's also set to nullptr for the case where we have a _mtlBuffer.
}

fail_alloc:
setConfigurationResult(reportError(VK_ERROR_OUT_OF_DEVICE_MEMORY, "vkAllocateMemory(): Could not allocate VkDeviceMemory of size %llu bytes.", _size));
Please don't use goto. Package this in a reportAllocFail() function, and then just call return reportAllocFail(); in the goto calls above.
Also, to cover someone trying to use this with ancient OS versions, can you add a check in that function for getMetalFeatures().placementHeaps, and if it's false, report a slightly different message indicating that the app must be running on macOS 10.15 or iOS/tvOS 13.0. Those are also our current minimum supported versions anyway.
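A sketch of what the requested refactor might look like, using a hypothetical stand-in struct so that it builds without MoltenVK. The placementHeaps member mirrors getMetalFeatures().placementHeaps, the error value matches VK_ERROR_OUT_OF_DEVICE_MEMORY, and the wording of the second message is only an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

constexpr int kSuccess = 0;                  // VK_SUCCESS
constexpr int kErrorOutOfDeviceMemory = -2;  // VK_ERROR_OUT_OF_DEVICE_MEMORY

// Hypothetical stand-in for the allocation path in MVKDeviceMemory.
struct AllocatorSketch {
    bool placementHeaps;    // mirrors getMetalFeatures().placementHeaps
    uint64_t size;          // requested allocation size
    std::string lastError;  // stands in for the reportError() output

    // Single failure path that replaces the fail_alloc label.
    int reportAllocFail() {
        char msg[256];
        if (!placementHeaps) {
            std::snprintf(msg, sizeof(msg),
                "vkAllocateMemory(): Placement heaps are not available. "
                "The app must be running on macOS 10.15 or iOS/tvOS 13.0, or later.");
        } else {
            std::snprintf(msg, sizeof(msg),
                "vkAllocateMemory(): Could not allocate VkDeviceMemory of size %llu bytes.",
                static_cast<unsigned long long>(size));
        }
        lastError = msg;
        return kErrorOutOfDeviceMemory;
    }

    // Each former "goto fail_alloc" becomes "return reportAllocFail()".
    int allocate(bool heapCreated, bool bufferCreated) {
        if (!heapCreated) return reportAllocFail();
        if (!bufferCreated) return reportAllocFail();
        return kSuccess;
    }
};
```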
if (mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
_mtlBuffer = [_mtlHeap newBufferWithLength:_size options:_options];
if (!_mtlBuffer) goto fail_alloc;
[_mtlBuffer makeAliasable];
Uggh. The Apple docs for makeAliasable indicate:
Important
This method is only valid for heap-allocated resources using the MTLHeapTypeAutomatic allocator.
Do we need to use placement heaps, or can we rework everything for automatic heaps? I guess the requirement of setting offsets on resources in a device memory makes it impossible not to use placement heaps.
Ah! Actually, from the Apple docs, it looks like with placement heaps, any overlapping areas are automatically aliased, so maybe we just need to avoid the use of makeAliasable.
Also, we should check the maximum size of a MTLBuffer on the device. It may not be able to span the entire MTLHeap.
I had initially wanted to enable MTLHeap by default when I first added support, but I discovered that rendering to MTLHeap-backed textures was broken on AMD hardware. Dunno if they've fixed that yet.
I actually think that MTLHeaps shouldn't be used for dedicated allocations. IIUC part of the point of VK_KHR_dedicated_allocation is to avoid the overhead of a general memory allocation. Note that MTLHeap-backed resources have higher overhead than non-heap resources.
I also think--and this might be more controversial--that MTLHeap shouldn't be used for coherent, host-visible memory, precisely because MTLHeaps cannot be created in Shared memory on some GPUs. I was disappointed to learn that Apple didn't support Shared MTLHeaps and wouldn't support them, because I wanted to use MTLHeap for everything non-dedicated.
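The policy argued for in this comment could be expressed roughly as below. shouldUseHeap() is a hypothetical helper written for illustration, not part of the PR; the flag values are those from the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>

// Flag values as defined in the Vulkan headers.
constexpr uint32_t kHostVisible  = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Hypothetical policy: back a VkDeviceMemory with an MTLHeap only when the
// allocation is not dedicated and is not host visible, coherent memory.
bool shouldUseHeap(uint32_t memProps, bool isDedicated) {
    if (isDedicated) return false;  // avoid heap overhead for dedicated allocations
    constexpr uint32_t coherentVisible = kHostVisible | kHostCoherent;
    if ((memProps & coherentVisible) == coherentVisible) return false;  // Shared heaps unavailable on some GPUs
    return true;
}
```

Under this policy, dedicated and coherent allocations would keep using standalone MTLBuffer or MTLTexture objects, which is at odds with the PR title's "always" and is the point under discussion.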
Doesn't always using MTLHeap for VkDeviceMemory also require a rework to do explicit barriers? Metal tracks the underlying heap of each resource, so it would lead to a lot of false sharing. Apple advises against using automatic tracking with MTLHeap in a ton of WWDC videos.
@billhollings would like to know your thoughts. Relevant changes are in MVKDeviceMemory.h and MVKDeviceMemory.mm
The idea is to move all memory ownership to VkDeviceMemory. This would also simplify implementing a certain extension. Let me know if there's any issue that comes to mind. I'll also leave some comments with questions, if you could address those, it would be of great help to be able to finish with the changes once we align.