I am trying to use the NKI Allocation API to keep multiple PSUM banks active. In the documentation, I see: "rather than laying out multiple 128x512 tensors in the same partition with offset `byte_addr`, and making them live with `allocated_block_shape`, on PSUM, we achieve similar parallelism by mapping the blocks we want live into multiple banks."
However, when I try to implement this, only one PSUM bank is in use at a time, and despite specifying that banks 0-3 should be in use, over the course of the kernel, I see all eight banks in use.
Environment: I started with the Neuron 2.20 DLAMI and installed the Allocation API using the .deb and .whl files @aws-serina-tan sent me.
Hi Nandeeka! When PSUM is under-allocated, as in this case, the compiler applies an optimization that rotates the PSUM bank allocation so that all available banks are used. That is why you see all eight banks in use even though you specified banks 0-3. As for why the profiler shows only one bank in use at a time, I will need to reproduce it myself and take a closer look.
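To make the rotation idea concrete, here is a toy model in plain Python. It is not NKI code and does not reflect the compiler's actual policy, which is not described in this thread. The rotation rule (shift by the number of requested banks on each reallocation) and the names `rotated_banks` and `banks_touched` are assumptions made up for illustration. It only shows how a request for banks 0-3 can end up touching all eight physical banks over time.

```python
NUM_PHYSICAL_BANKS = 8  # eight PSUM banks, matching what the profile shows


def rotated_banks(requested, step):
    """Map requested bank IDs to physical banks for a given reallocation step.

    Hypothetical rule: each step shifts the whole request by its own size,
    wrapping around the physical bank count.
    """
    offset = (step * len(requested)) % NUM_PHYSICAL_BANKS
    return [(bank + offset) % NUM_PHYSICAL_BANKS for bank in requested]


def banks_touched(requested, steps):
    """Collect every physical bank used across `steps` reallocations."""
    touched = set()
    for step in range(steps):
        touched.update(rotated_banks(requested, step))
    return touched


if __name__ == "__main__":
    # The kernel asks for banks 0-3; after two steps of this toy rotation,
    # every physical bank has been used at least once.
    print(sorted(banks_touched([0, 1, 2, 3], steps=2)))
```

Under this assumed rule, the kernel never holds more than four banks at once, yet a profile taken over the whole run would show activity in every bank.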
Only one block of PO_SBUF is alive at any given time, yet the loop attempts to load data into every one of them. This is undefined behaviour and would cause a data race during execution.
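The aliasing problem can be sketched with a toy model in plain Python. This is not the NKI API: the modulo mapping and the names `physical_slot` and `slot_usage` are hypothetical, chosen only to illustrate what happens when more logical blocks are written than are kept alive.

```python
def physical_slot(block_index, live_blocks):
    """Hypothetical mapping: logical blocks wrap onto the live physical slots."""
    return block_index % live_blocks


def slot_usage(num_blocks, live_blocks):
    """Return {physical slot: [logical blocks that write to it]}."""
    usage = {}
    for block in range(num_blocks):
        usage.setdefault(physical_slot(block, live_blocks), []).append(block)
    return usage


if __name__ == "__main__":
    # One live block: all four logical blocks share slot 0, so writes that
    # are in flight at the same time can overwrite each other.
    print(slot_usage(num_blocks=4, live_blocks=1))
    # Four live blocks: each logical block has its own slot.
    print(slot_usage(num_blocks=4, live_blocks=4))
```

In this model, keeping as many blocks alive as the loop writes to gives each logical block its own storage. Whether that is the right fix for the kernel in question depends on the allocation semantics of the unreleased API.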
Note that this feature has not been released yet, and the API signature has changed during development.
Could you please contact @aws-serina-tan and ask her to provide additional documents for you to understand the behaviour of the allocation, and a new wheel if possible?
Full Kernel:
A zoomed in portion of the resulting profile:
Color guide: