
Internal Compiler Error When Storing #1054

Closed
chloe-jeon opened this issue Dec 5, 2024 · 6 comments

chloe-jeon commented Dec 5, 2024

We're getting this error when we try to store data to HBM:

SyntaxError: Internal compiler error: kernel failed verifier check. See above for actual error message.

Further details can be found in https://github.com/chloe-jeon/cs149pa5/tree/main (private repo shared with AWS team members)

AWSNB (Contributor) commented Dec 5, 2024

Can you add @AWSNB as well?

chloe-jeon (Author) commented:

added!

aws-serina-tan commented:

Hello, I think test_harness.py is somehow swallowing the compiler error messages that would be useful for debugging.

If I call the kernel directly in conv2d.py with the following wrapper code:

import numpy as np  # this snippet runs inside conv2d.py, where fused_conv2d_maxpool is defined

input_channels = 128
output_channels = 128
kernel_size = 3
batch_size = 4
image_dims = (32, 16)

X = np.random.rand(
    batch_size, input_channels, image_dims[0], image_dims[1]
).astype(np.float32)
W = np.random.rand(
    output_channels, input_channels, kernel_size, kernel_size
).astype(np.float32)
bias = np.zeros(output_channels).astype(np.float32)

fused_conv2d_maxpool(X, W, bias)

I get:

Matmult dst tensor is not psum: TongaSB partitions[1] float32 %output[4, 128, 420, 9] in         float32<128 x 420> TongaSB partitions[1] float32 [4, 128, 420, 9] %'output'(init=0.0)[b,i0.128,i1.420,3i+j] = matmul(float32<128 x 420> TongaSB partitions[3] float32 [4, 3, 3, 128, 420] %'X_shifted'[b,i,j,i2.128,i1.420], float32<128 x 128> $13[b, i, j]), contract={}, lhs_free={b=[0:4:1]}, rhs_free={}, pe_tile={,}, onezero={False,False}, skip_in_tritium_fusion=False, perf_mode=) # id=14, , src_id=None, instances=36 # dl = tensor_op_name:  |  [[i2.128];[i1.420]] -> [[i0.128];[i1.420]]  {NeuronEngine.Tensor}

In this case, your tensor output was declared as an SBUF buffer, but a matmul destination must live in PSUM. You can pass buffer=nl.psum to nl.zeros to create it in PSUM.
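
For example, a minimal sketch of that change (the tile names and shapes below are placeholders, not taken from your kernel):

import neuronxcc.nki.language as nl

# The destination of nl.matmul must be a PSUM tile, so allocate it there.
# nl.par_dim marks the partition dimension; (128, 512) is just an example shape.
res_psum = nl.zeros((nl.par_dim(128), 512), nl.float32, buffer=nl.psum)

# lhsT_sbuf (128 x 128) and rhs_sbuf (128 x 512) stand in for tiles already loaded into SBUF
res_psum += nl.matmul(lhsT_sbuf, rhs_sbuf, transpose_x=True)

# nl.store reads its source from SBUF, so copy the result out of PSUM before writing to HBM
res_sbuf = nl.copy(res_psum, dtype=nl.float32)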

ggumen added the NKI label Dec 5, 2024
AWSNB (Contributor) commented Dec 5, 2024

@chloe-jeon did the comment above help unblock you?

chloe-jeon (Author) commented Dec 6, 2024

I tried adding that, and it told me the free dimension (420, 9) exceeds the PSUM limit of 512.
As far as I can tell, the error is not with the matmul itself but with storing output to out_X.
Also, shouldn't output be in SBUF, since we eventually have to write it back to out_X, and store requires the source tensor to be in SBUF?

chloe-jeon (Author) commented:

Never mind, the fix was to create a temporary variable in PSUM to hold the matrix-multiplication result for a single pixel, write that into output (SBUF), and then store output back to out_X (HBM).
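
Roughly this pattern, as a sketch (all names, shapes, and loop bounds are illustrative placeholders, and the bias and maxpool steps are omitted):

import neuronxcc.nki.language as nl

# Inside the kernel, for one image b and one output row `row`:
i_p, i_one = nl.mgrid[0:128, 0:1]          # one output pixel: 128 channels x 1
i_wp, i_wf = nl.mgrid[0:128, 0:128]        # one weight tile: 128 x 128 channels
i_op, i_of = nl.mgrid[0:128, 0:out_width]  # one full output row

# SBUF tile collecting the finished pixels of this output row
out_row_sbuf = nl.ndarray((nl.par_dim(128), out_width), nl.float32, buffer=nl.sbuf)

for pix in nl.affine_range(out_width):
    # 1) temporary PSUM tile holding the matmul results for a single output pixel
    pixel_psum = nl.zeros((nl.par_dim(128), 1), nl.float32, buffer=nl.psum)
    for i in nl.affine_range(kernel_size):
        for j in nl.affine_range(kernel_size):
            # w_sbuf / x_shifted stand in for weight and shifted-input tiles already in SBUF
            pixel_psum += nl.matmul(w_sbuf[i, j, i_wp, i_wf],
                                    x_shifted[i, j, i_p, pix + i_one],
                                    transpose_x=True)

    # 2) copy the finished pixel from PSUM into the SBUF output row
    out_row_sbuf[i_p, pix + i_one] = nl.copy(pixel_psum)

# 3) write the completed SBUF row back to the HBM output tensor
nl.store(out_X[b, i_op, row, i_of], value=out_row_sbuf)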
