
Internal Compiler Error When Storing #1054

Closed
chloe-jeon opened this issue Dec 5, 2024 · 6 comments

chloe-jeon commented Dec 5, 2024

We're getting this error when we try to store data to HBM:

SyntaxError: Internal compiler error: kernel failed verifier check. See above for actual error message.

Further details can be found in https://github.com/chloe-jeon/cs149pa5/tree/main (private repo shared with AWS team members)

AWSNB (Contributor) commented Dec 5, 2024

Can you add @AWSNB as well?

chloe-jeon (Author) commented:

added!

aws-serina-tan commented:

Hello, I think test_harness.py is somehow swallowing the compiler error messages that would be useful for debugging.

If I call the kernel directly in conv2d.py with the following wrapper code:

import numpy as np  # this snippet runs inside conv2d.py, where fused_conv2d_maxpool is defined

input_channels = 128
output_channels = 128
kernel_size = 3
batch_size = 4
image_dims = (32, 16)

X = np.random.rand(
    batch_size, input_channels, image_dims[0], image_dims[1]
).astype(np.float32)
W = np.random.rand(
    output_channels, input_channels, kernel_size, kernel_size
).astype(np.float32)
bias = np.zeros(output_channels).astype(np.float32)

fused_conv2d_maxpool(X, W, bias)

I get:

Matmult dst tensor is not psum: TongaSB partitions[1] float32 %output[4, 128, 420, 9] in         float32<128 x 420> TongaSB partitions[1] float32 [4, 128, 420, 9] %'output'(init=0.0)[b,i0.128,i1.420,3i+j] = matmul(float32<128 x 420> TongaSB partitions[3] float32 [4, 3, 3, 128, 420] %'X_shifted'[b,i,j,i2.128,i1.420], float32<128 x 128> $13[b, i, j]), contract={}, lhs_free={b=[0:4:1]}, rhs_free={}, pe_tile={,}, onezero={False,False}, skip_in_tritium_fusion=False, perf_mode=) # id=14, , src_id=None, instances=36 # dl = tensor_op_name:  |  [[i2.128];[i1.420]] -> [[i0.128];[i1.420]]  {NeuronEngine.Tensor}

In this case, your tensor output was declared as an SBUF buffer, but a matmul destination must live in PSUM. You can pass buffer=nl.psum to nl.zeros to create it in PSUM.
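
For example, a minimal sketch of that change (the tile names and shapes below are placeholders, not taken from your kernel):

import neuronxcc.nki.language as nl

# The destination of nl.matmul must be a PSUM tile, so allocate it there.
# nl.par_dim marks the partition dimension; (128, 512) is just an example shape.
res_psum = nl.zeros((nl.par_dim(128), 512), nl.float32, buffer=nl.psum)

# lhsT_sbuf (128 x 128) and rhs_sbuf (128 x 512) stand in for tiles already loaded into SBUF
res_psum += nl.matmul(lhsT_sbuf, rhs_sbuf, transpose_x=True)

# nl.store reads its source from SBUF, so copy the result out of PSUM before writing to HBM
res_sbuf = nl.copy(res_psum, dtype=nl.float32)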

ggumen added the NKI label Dec 5, 2024
AWSNB (Contributor) commented Dec 5, 2024

@chloe-jeon did the comment above help unblock you?

chloe-jeon (Author) commented Dec 6, 2024

I tried adding that, and it told me the free dimension (420, 9) exceeds the PSUM limit of 512.
As far as I can tell, the error is not with the matmul itself but with storing output to out_X.
Also, shouldn't output be in SBUF, since we eventually have to write it back to out_X, and store requires the source tensor to be in SBUF?

chloe-jeon (Author) commented:

Never mind, the fix was to create a temporary variable in PSUM to hold the matrix-multiplication result for a single pixel, write that into output (SBUF), and then store output back to out_X (HBM).
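
Roughly this pattern, as a sketch (all names, shapes, and loop bounds are illustrative placeholders, and the bias and maxpool steps are omitted):

import neuronxcc.nki.language as nl

# Inside the kernel, for one image b and one output row `row`:
i_p, i_one = nl.mgrid[0:128, 0:1]          # one output pixel: 128 channels x 1
i_wp, i_wf = nl.mgrid[0:128, 0:128]        # one weight tile: 128 x 128 channels
i_op, i_of = nl.mgrid[0:128, 0:out_width]  # one full output row

# SBUF tile collecting the finished pixels of this output row
out_row_sbuf = nl.ndarray((nl.par_dim(128), out_width), nl.float32, buffer=nl.sbuf)

for pix in nl.affine_range(out_width):
    # 1) temporary PSUM tile holding the matmul results for a single output pixel
    pixel_psum = nl.zeros((nl.par_dim(128), 1), nl.float32, buffer=nl.psum)
    for i in nl.affine_range(kernel_size):
        for j in nl.affine_range(kernel_size):
            # w_sbuf / x_shifted stand in for weight and shifted-input tiles already in SBUF
            pixel_psum += nl.matmul(w_sbuf[i, j, i_wp, i_wf],
                                    x_shifted[i, j, i_p, pix + i_one],
                                    transpose_x=True)

    # 2) copy the finished pixel from PSUM into the SBUF output row
    out_row_sbuf[i_p, pix + i_one] = nl.copy(pixel_psum)

# 3) write the completed SBUF row back to the HBM output tensor
nl.store(out_X[b, i_op, row, i_of], value=out_row_sbuf)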
