Rename tt.Layout to tt.MetalLayout #1386

Merged 1 commit on Nov 27, 2024
2 changes: 1 addition & 1 deletion docs/src/dialects-overview.md
@@ -3,7 +3,7 @@
Here is a brief overview of the dialects in the project; please refer to the
individual dialect documentation for more details:

- `tt`: Common types such as `tt.tile`, `tt.layout`, `tt.grid`, etc., and enums such as data formats, memory spaces, iterator types, etc.
- `tt`: Common types such as `tt.tile`, `tt.metal_layout`, `tt.grid`, etc., and enums such as data formats, memory spaces, iterator types, etc.
- `ttir`: A high-level dialect that models the tensor compute graph on Tenstorrent devices. Accepts `tosa` and `linalg` input.
- `ttir.generic`: Generically describe compute work.
- `ttir.to_layout`: Convert between different tensor memory layouts and transfer between different memory spaces.
6 changes: 3 additions & 3 deletions docs/src/specs/device.md
@@ -135,7 +135,7 @@ the logical device grid:

```mlir
tensor<16x3x64x128xf32,
#tt.layout<(d0, d1, d2, d3) -> (d0, d1 * 64 + d2, d3),
#tt.metal_layout<(d0, d1, d2, d3) -> (d0, d1 * 64 + d2, d3),
undef,
<2x2x4>,
memref<8x3x1x!tt.tile<32 x 32, bfp_bf8>, #tt.memory_space<l1>>
  >
>
```

@@ -170,7 +170,7 @@ the logical device grid:

```mlir
tensor<256x1024xf32,
#tt.layout<(d0, d1) -> (d0, d1),
#tt.metal_layout<(d0, d1) -> (d0, d1),
undef,
<4x16>,
memref<2x2x!tt.tile<32 x 32, bfp_bf8>, #tt.memory_space<l1>>
  >
>
```

@@ -205,7 +205,7 @@ We can consider the following tensor to map onto this grid:

```mlir
tensor<64x256x1024xf32,
#tt.layout<(d0, d1) -> (d0, d1),
#tt.metal_layout<(d0, d1) -> (d0, d1),
undef,
<2x4x16>,
memref<32x2x2x!tt.tile<32 x 32, bfp_bf8>, #tt.memory_space<l1>>
  >
>
```
32 changes: 16 additions & 16 deletions docs/src/specs/tensor-layout.md
@@ -33,7 +33,7 @@ been used by the TT dialect to encode the tensor's layout. This looks like:

```mlir
tensor<2x3x64x128xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1, d2, d3) -> (d0 * 192 + d1 * 64 + d2, d3),
undef,
<1x1>,
```

@@ -76,7 +76,7 @@ topics:

### Dimension Collapsing

Probably the most important concept in `tt.layout` is dimension collapsing.
Probably the most important concept in `tt.metal_layout` is dimension collapsing.
This is captured by the affine map `linear` property which provides a
mapping from tensor dim space to a reduced physical dimensional space. This
single-handedly touches on most of the tensor layout goals mentioned at the
@@ -106,7 +106,7 @@ to get our remapped offset:
This remapped offset `(262, 100)` corresponds to the row and column index of the
collapsed physical memory.
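To make the arithmetic concrete, here is a small sketch (illustrative only, not code from this PR) that evaluates the collapsing map from the example above by hand; for instance, the logical index `(1, 1, 6, 100)` collapses to the `(262, 100)` offset just mentioned:

```python
# Evaluate the collapsing affine map (d0, d1, d2, d3) -> (d0 * 192 + d1 * 64 + d2, d3)
# from the tensor<2x3x64x128xf32> example. The strides come from collapsing the
# leading 2x3x64 dims: d1 steps over 64 rows, d0 over 3 * 64 = 192 rows.
def collapse(d0, d1, d2, d3):
    return (d0 * 192 + d1 * 64 + d2, d3)

print(collapse(1, 1, 6, 100))  # (262, 100)
```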

By default, the dim range `[0, -1)` is collapsed, but the `tt.layout` constructor
By default, the dim range `[0, -1)` is collapsed, but the `tt.metal_layout` constructor
can actually take a programmable range called `collapseIntervals`.
`collapseIntervals` is a list of pairs, where each pair is a dim range interval,
left inclusive, right exclusive. Let's consider a few examples:
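As a hedged sketch of those interval semantics (an illustration of the description above, not the actual `tt.metal_layout` implementation), shape collapsing can be modeled like this:

```python
from math import prod

def collapse_shape(shape, collapse_intervals):
    """Collapse the dims inside each [left, right) interval into a single dim.
    Negative bounds count from the end, as in the default [0, -1)."""
    n = len(shape)
    norm = [(l if l >= 0 else l + n, r if r >= 0 else r + n)
            for l, r in collapse_intervals]
    out, i = [], 0
    while i < n:
        bounds = next(((l, r) for l, r in norm if l == i), None)
        if bounds is not None:
            out.append(prod(shape[bounds[0]:bounds[1]]))  # fold the interval
            i = bounds[1]
        else:
            out.append(shape[i])  # dim outside any interval passes through
            i += 1
    return out

print(collapse_shape([2, 3, 64, 128], [(0, -1)]))  # [384, 128] (the default)
print(collapse_shape([2, 3, 64, 128], [(1, -1)]))  # [2, 192, 128]
```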
@@ -137,7 +137,7 @@ Let's consider the original example again, but on a larger grid than `1x1`, say

```mlir
tensor<2x3x64x128xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1, d2, d3) -> (d0 * 192 + d1 * 64 + d2, d3),
undef,
<2x4>,
```

@@ -173,31 +173,31 @@ Here are a few more example mlir snippets:

```mlir
tensor<8x300xf32,
#tt.layout<(d0, d1) -> (d0, d1),
#tt.metal_layout<(d0, d1) -> (d0, d1),
undef,
<1x2>,
memref<8x150xf32, #tt.memory_space<l1>>
>
>

tensor<8x96x32xf32,
#tt.layout<(d0, d1, d2) -> (d0 * 96 + d1, d2),
#tt.metal_layout<(d0, d1, d2) -> (d0 * 96 + d1, d2),
undef,
<2x1>,
memref<384x32xf32, #tt.memory_space<l1>>
>
>

tensor<8x96x32xf32,
#tt.layout<(d0, d1, d2) -> (d0 * 96 + d1, d1, d2),
#tt.metal_layout<(d0, d1, d2) -> (d0 * 96 + d1, d1, d2),
undef,
<2x1x2>,
memref<384x96x16xf32, #tt.memory_space<l1>>
>
>

tensor<5x3x2x2x7x32x32xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1, d2, d3, d4, d5, d6)
-> (d0 * 2688 + d1 * 896 + d2 * 448 + d3 * 224 + d4 * 32 + d5, d4, d5, d6),
undef,
```

@@ -226,7 +226,7 @@ A tilized tensor is one with a memref that has a tile element type.
Given some tensor with scalar layout:
```mlir
tensor<3x64x128xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1, d2) -> (d0 * 64 + d1, d2),
undef,
<3x2>,
```

@@ -238,7 +238,7 @@ tensor<3x64x128xf32,
After tilizing we'll have:
```mlir
tensor<3x64x128xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1, d2) -> (d0 * 64 + d1, d2),
undef,
<3x2>,
```

@@ -256,7 +256,7 @@ intact.
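As a rough sketch of the tilizing step (assuming 32x32 tiles and ceiling division, as the prose describes; this is not the compiler's code), the tiled shape of a scalar shard is:

```python
# Number of 32x32 tiles needed to cover a per-core scalar shard; partial
# tiles round up, which is where tile padding comes from.
def tile_grid(shard_shape, tile=(32, 32)):
    return tuple(-(-dim // t) for dim, t in zip(shard_shape, tile))

print(tile_grid((64, 64)))  # (2, 2)
```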
Padding can be a bit of an overloaded term, but in this context it refers to an
out-of-bounds area in the physical memory allocation that has no real tensor
data in it. The contents of this area are tracked by `oob_val`, and the padding
area can be automatically derived from the attributes of `tt.layout`.
area can be automatically derived from the attributes of `tt.metal_layout`.

Padding is a necessary evil that arises when a tensor is not evenly divisible by
a grid shape or tile shape. It can also arise due to minimum NoC addressing
@@ -265,7 +265,7 @@ requirements.
Example of non-divisible grid:
```mlir
tensor<53x63xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1) -> (d0, d1),
undef,
<3x2>,
```

@@ -284,7 +284,7 @@ cores and 1 scalar column of padding on the last column of cores.
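The padding for the `tensor<53x63xf32>` example can be checked with a quick sketch (illustrative arithmetic only, not the actual attribute derivation): each core holds the ceiling of `dim / grid_dim` scalars, and whatever exceeds the tensor dim is padding:

```python
# Shard shape and padding for a tensor distributed over a grid:
# each core holds ceil(d / g) scalars per dim; the excess beyond d is padding.
def shard_and_padding(tensor_shape, grid_shape):
    shard = [-(-d // g) for d, g in zip(tensor_shape, grid_shape)]
    padding = [s * g - d for s, g, d in zip(shard, grid_shape, tensor_shape)]
    return shard, padding

shard, pad = shard_and_padding((53, 63), (3, 2))
print(shard)  # [18, 32] scalars per core
print(pad)    # [1, 1]: 1 padded row on the last row of cores, 1 padded column
```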
Taking the above example a step further, we could tilize it:
```mlir
tensor<53x63xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1) -> (d0, d1),
undef,
<3x2>,
```

@@ -308,7 +308,7 @@ stride between dimensions. Consider tensor (w/ batch dim `2`):

```mlir
tensor<2x8x32xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1, d2) -> (d0 * 8 + d1, d2),
undef,
<1x2>,
```

@@ -356,7 +356,7 @@ consider the following example with a 3d grid and `collapseIntervals=[(1, -1)]`.

```mlir
tensor<2x3x64x128xf32,
#tt.layout<(d0, d1, d2, d3) -> (d0, d1 * 64 + d2, d3),
#tt.metal_layout<(d0, d1, d2, d3) -> (d0, d1 * 64 + d2, d3),
undef,
<2x2x4>,
memref<1x3x1x!tt.tile<32 x 32, bfp_bf8>, #tt.memory_space<l1>>
  >
>
```

@@ -387,7 +387,7 @@ under the same grid primitive that also divides tensor rows and columns.
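The `memref<1x3x1x...>` shard shape in the snippet above can be reproduced with a small sketch (an illustration under stated assumptions, not the PR's code): collapse `2x3x64x128` with `collapseIntervals=[(1, -1)]` to `(2, 192, 128)`, then divide each collapsed dim by its grid dim times its tile dim, with the batch dim left untiled:

```python
# Derive the per-core shard shape for the 3d-grid example above.
collapsed = (2, 3 * 64, 128)   # collapseIntervals=[(1, -1)] applied to 2x3x64x128
grid = (2, 2, 4)               # 3d logical grid
tile = (1, 32, 32)             # batch dim is not tiled
shard = tuple(-(-c // (g * t)) for c, g, t in zip(collapsed, grid, tile))
print(shard)  # (1, 3, 1)
```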

## Concerns

- `tt.layout` is deliberately flexible and tries to capture as many of the problematic
- `tt.metal_layout` is deliberately flexible and tries to capture as many of the problematic
  use cases we've run into in the past in a single, succinct representation.
This flexibility will need to be further constrained by backends to avoid
unsupported programming of this attribute.
6 changes: 3 additions & 3 deletions include/ttmlir-c/TTAttrs.h
@@ -50,9 +50,9 @@ MLIR_CAPI_EXPORTED MlirAttribute ttmlirTTSystemDescAttrGet(
size_t chipCoordsSize, MlirAttribute *chipChannels,
size_t chipChannelsSize);

MLIR_CAPI_EXPORTED MlirAttribute
ttmlirTTLayoutAttrGet(MlirContext ctx, MlirAffineMap linear, unsigned oobVal,
MlirAttribute grid, MlirType memref, unsigned memLayout);
MLIR_CAPI_EXPORTED MlirAttribute ttmlirTTMetalLayoutAttrGet(
MlirContext ctx, MlirAffineMap linear, unsigned oobVal, MlirAttribute grid,
MlirType memref, unsigned memLayout);

MLIR_CAPI_EXPORTED MlirAttribute
ttmlirTTMemorySpaceAttrGet(MlirContext ctx, uint32_t memorySpace);
30 changes: 15 additions & 15 deletions include/ttmlir/Dialect/TT/IR/TTOpsTypes.td
@@ -214,7 +214,7 @@ def TT_SystemDescAttr : TT_Attr<"SystemDesc", "system_desc"> {
}];
}

def TT_LayoutAttr : TT_Attr<"Layout", "layout"> {
def TT_MetalLayoutAttr : TT_Attr<"MetalLayout", "metal_layout"> {
let summary = "Tensor layout attribute";
let description = [{
The tensor layout attribute captures how tensor data is sharded across a grid of devices, cores, and
Expand All @@ -241,31 +241,31 @@ def TT_LayoutAttr : TT_Attr<"Layout", "layout"> {
Examples:
```mlir
tensor<8x300xf32,
#tt.layout<(d0, d1) -> (d0, d1),
#tt.metal_layout<(d0, d1) -> (d0, d1),
undef,
<1x2>,
memref<8x150xf32, #tt.memory_space<l1>>
>
>

tensor<8x96x32xf32,
#tt.layout<(d0, d1, d2) -> (d0 * 96 + d1, d2),
#tt.metal_layout<(d0, d1, d2) -> (d0 * 96 + d1, d2),
undef,
<2x1>,
memref<384x32xf32, #tt.memory_space<l1>>
>
>

tensor<8x96x32xf32,
#tt.layout<(d0, d1, d2) -> (d0 * 96 + d1, d1, d2),
#tt.metal_layout<(d0, d1, d2) -> (d0 * 96 + d1, d1, d2),
undef,
<2x1x2>,
memref<384x96x16xf32, #tt.memory_space<l1>>
>
>

tensor<5x3x2x2x7x32x32xf32,
#tt.layout<
#tt.metal_layout<
(d0, d1, d2, d3, d4, d5, d6)
-> (d0 * 2688 + d1 * 896 + d2 * 448 + d3 * 224 + d4 * 32 + d5, d4, d5, d6),
undef,
```

@@ -284,36 +284,36 @@ def TT_LayoutAttr : TT_Attr<"Layout", "layout"> {
let assemblyFormat = "`<` $linear`,` $oob_val`,` $grid`,` $memref (`,` $mem_layout^)? `>`";

let extraClassDeclaration = [{
static LayoutAttr get(::mlir::MLIRContext *context,
static MetalLayoutAttr get(::mlir::MLIRContext *context,
ArrayRef<int64_t> tensorShape,
Type elementType,
MemorySpace memorySpace = MemorySpace::System,
GridAttr grid = {},
ArrayRef<std::pair<std::int64_t, std::int64_t>> collapseIntervals = {{0, -1}},
OOBVal oobVal = OOBVal::Undef,
TensorMemoryLayout memLayout = TensorMemoryLayout::None);
static LayoutAttr get(::mlir::MLIRContext *context,
static MetalLayoutAttr get(::mlir::MLIRContext *context,
RankedTensorType ty,
MemorySpace memorySpace = MemorySpace::System,
GridAttr grid = {},
ArrayRef<std::pair<std::int64_t, std::int64_t>> collapseIntervals = {{0, -1}},
OOBVal oobVal = OOBVal::Undef,
TensorMemoryLayout memLayout = TensorMemoryLayout::None);
static LayoutAttr get(::mlir::MLIRContext *context,
static MetalLayoutAttr get(::mlir::MLIRContext *context,
RankedTensorType ty,
MemorySpace memorySpace,
GridAttr grid,
Type elementType,
TensorMemoryLayout memLayout = TensorMemoryLayout::None);
LayoutAttr withGrid(::mlir::MLIRContext *context, ArrayRef<int64_t> tensorShape, GridAttr grid, ArrayRef<std::pair<std::int64_t, std::int64_t>> collapseIntervals = {{0, -1}});
LayoutAttr withGrid(::mlir::MLIRContext *context,
MetalLayoutAttr withGrid(::mlir::MLIRContext *context, ArrayRef<int64_t> tensorShape, GridAttr grid, ArrayRef<std::pair<std::int64_t, std::int64_t>> collapseIntervals = {{0, -1}});
MetalLayoutAttr withGrid(::mlir::MLIRContext *context,
RankedTensorType ty,
GridAttr grid,
ArrayRef<std::pair<std::int64_t, std::int64_t>> collapseIntervals = {{0, -1}});
LayoutAttr withElementType(::mlir::MLIRContext *context, Type elementType);
LayoutAttr withMemorySpace(::mlir::MLIRContext *context, MemorySpace memorySpace);
LayoutAttr withMemoryLayout(::mlir::MLIRContext *context, TensorMemoryLayout memLayout);
LayoutAttr withShardShape(::mlir::MLIRContext *context, llvm::SmallVector<int64_t> shardShape);
MetalLayoutAttr withElementType(::mlir::MLIRContext *context, Type elementType);
MetalLayoutAttr withMemorySpace(::mlir::MLIRContext *context, MemorySpace memorySpace);
MetalLayoutAttr withMemoryLayout(::mlir::MLIRContext *context, TensorMemoryLayout memLayout);
MetalLayoutAttr withShardShape(::mlir::MLIRContext *context, llvm::SmallVector<int64_t> shardShape);

uint64_t getMemrefSizeBytes() const;
MemorySpace getMemorySpace() const;
@@ -400,7 +400,7 @@ def TT_DeviceAttr : TT_Attr<"Device", "device", []> {
// - DeviceL1: This ends up being exactly the shard size
// - DeviceDRAM: Is more nuanced because the whole tensor size gets paged and interleaved between all dram channels,
// due to paging and rounding the footprint ends up being close to: the_whole_tensor / num_dram_channels
uint64_t getLayoutSizeBytes(ArrayRef<int64_t> tensorShape, LayoutAttr layout, MemorySpace memorySpace) const;
uint64_t getLayoutSizeBytes(ArrayRef<int64_t> tensorShape, MetalLayoutAttr layout, MemorySpace memorySpace) const;

// Returns the footprint size in bytes of the tensor distributed across the given memory space.
// Forwards to getLayoutSizeBytes, see comment there for more info.
4 changes: 2 additions & 2 deletions include/ttmlir/Dialect/TTIR/IR/TTIROps.td
@@ -114,8 +114,8 @@ def TTIR_ToLayoutOp : TTIR_Op<"to_layout", [DestinationStyleOpInterface, TTIROpI
- Some combination of the above

```llvm
#layout = #tt.layout<8192x128x1, undef, <1x1>, memref<64x128xf32, #system>>
#layout1 = #tt.layout<8192x128x1, undef, <1x1>, memref<64x128xf32, #l1_>>
#layout = #tt.metal_layout<8192x128x1, undef, <1x1>, memref<64x128xf32, #system>>
#layout1 = #tt.metal_layout<8192x128x1, undef, <1x1>, memref<64x128xf32, #l1_>>
%1 = "ttir.to_layout"(%arg0, %0) : (tensor<64x128xf32, #layout>, tensor<64x128xf32, #layout1>) -> tensor<64x128xf32, #layout1>
```
}];
11 changes: 6 additions & 5 deletions include/ttmlir/Target/Utils/MLIRToFlatbuffer.h
@@ -18,8 +18,8 @@
namespace mlir::tt {

flatbuffers::Offset<::tt::target::LayoutDesc>
layoutAttrToFlatbuffer(FlatbufferObjectCache &cache, LayoutAttr attr,
ArrayRef<int64_t> logicalShape, DeviceAttr deviceAttr);
metalLayoutAttrToFlatbuffer(FlatbufferObjectCache &cache, MetalLayoutAttr attr,
ArrayRef<int64_t> logicalShape,
DeviceAttr deviceAttr);

flatbuffers::Offset<::tt::target::LayoutDesc> ttnnLayoutAttrToFlatbuffer(
FlatbufferObjectCache &cache, ttnn::TTNNLayoutAttr attr,
@@ -438,9 +439,9 @@ toFlatbuffer(FlatbufferObjectCache &cache, ElementsAttr elementsAttr) {
inline flatbuffers::Offset<::tt::target::LayoutDesc>
encodingToFlatbuffer(FlatbufferObjectCache &cache, Attribute attr,
ArrayRef<int64_t> logicalShape, DeviceAttr deviceAttr) {
if (isa<LayoutAttr>(attr)) {
return layoutAttrToFlatbuffer(cache, cast<LayoutAttr>(attr), logicalShape,
deviceAttr);
if (isa<MetalLayoutAttr>(attr)) {
return metalLayoutAttrToFlatbuffer(cache, cast<MetalLayoutAttr>(attr),
logicalShape, deviceAttr);
}

assert(isa<ttnn::TTNNLayoutAttr>(attr) && "unsupported layout attr");
16 changes: 8 additions & 8 deletions lib/CAPI/TTAttrs.cpp
@@ -119,15 +119,15 @@ MlirAttribute ttmlirTTSystemDescAttrGet(
chipCapabilitiesUnwrapped, chipCoordsUnwrapped, chipChannelsUnwrapped));
}

MlirAttribute ttmlirTTLayoutAttrGet(MlirContext ctx, MlirAffineMap linear,
unsigned oobVal, MlirAttribute grid,
MlirType memref, unsigned memLayout) {
MlirAttribute ttmlirTTMetalLayoutAttrGet(MlirContext ctx, MlirAffineMap linear,
unsigned oobVal, MlirAttribute grid,
MlirType memref, unsigned memLayout) {
mlir::AffineMap affineMap = mlir::AffineMap::getFromOpaquePointer(linear.ptr);
return wrap(LayoutAttr::get(unwrap(ctx), affineMap,
static_cast<OOBVal>(oobVal),
mlir::cast<GridAttr>(unwrap(grid)),
mlir::cast<MemRefType>(unwrap(memref)),
static_cast<TensorMemoryLayout>(memLayout)));
return wrap(MetalLayoutAttr::get(unwrap(ctx), affineMap,
static_cast<OOBVal>(oobVal),
mlir::cast<GridAttr>(unwrap(grid)),
mlir::cast<MemRefType>(unwrap(memref)),
static_cast<TensorMemoryLayout>(memLayout)));
}

MlirAttribute ttmlirTTMemorySpaceAttrGet(MlirContext ctx,