
Commit

[TE][TIR] Implement layout transformations, non-flat memory buffers (apache#9727)

* [TIR] Added BufferLoadNode::LegalizeDtype

When modifying a BufferLoad object, the return dtype must also be
updated.  This exposes the legalization function, so that passes that
use `BufferLoad::CopyOnWrite` to modify the buffer/indices don't need
to repeat the logic to update the dtype returned.
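A rough illustration of the dtype rule (plain Python, not TVM's actual implementation; the helper name is made up):

```python
def legalize_dtype(buffer_dtype: str, index_lanes: int) -> str:
    """Recompute a BufferLoad's dtype: the buffer's scalar element type,
    widened to the number of lanes in the (possibly vectorized) index."""
    base = buffer_dtype.split("x")[0]        # "float16x4" -> "float16"
    return base if index_lanes == 1 else f"{base}x{index_lanes}"

# A scalar index loads a single element; a Ramp index with 4 lanes
# produces a vectorized load of the same scalar type.
assert legalize_dtype("float16", 1) == "float16"
assert legalize_dtype("float16", 4) == "float16x4"
```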

* Replacing Store/Load in Stmt/Expr Visitor/Mutator

* Removing Store/Load from optimization passes

- UpdatePointerStorageScope
- UnrollLoop
- ThreadSync
- LinearAccessPatternFinder
- StoragePlanRewriter
- VectorTypeRewriter
- VectorTypeAccessChecker
- NarrowDataType
- IRConvertSSA
- CompactBufferRegion

* Removing Store/Load from examples

- ConvertAddToSubtract

* Replacing Store/Load in StorageFlatten

Now outputs BufferLoad/BufferStore with a flattened buffer object.

temp commit, replacing Store/Load, BufferBindUnwrapper

temp commit, replacing Store/Load, StorageFlattener

* Replacing Store/Load in utility passes.

- StmtSimplifier
- IRSubstitute
- BaseInliner
- FeatureVisitor

* Replacing Store/Load in analysis functions

- StorageAccessVisitor
- VarTouchedAnalysis
- MemoryAccessVerifier
- InplaceOpVerifier
- GPUCodeVerifier
- VarTouchVisitor
- LCADetector
- BlockReadWriteDetector
- InstrumentBoundCheckers

* Replacing Store/Load in lowering/legalization passes.

- MakeCrossThreadReduction
- CacheReadRewriter/CacheWriteRewriter
- InjectVirtualThread
- InjectDoubleBuffer
- InjectCopyIntrin
- LowerWarpMemory
- LowerThreadAllreduce
- LowerCustomDatatypes
- LowerTVMBuiltin
- CoProcSync
- MergeDynamicSharedMemAllocations
- VectorizeLoop
- BF16Legalize

* Replacing Load/Store in codegens.

- Device code generators
  - CodegenC
  - CodegenLLVM
  - CodeGenOpenCL

- Utilities used during codegen
  - ArgBinder
  - MakePackedAPI
  - ReturnRewriter
  - SplitHostDevice

- Execution environments
  - CodeGenStackVM
  - CodeGenHybrid
  - AOTExecutorCodegen

* [UnitTest] Add unit tests to test physical layout remapping.

* Updated tvm::address_of() to hold BufferLoad instead of Load.

* [TIR] Added IndexMap class.

Holds a set of variables representing the input indices, along with
expressions that give each output index in terms of those input
variables.

TODO:

- Add validation, the index mapping should be invertible.
- Add helper function, apply mapping to a set of indices.
- Add helper function, apply mapping to bounds of input indices.
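As a rough sketch of the intended semantics (plain Python; the real IndexMap stores TIR variables and PrimExprs, and the planned validation would be symbolic rather than point-wise):

```python
# A forward mapping (i, j) -> (i // 4, j, i % 4) and its inverse, as an
# IndexMap would hold them for an NCHW -> NCHWc-style transform.
forward = lambda i, j: (i // 4, j, i % 4)
inverse = lambda io, j, ii: (4 * io + ii, j)

def check_invertible(fwd, inv, sample_points):
    """Point-wise stand-in for the planned invertibility validation."""
    for pt in sample_points:
        assert inv(*fwd(*pt)) == pt

check_invertible(forward, inverse, [(0, 0), (5, 3), (7, 2)])
assert forward(5, 3) == (1, 3, 1)
```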

* Updated Buffer::vstore/vload to return BufferLoad/BufferStore objects.

StorageFlatten/FlattenBuffer passes updated to modify the
buffer/indices directly, rather than using vload/vstore.

- Primary purpose of vstore/vload is to allow IR written in python to
  define vectorized load/store.  This usage is maintained by returning
  a BufferLoad/BufferStore node whose index is a Ramp.

- Previously, vstore/vload was also used to compute the 1-d physical
  index of a location within an N-d tensor.  This usage is no longer
  allowed, since eagerly flattening buffer accesses would prevent
  layout transformations from being applied after the schedule
  definition.
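The retained vectorized-load behavior can be sketched as follows (plain Python; `vload` on a buffer of scalar dtype yields a BufferLoad whose last index is a Ramp):

```python
def ramp(base, stride, lanes):
    """Indices described by a tir.Ramp node: base, base+stride, ..."""
    return [base + stride * lane for lane in range(lanes)]

# buf.vload([i], "float32x4") would produce a BufferLoad indexed by
# Ramp(i, 1, 4), i.e. four consecutive elements starting at i:
assert ramp(8, 1, 4) == [8, 9, 10, 11]
```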

* [TE] Added Stage::transform_layout to the C++ TE implementation.

Adds an `Array<IndexMap>` in the stage to define the transformations
to be applied on the tensor's layout.  As of this commit, this mapping
isn't propagated into the TIR graph yet.

* Replace Store/Load with BufferStore/BufferLoad in ir_builder

* [TE] Added Stage.transform_layout to the Python TE interface.

Allows users to specify `s[A].transform_layout(mapping)`, and
propagate into the TE definitions.

* Added pre_flattened_shape/pre_flattened_stride fields to Buffer.

The shape and stride checks performed in ArgBinder::BindDLTensor
(called from MakePackedAPI) require the tensor shape/strides prior to
index flattening.  Therefore, though it is no longer used by the
low-level code generators, we must maintain that information for use
in MakePackedAPI.

* [UnitTest] Test N-d indices exposed to low-level codegen

When using te.AXIS_SEPARATOR in the call to .transform_layout, this
should define groups of axes, each of which is flattened to a single
axis, then exposed to the low-level codegen.
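The grouping/flattening behavior can be sketched in plain Python (illustrative only; the actual logic lives in the buffer-flattening passes):

```python
def flatten_shape(shape, axis_separators):
    """Flatten each group of axes delimited by axis_separators into a
    single output axis, as the low-level codegen would see it."""
    flattened, start = [], 0
    for stop in list(axis_separators) + [len(shape)]:
        extent = 1
        for dim in shape[start:stop]:
            extent *= dim
        flattened.append(extent)
        start = stop
    return flattened

# With no separators, everything flattens to 1-d; a separator before
# axis 2 keeps two output axes, e.g. for a 2-d physical memory.
assert flatten_shape([2, 3, 4, 5], []) == [120]
assert flatten_shape([2, 3, 4, 5], [2]) == [6, 20]
```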

* [TIR] Added PrimFunc attribute "layout_transform_map", filled from TE.

Propagated the TE definition of the physical layout into the TIR
graph.

* Added pre_flattened_type.

If a boolean tensor is backed by an int8 buffer, the check on the
argument buffer's type should be against the boolean type.

When rebasing this PR, should be placed after the addition of
pre_flatten_shape/pre_flatten_strides.

* [UnitTest] Added tests for loop iteration order.

After transformation, the iteration order should follow the new
transformed axes.  In addition, the loop iteration variables should be
exposed through the TE interface for further manipulation.

* [TIR] Added BufferNode::axis_separators

- Add axis_separators to represent divisions between groups
  of tensor axes, where each group is flattened into a single
  output axis, to be exposed to the low-level code generators.

- Expose axis_separators to the python interface.

- Update existing C++ calls to the Buffer() constructor.

* [TIR] Added ApplyLayoutTransforms as part of StorageFlatten.

For any buffers that have layout transforms defined in the
"layout_transform_map" attribute of a PrimFunc, rewrite access into
the buffer such that they use the updated ordering.

* Update usage of ir_builder where necessary.

* [TE] Implement te::Transform

Similar to Fuse and Split, this represents a modification to the
existing loop iterations.

* [TE] Added Stage::set_axis_separators.

In C++, this is implemented as an `Array<IntImm>`, specifying the
pre-flattening axes after which a new post-flattening axis should be
started.  The python interface uses a sentinel value
`te.AXIS_SEPARATOR` in the call to `transform_layout`, which is then
used to define the array of axis separators.
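A sketch of how the sentinel could be stripped out of the mapping's output (plain Python; the function name is invented for illustration):

```python
AXIS_SEPARATOR = object()  # stand-in for te.AXIS_SEPARATOR

def split_mapping(final_indices):
    """Separate a transform_layout result into the real output indices
    and the array of axis-separator positions."""
    indices, separators = [], []
    for item in final_indices:
        if item is AXIS_SEPARATOR:
            separators.append(len(indices))
        else:
            indices.append(item)
    return indices, separators

# e.g. transform_layout(lambda n, c, h, w:
#          [n, c // 4, AXIS_SEPARATOR, h, w, c % 4])
indices, separators = split_mapping(
    ["n", "c_outer", AXIS_SEPARATOR, "h", "w", "c_inner"])
assert indices == ["n", "c_outer", "h", "w", "c_inner"]
assert separators == [2]
```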

* [TIR] Expose tir.transform.ApplyLayoutTransforms for testing

* [TE] Rewrite loop iteration order

After .transform_layout, rewrite leaf_iter_vars to follow the updated
order.  Use the te::Transform iter_var relationship to track use of
the transformed variable.
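The resulting loop structure can be sketched as follows (plain Python; an illustrative stand-in for the rewritten leaf_iter_vars, not generated TIR):

```python
def iterate_transformed(extent_i, extent_j, visit):
    """Loop nest after transform_layout(lambda i, j: [i // 4, j, i % 4]):
    iterate over the transformed axes, recovering i via the inverse map."""
    for io in range((extent_i + 3) // 4):
        for j in range(extent_j):
            for ii in range(4):
                i = 4 * io + ii          # inverse transformation
                if i < extent_i:         # guard the padded tail
                    visit(i, j)

visited = []
iterate_transformed(6, 2, lambda i, j: visited.append((i, j)))
assert len(visited) == 12                # every (i, j) visited exactly once
```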

* [TE] Fill BufferNode::axis_separators from StageNode

During ScheduleOps and SchedulePostprocToPrimfunc, the axis separators
defined in the stage must be passed through to the TIR BufferNode.

* [TE] Return transformed iteration variables

* Moved Buffer's pre-flatten information to PrimFunc.

Since the pre-flatten information is only used for validating user
inputs, it makes much more sense to store it alongside the buffer_map.

* Updated ethos-u C++ unit tests to remove use of Load/Store.

* Bugfix, layout transformation.

Error occurred during conversion from TE to IRModule, when layout
transforms were applied to a reader of a `cache_read`.

* In test directory, replacing all instances of T.load.

* Return buffer object from tvm.tir.script.scope_handler.Allocate

Now that the load/store require buffer objects, allocation should also
return a buffer object to be used.

* Added .astype to tvm.script.tir.node.BufferSlice

Since `buf[i]` returns a `BufferSlice`, this lets the TIR examples
that use `buf[i].astype('out_dtype')` continue functioning.

* Replacing all T.store TIR calls.

* Added LOG(FATAL) in constructor of Store/Load nodes.

* Updated tvmscript parser to report error for Store/Load nodes.

* [TVMScript] Added T.preflattened_buffer stmt

Used to specify `PrimFunc::preflattened_buffer_map`. Takes an argument
of the postflattened buffer, so that it will work for both simple
declarations and `T.match_buffer` statements without needing to
introduce a param handle.  All other arguments are identical to
`T.match_buffer.`

* [TVMScript] Updated TVMscript for BufferLoad/BufferStore

- Use `T.preflattened_buffer` calls in TVMScript to represent
  `PrimFunc::preflattened_buffer_map`.

- Remove `T.buffer_decl` for return value of `T.allocate`, now that
  `T.allocate` returns a buffer.

- For buffer access as a different type, make a `T.buffer_decl` for
  those accesses.

* Updated test_tvmscript_roundtrip.py for BufferLoad/BufferStore.

* Updated TIR reference in USMP pool allocation unit tests.

Using let var handles as the data pointer in buffers, rather than just
as `T.load`/`T.store` arguments, requires annotation as
`T.Ptr[T.primtype]`, rather than as `T.handle`.

* fixup! Return buffer object from tvm.tir.script.scope_handler.Allocate

* fixup! Return buffer object from tvm.tir.script.scope_handler.Allocate

* fixup! Replacing all T.store TIR calls.

* fixup! Replacing all T.store TIR calls.

* fixup! Return buffer object from tvm.tir.script.scope_handler.Allocate

* fixup! In test directory, replacing all instances of T.load.

* tir.ComputeInline, correct variable count.

Previously, this metaschedule primitive relied on `tir::UndefinedVars`
ignoring the data pointer of BufferLoad/BufferStore nodes.  When
`tir::UndefinedVars` was updated to visit the data pointer, similar to
the previous behavior when visiting Load/Store nodes, this caused the
count of undefined variables to be unexpectedly high.

* fixup! Replacing all T.store TIR calls.

* fixup! Updated Buffer::vstore/vload to return BufferLoad/BufferStore objects.

* fixup! In test directory, replacing all instances of T.load.

* fixup! In test directory, replacing all instances of T.load.

* fixup! Replacing all T.store TIR calls.

* Expose Buffer index flattening function to Python.

* Updated test_tir_buffer.py offset tests.

Replacing calls to `Buffer.vload` with `Buffer.offset_of`, when
testing the index calculations.
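The offset computation being tested can be sketched as follows (plain Python; a row-major stand-in for `Buffer.offset_of` on a buffer without explicit strides):

```python
def offset_of(shape, indices):
    """Row-major flattened offset, returned as a list with one entry per
    flattened output axis (a single entry for a flat buffer)."""
    offset = 0
    for extent, index in zip(shape, indices):
        offset = offset * extent + index
    return [offset]

assert offset_of([2, 3, 4], [1, 2, 3]) == [23]   # ((1*3)+2)*4 + 3
```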

* fixup! Replacing all T.store TIR calls.

* fixup! Replacing all T.store TIR calls.

* fixup! Updated Buffer::vstore/vload to return BufferLoad/BufferStore objects.

* fixup! Replacing Store/Load in lowering/legalization passes.

* fixup! Replacing all T.store TIR calls.

* fixup! Updated ethos-u C++ unit tests to remove use of Load/Store.

* fixup! Replacing Store/Load in lowering/legalization passes.

Fix linting for inject_double_buffer.cc

* fixup! Updated ethos-u C++ unit tests to remove use of Load/Store.

* fixup! Added .astype to tvm.script.tir.node.BufferSlice

* fixup! In test directory, replacing all instances of T.load.

* fixup! Replacing all T.store TIR calls.

* fixup! Replacing all T.store TIR calls.

* fixup! In test directory, replacing all instances of T.load.

* fixup! Replacing all T.store TIR calls.

* fixup! Replacing Store/Load in lowering/legalization passes.

* [UnitTests] Added T.preflattened_buffer in expected result

* fixup! In test directory, replacing all instances of T.load.

* [UnitTests] Bound checker update, compare against N-d buffer bounds.

* Fixup, bound checker vectorize test.

* fixup! Return buffer object from tvm.tir.script.scope_handler.Allocate

* [UnitTest] Fixed breakage in InjectRollingBuffer test.

Needed a bit more re-writing than usual, because the test was
explicitly calling lowering passes, then calling `tvm.build`.  Fixed
by using the standard lowering flow, with preprocessing steps
inserted via `tir.add_lower_pass`.

* fixup! Return buffer object from tvm.tir.script.scope_handler.Allocate

* [UnitTest] Fixed breakage in flatten buffer unit tests.

- Updated pass to allow BufferStore/BufferLoad nodes to be visited
  before the block's alloc buffer.

- Added `T.preflattened_buffer` annotations.

* fixup! Return buffer object from tvm.tir.script.scope_handler.Allocate

* [UnitTests] Fixed breakage in test_tir_buffer.py

- Updated vload test for new behavior.
- Added test for offset_of, testing behavior no longer in vload.
- Added null check for buffer visitor.

* fixup! Replacing Load/Store in codegens.

* [UnitTest] ComputeInline, opaque access test updates

* [UnitTest] Fixup, allow unit test to use `ib.pointer()[0]`.

* fixup! Replacing Load/Store in codegens.

The updated CodegenLLVM should use the BufferStore/BufferLoad
convention of indexing by `sizeof(dtype)`, rather than
`sizeof(dtype.element_of())`.

* fixup! Replacing Store/Load in lowering/legalization passes.

BF16Legalize should also update the preflattened_buffer_map, since it
is overwriting the `BufferNode::data` stored in the buffer_map.

* fixup! Replacing all T.store TIR calls.

* Fixed failing codegen c host unit tests.

- Generated functions were declaring the array handle used for the
  return value as a `uint8_t*` parameter, rather than the earlier
  `void*`.

- New parameter type was due to using
  `PointerType(PrimType(DataType::UInt(8)))` as the type annotation, to
  be usable as `BufferNode::data`.

- Changing to `PointerType(PrimType(DataType::Void()))` still allows
  usage as buffer, more appropriately expresses semantics.

- Updated C codegens to allow `void*` types to be generated from
  variables with type annotation, in addition to the previous behavior
  of `DataType::Handle()` variables without type annotation.

* Fixup, StorageFlatten when applied to post-StorageRewrite functions.

Identified in a test that applied `tvm.lower`, then `tvm.build` on the
result.  If the result of an allocate node is used as the backing
buffer for multiple buffers, such as the output of the StorageRewrite
pass, then StorageFlatten would erroneously think that the second
occurrence was a usage without an earlier definition.

* fixup, StorageFlatten

When flattening a boolean buffer, the backing buffer should have type
int8; the preflattened buffer retains the boolean type.

* Bugfix, correctly represent void* in LLVM IR.

* Update, replace tir.Load with tir.BufferLoad

* Added TVMScript error check for matching buffer/index dimensionality

Needed for tests/python/unittest/test_tvmscript_error_report.py::test_high_dim_store

* Bugfix, correct return type when lowering custom datatype.

* Bugfix, removed unused primfunc from test_tvmscript_complete.py

* Updated test_meta_schedule_postproc_verify_gpu_code.py TIR

Replaced Load/Store with BufferLoad/BufferStore.

* Allowed ramp nodes with buffer use analysis.

* Updated tests in test_meta_schedule_postproc_verify_gpu_code.py

Needed dummy writes to prevent buffer resizing, in order to trigger
the verification failure due to memory limits.

* Updated TIR examples to be compatible with buffer dimension check.

* Corrected section header in docstring.

* Corrected indices size check in CodeGenC.

* Fixed breakage in LowerThreadAllreduce.

Since the AllocateNode is rewritten, any buffers that refer to those
variables must also be rewritten.

* [UnitTests] Replaced Store/Load in CUDA codegen tests.

* Resolved breakage in C-based codegen for vectorized store/load.

Needed to update to new convention of using the buffer's element type
as the stride.

* Bugfix, incorrect LCA for buffer access in root scope.

This had been present before the BufferLoad/BufferStore changes, but
hadn't triggered on tests using Load/Store nodes.

* Added docstrings for TransformNode member variables.

* Added TODO for future removal of preflattened_buffer_map.

* Fixup, transform layout + cache write tests.

The correct sequence is to first apply any caching as needed, then to
apply layout transformations, and finally to apply thread binds for
the computation step.

* Bugfix, correct element type for scalarized access.

* Bugfix, cuda buffer indexing when declared as different type.

* Cuda codegen, update reference.

* Bugfix, lower allreduce

Loads of the output of the reduction should be replaced for all
buffers sharing a buffer pointer, not just for the buffer object
itself.

* Removed obsolete comment.

* Changed PrimFunc constructor preflattened_buffer_map to Optional

* Removed flatten_buffer argument from T.match_buffer.

* Correct call to VarUseDefAnalysis::VisitBuffer

* Reverted unintentional testing change, lanes=2.

* Updated lower_cross_thread_reduction to use buffer in allreduce

* Updated transform_layout test to disable CSE

* Updated CSE unit tests to use BufferStore

* Replaced Store/Load for vta.transform and unit tests.

* Updated unit tests for lower_cross_thread_reduction.

* Updated arange to use scalar tensors.

The start/stop/step tensors are declared as 0-d scalar tensors, but
were accessed as 1-d tensors.

* Fix breakage in ethosu constant encoding.

Buffers generated by "ethosu_copy" should have their buffer objects
rewritten, but shouldn't have their size updated in ethosu-specific
Call nodes.

* Fix breakage in ethosu call argument checks.

Need to pull out indices from BufferLoad holders, not Load.

* Resolve breakage from mismatched shape/index dimensions

* Split out encoded parameters from preflattened buffer map.

* Updated buffer shape/index dimensions to match in more ethosu tests

* Fixed lint error

* Removed debug code

* Moved arith::Analyzer local variable to class member

* Fixed SSA conversion of allocations.

Can occur if allocation is inside an unrolled loop.  Added unit test
to catch this failure mode.

* Ethos-u index/buffer dimension updates.

* Updated ethosu passes to handle buffer load/store.

* Resolved bug in tvmscript printing of duplicate buffers.

* Fix breakage in ethos-u test_assign_addresses, encode constants

* Apply same changes to T.allocate_const as to T.allocate

Return a buffer when used in TVMScript, allow for aliasing buffers.

* Fix lint errors.

* Further updates for ethos-u tests.

* Updated ethos.u buffer sizes in test.

* Updated tir.BindParams to use BufferLoad instead of Load.

* Updated topi.cuda.scan implementation to follow buffer dimensions.

* Resolved breakage when flattening AllocateConst nodes.

* Resolved breakages from latest merge with main.

* Corrected error in merge.

* Use empty indices for rank-0 tensor.

* Added ir_builder workaround for 1-d indexing.

* Consistent buffer access type in LLVM codegen, to match C codegen

* StorageRewrite, update indices of modified buffers.

* Dynamic relay nodes, access 0-d tensors with 0-d indices.

* BFloat16 legalization, update buffer type.

* Updated meshgrid to use 0-d index for 0-d buffer.

* Corrected boolean handling in Allocate nodes.

* Added workaround to unpack 1-d Tensor indices into N-d buffer indices.

* Resolved a few more failures in relay tests on cuda.

* Resolve linting

* CI bump

* Updated renormalize_split_pattern tests to use BufferLoad/BufferStore

* Fixed cuda codegen checks for BufferStore/Ramp.

* Simplify indices further, needed to avoid cuda register limit.

* fixed dyn onehot shape func accessing 1d buffer with ()

* Fixed codegen indexing for int4 scalar types.

* Temporary workaround for incorrect constant folding.

Need to further investigate vectorized LLVM constants

* s/find_allocate_usage/FindAllocateUsage/g

* Added buffer type consistency TODO.

* Improved comment on address_of Op.

* Rename LegalizeDtype to LegalizeDType, made private.

* fix format and lint errors

* Disable vectorization of AllocateConst buffer in StorageRewrite.

* Pass buffer_map through to the PrimFunc in cmsisnn

* try disabling problematic winograd test case

* try different way of buffer mapping in storage_rewrite

* Removed unnecessary ramp node in ir_builder.


* Updated LLVM codegen for buffer indexing.

TVM data arrays are always densely packed.  If the LLVM type
corresponding to a vectorized TVM datatype contains padding for
alignment, the array location should be computed based on the
primitive element type.
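The distinction can be illustrated numerically (assumed sizes for illustration; actual padding depends on the target's data layout):

```python
def element_offset_bytes(index, lanes, elem_bytes, padded_vector_bytes):
    """Byte offset of vector element `index` in a TVM array.  TVM arrays
    are densely packed, so the correct offset uses the primitive element
    size; stepping by the padded LLVM vector size would be wrong."""
    dense = index * lanes * elem_bytes       # correct: packed layout
    padded = index * padded_vector_bytes     # wrong when the type is padded
    return dense, padded

# A float16x3 occupies 6 bytes packed, but LLVM may pad <3 x half> to 8:
assert element_offset_bytes(2, 3, 2, 8) == (12, 16)
```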


Co-authored-by: Masahiro Masuda <[email protected]>
Co-authored-by: adstraw <[email protected]>
3 people authored and pfk-beta committed Apr 11, 2022
1 parent e45e3e8 commit 9ac0428
Showing 185 changed files with 9,634 additions and 5,779 deletions.
41 changes: 41 additions & 0 deletions include/tvm/ir/attrs.h
@@ -382,6 +382,47 @@ inline TFunc WithAttrs(TFunc input, Map<String, ObjectRef> attrs) {
return input;
}

/*!
* \brief Copy the function or module, but removes the specified
* attribute.
*
* \param input The thing to annotate (BaseFunc or IRModule)
* \param attr_key The attribute key.
*
* \tparam TFunc The corresponding function or module type.
*
* \returns The new function or module with removed attribute.
*
* \note This function performs copy on write optimization for func and module.
* If we move a uniquely referenced func or module into WithoutAttr,
* then no additional copy will be performed.
*
* This is also why we make it as a function instead of a member function
* and why we pass by value in the first argument.
*
* \code
*
* // Recommended way to trigger copy on write
* func = WithoutAttr(std::move(func), "key1");
* func = WithoutAttr(std::move(func), "key2");
*
* \endcode
*/
template <typename TFunc>
inline TFunc WithoutAttr(TFunc input, const std::string& attr_key) {
using TNode = typename TFunc::ContainerType;
static_assert(TNode::_type_final, "Can only operate on the leaf nodes");

if (input->attrs.defined()) {
TNode* node = input.CopyOnWrite();
node->attrs.CopyOnWrite()->dict.erase(attr_key);
if (node->attrs->dict.size() == 0) {
node->attrs = NullValue<DictAttrs>();
}
}
return input;
}

// Namespace containing detail implementations
namespace detail {
using runtime::TVMArgValue;
1 change: 1 addition & 0 deletions include/tvm/te/operation.h
@@ -268,6 +268,7 @@ class ComputeOp : public Operation {
Array<IterVar> axis, Array<PrimExpr> body);

TVM_DEFINE_OBJECT_REF_METHODS(ComputeOp, Operation, ComputeOpNode);
TVM_DEFINE_OBJECT_REF_COW_METHOD(ComputeOpNode);
};

/*!
123 changes: 121 additions & 2 deletions include/tvm/te/schedule.h
@@ -29,6 +29,7 @@
#include <tvm/te/tensor.h>
#include <tvm/te/tensor_intrin.h>
#include <tvm/tir/expr.h>
#include <tvm/tir/index_map.h>

#include <string>
#include <unordered_map>
@@ -256,6 +257,41 @@ class Stage : public ObjectRef {
* \return reference to self.
*/
TVM_DLL Stage& rolling_buffer(); // NOLINT(*)
/*!
* \brief Defines a layout transformation to be applied to the buffer.
*
* The map from initial_index to final_index must be an
* invertible affine transformation.
*
* \param initial_indices An array of variables to represent a
* value's location in the tensor, using the pre-transformation
* layout. These variables are used as binding occurrences to
* represent the initial indices when applying the initial->final
* mapping, and should not occur elsewhere in the
* Schedule. (i.e. Pass in newly constructed variables, not the
* initial IterVar::var)
*
* \param final_indices An array of expressions, giving the
* value's location in the tensor, using the post-transformation layout.
* Expressions should be in terms of the variables given in
* initial_indices.
*
* \param out_iter_vars An optional output location for the updated
* loop iteration variables.
*
* \return reference to self
*/
TVM_DLL Stage& transform_layout(const Array<Var>& initial_indices,
const Array<PrimExpr>& final_indices,
Array<IterVar>* out_iter_vars = nullptr);
/*! \brief Defines separators between groups of axes.
*
* Used to define `BufferNode::axis_separators`, which has
* additional details.
*
* \param axis_separators A list of axis separators.
*/
TVM_DLL Stage& set_axis_separators(const Array<IntImm>& axis_separators);
/*!
* \brief whether the stage has been scheduled.
* \return whether the stage has been scheduled.
@@ -466,9 +502,27 @@ class StageNode : public Object {
* while origin_op remains fixed.
*/
Operation origin_op;
/*! \brief All the nodes in the iter var */
/*! \brief All the nodes in the iter var
*
* Each element of all_iter_vars represents an iteration variable
* that may appear within this stage's computation. Any element
* of `all_iter_vars` that is in `leaf_iter_vars` represents a
* variable that is directly defined and usable within the stage's
* computation. All other elements of `all_iter_vars` represent
* variables whose value must be computed from the variables in
 * `leaf_iter_vars`. (e.g. Suppose index k has been split by
 * ``ko, ki = s.split(k, factor=4)``. ko and ki will appear in
 * `leaf_iter_vars`, while k will not, and must be computed as
 * `4*ko + ki`.)
*/
Array<IterVar> all_iter_vars;
/*! \brief The current active leaf iter vars in the stage. */
/*! \brief The current active leaf iter vars in the stage.
*
* Each element of leaf_iter_vars will either be replaced with the
* bound index (e.g. threadIdx.x), or will be expanded into a loop
* over the variable's extent. `leaf_iter_vars` is a subset of
* `all_iter_vars`.
*/
Array<IterVar> leaf_iter_vars;
/*!
* \brief Specify threads to be launched at the stage.
@@ -500,6 +554,14 @@
bool double_buffer{false};
/*! \brief Whether apply rolling buffer optimization to this stage */
bool rolling_buffer{false};
/*! \brief Layout transformations to be applied onto the stage's tensors. */
Array<IndexMap> layout_transforms;
/*! \brief List of axes after which to divide physical axes.
*
* Used to populate `BufferNode::axis_separators`, which has
* additional details.
*/
Array<IntImm> axis_separators;
/*!
* \brief The parent group of the current stage.
* The stage cannot be assigned to stages outside the group.
@@ -522,6 +584,8 @@
v->Visit("scope", &scope);
v->Visit("is_output", &is_output);
v->Visit("double_buffer", &double_buffer);
v->Visit("layout_transforms", &layout_transforms);
v->Visit("axis_separators", &axis_separators);
v->Visit("group", &group);
v->Visit("num_child_stages", &num_child_stages);
}
@@ -771,6 +835,61 @@
TVM_DEFINE_OBJECT_REF_METHODS(Singleton, IterVarRelation, SingletonNode);
};

/*!
* \brief Transform iterator according to some arbitrary expression.
*/
class TransformNode : public IterVarRelationNode {
public:
/*! \brief The loop variables that were replaced by the transformation.
*
* Prior to applying a layout transformation, these represent the
* loops to iterate over a tensor as it is being computed, following
* a row-major traversal of the tensor's original shape in the
* compute definition.
*/
Array<IterVar> original_variables;

/*! \brief The variables generated by the transformation.
*
 * After applying a layout transformation, these represent the
* loops to iterate over a tensor as it is being computed, following
* a row-major traversal of the transformed shape of the tensor.
*/
Array<IterVar> transformed_variables;

/*! \brief Map from the original variables to the transformed variables.
*
* Used to determine iterator ranges over the transformed variables.
*/
IndexMap forward_transformation;

/*! \brief Map from transformed variables to the original variables
*
* Used to rewrite expressions containing the original loop iterators
* in terms of the transformed loop iterators.
*/
IndexMap inverse_transformation;

void VisitAttrs(AttrVisitor* v) {
v->Visit("original_variables", &original_variables);
v->Visit("transformed_variables", &transformed_variables);
v->Visit("forward_transformation", &forward_transformation);
v->Visit("inverse_transformation", &inverse_transformation);
}

static constexpr const char* _type_key = "Transform";
TVM_DECLARE_FINAL_OBJECT_INFO(TransformNode, IterVarRelationNode);
};

class Transform : public IterVarRelation {
public:
TVM_DLL explicit Transform(Array<IterVar> original_variables,
Array<IterVar> transformed_variables, IndexMap forward_transformation,
IndexMap inverse_transformation);

TVM_DEFINE_OBJECT_REF_METHODS(Transform, IterVarRelation, TransformNode);
};

/*! \brief Container for specialization conditions. */
class SpecializedConditionNode : public Object {
public:
44 changes: 38 additions & 6 deletions include/tvm/tir/buffer.h
@@ -55,8 +55,22 @@ class BufferNode : public Object {
Var data;
/*! \brief data type in the content of the tensor */
DataType dtype;
/*! \brief The shape of the buffer */
/*! \brief The shape of the buffer prior to flattening
*
* This contains the shape as it is accessed by
* BufferLoad/BufferStore nodes, and used by the low-level code
* generators.
*/
Array<PrimExpr> shape;
/*!
* \brief Separators between input axes when generating flattened output axes
*
* For buffers representing flat 1-d memory (e.g. any buffer in
* RAM), this should be an empty array. For buffers representing
* non-flat memory, each entry in axis_separators should be the
* first input axis that is part of a new flattened axis.
*/
Array<IntImm> axis_separators;
/*!
* \brief The strides of each dimension
* This can be an empty array, indicating array is contiguous
@@ -89,6 +103,7 @@
v->Visit("dtype", &dtype);
v->Visit("shape", &shape);
v->Visit("strides", &strides);
v->Visit("axis_separators", &axis_separators);
v->Visit("elem_offset", &elem_offset);
v->Visit("name", &name);
v->Visit("data_alignment", &data_alignment);
@@ -98,10 +113,11 @@
}

bool SEqualReduce(const BufferNode* other, SEqualReducer equal) const {
// Use DefEqual as buffer can define variables
// in its semantics, skip name as name is not important.
// Use DefEqual as buffer can define variables in its semantics,
// skip name as name is not important.
return equal.DefEqual(data, other->data) && equal(dtype, other->dtype) &&
equal.DefEqual(shape, other->shape) && equal.DefEqual(strides, other->strides) &&
equal.DefEqual(axis_separators, other->axis_separators) &&
equal.DefEqual(elem_offset, other->elem_offset) &&
equal(data_alignment, other->data_alignment) && equal(buffer_type, other->buffer_type);
}
@@ -112,6 +128,7 @@
hash_reduce.DefHash(shape);
hash_reduce.DefHash(strides);
hash_reduce.DefHash(elem_offset);
hash_reduce.DefHash(axis_separators);
hash_reduce(data_alignment);
hash_reduce(buffer_type);
}
@@ -127,7 +144,7 @@
* without adjusting for number of lanes. (e.g. The number of
* float16x4 elements in a buffer of type float16x4.)
*/
PrimExpr ElemOffset(Array<PrimExpr> index) const;
Array<PrimExpr> ElemOffset(Array<PrimExpr> index) const;

static constexpr const char* _type_key = "tir.Buffer";
static constexpr const bool _type_has_method_sequal_reduce = true;
@@ -146,7 +163,7 @@
// A default value will be picked.
TVM_DLL Buffer(Var data, DataType dtype, Array<PrimExpr> shape, Array<PrimExpr> strides,
PrimExpr elem_offset, String name, int data_alignment, int offset_factor,
BufferType buffer_type, Span span = Span());
BufferType buffer_type, Array<IntImm> axis_separators = {}, Span span = Span());

/*!
* \brief Return a new buffer that is equivalent with current one
@@ -186,6 +203,19 @@
*/
TVM_DLL Stmt vstore(Array<PrimExpr> begin, PrimExpr value) const;

/*!
* \brief Get a flattened version of the buffer
*/
Buffer GetFlattenedBuffer() const;

/*! \brief Determine the offset in the buffer of the given index.
*
* Returns the buffer offset, in number of elements of type dtype,
* without adjusting for number of lanes. (e.g. The number of
* float16x4 elements in a buffer of type float16x4.)
*/
Array<PrimExpr> OffsetOf(Array<PrimExpr> index) const;

/*!
* \brief Return the storage scope associated with this buffer.
*/
@@ -201,12 +231,14 @@
* \param dtype The content data type.
* \param name The name of the buffer
* \param storage_scope The storage scope associated with this buffer
* \param axis_separators Divisions defining the groups of axes that will be flattened together.
* \param span The location of this object in the source code.
* \return The created buffer.
* \sa Buffer for complete constructor.
*/
TVM_DLL Buffer decl_buffer(Array<PrimExpr> shape, DataType dtype = DataType::Float(32),
String name = "buffer", String storage_scope = "", Span span = Span());
String name = "buffer", String storage_scope = "",
Array<IntImm> axis_separators = {}, Span span = Span());

/*!
* \brief Base node for data producers.
11 changes: 8 additions & 3 deletions include/tvm/tir/builtin.h
@@ -105,10 +105,15 @@ TVM_DLL const Op& large_uint_imm();
TVM_DLL const Op& q_multiply_shift();

/*!
 * \brief See pseudo code
* \brief Returns the address of an element in the buffer (see pseudocode below).
*
* The number of indices should match the dimensionality of the buffer
* being accessed. If this operation occurs after buffer flattening,
* the number of indices must be supported by the target (i.e. N>1
* only on targets that support non-flat memory buffers).
*
* Handle address_of(Load *op) {
* return &op->buffer_var[index];
* Handle address_of(BufferLoad *op) {
* return &op->buffer_var[op->indices[0], op->indices[1], ..., op->indices[N-1]];
* }
*/
TVM_DLL const Op& address_of();
16 changes: 16 additions & 0 deletions include/tvm/tir/expr.h
@@ -630,6 +630,22 @@ class BufferLoadNode : public PrimExprNode {

static constexpr const char* _type_key = "tir.BufferLoad";
TVM_DECLARE_FINAL_OBJECT_INFO(BufferLoadNode, PrimExprNode);

private:
/*! \brief Set the dtype based on the buffer/indices
*
* Usually, the BufferLoad's dtype will be the same dtype as the
* buffer. This may have a different number of lanes than the
* buffer's dtype if index values have more than 1 lane.
*
* This function should only be called during construction and after
* CopyOnWrite. Friend class used here to restrict usage.
*/
void LegalizeDType();
friend class BufferLoad;
friend class CustomDatatypesLowerer;
friend class VectorTypeRewriter;
friend class Vectorizer;
};
