Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ComputeConfig::unpack_to_dest_mode #879

Closed
nsmithtt opened this issue Oct 9, 2024 · 6 comments · Fixed by #955
Closed

Support ComputeConfig::unpack_to_dest_mode #879

nsmithtt opened this issue Oct 9, 2024 · 6 comments · Fixed by #955
Assignees

Comments

@nsmithtt
Copy link
Contributor

nsmithtt commented Oct 9, 2024

Metal replaced computeConfig.preserve_fp32_precision with unpack_to_dest_mode:

enum class UnpackToDestMode : uint8_t    
{    
    UnpackToDestFp32,    
    Default    
}; 
struct ComputeConfig {
    ...
    std::vector<UnpackToDestMode> unpack_to_dest_mode;
};

We need to update:

  • TTKernelOpsEnums.td: Add UnpackToDestMode enum
  • TTKernelOpsTypes.td: Update TensixConfigAttr w/ new ArrayAttr field unpack_to_dest_mode (one per CB)
  • TTMetal/program.fbs: Update table TensixConfig
  • ttmetal/command_queue.cpp: Plumb through to metal api
@nsmithtt
Copy link
Contributor Author

nsmithtt commented Oct 9, 2024

preserve_fp32_precision was removed as part of:

@kmabeeTT
Copy link
Contributor

Going to look at this now

@kmabeeTT
Copy link
Contributor

Hey @nsmithtt - tt-metal requires the new enum to be set for either none, or all 32 CBs with this assert they have:

    if (!unpack_to_dest_mode.empty()) {
        TT_FATAL(unpack_to_dest_mode.size() == NUM_CIRCULAR_BUFFERS, "unpack_to_dest_mode vector must have 32 elements");
    }

I don't see we have this type of variable NUM_CIRCULAR_BUFFERS=32 replicated on our side anywhere. On their side it comes from here:

tt_metal/hostdevcommon/common_runtime_address_map.h:constexpr static std::uint32_t NUM_CIRCULAR_BUFFERS = 32;

2 questions:

  1. Rather than reach into metal runtime and use this variable on our side (bad idea right?) I think we want to get this officially from tt-metal (into our system-desc) using the same (not yet ready) API that some of our other tt-mlir runtime tickets (like [SystemDesc] List supported tile sizes and data formats #433, Get memory alignment from device #300) are pending on, what do you think?

  2. Since that is pending for a couple months already, I propose to replicate/hardcode value 32 in TTIRToTTMetal.cpp creation of TensixConfigAttr so it can populate all 32 values with UnpackToDescMode::Default to preserve existing behavior today and open ticket to remove workaround once metal provides it via API. Any concerns?

@nsmithtt
Copy link
Contributor Author

Completely agree with both points:

  1. Let's get it properly made to be metal API and plumb it through system desc.
  2. Hard coding for now makes sense.

Seems weird that it must be specified for all circular buffers TBH, could make sense for metal to make this a std::array if it's expected to be a hard coded size.

kmabeeTT added a commit that referenced this issue Oct 21, 2024
 - Pass field (per CB) down from compiler to runtime to metal API
 - Hardcoded to UnpackToDestMode::Default and for all 32 CBs for now
   because tt-metal requires this (will be relaxed soon), replace
   with compiler logic when ready.
@kmabeeTT
Copy link
Contributor

Opened a metal ticket tenstorrent/tt-metal#14066 after offline discussion to request that requirement be removed, then don't need to export up via systemdesc (unless you think we want it anyways, for other purposes).

@nsmithtt
Copy link
Contributor Author

nsmithtt commented Oct 21, 2024

Opened a metal ticket tenstorrent/tt-metal#14066 after offline discussion to request that requirement be removed, then don't need to export up via systemdesc (unless you think we want it anyways, for other purposes).

Sounds good. We might need it anyway to know what the max number of CBs is. What we probably want actually is a breakdown:

  • 8 input
  • 8 dataflow
  • 8 outputs
  • 8 intermediates

This is almost inferable from tt-metal's enum CB.

If the above metal ticket gets fixed, then this issue a much lower priority. We already have to model these CB's in TTKernel dialect which I think would make it impossible to emit more of each type already.

kmabeeTT added a commit that referenced this issue Oct 23, 2024
 - Export NUM_CIRCULAR_BUFFERS from tt-metal into system-desc's ChipDesc
   and use it for assert in tt-mlir compiler to ensure compiler/metal
   view of max number of CB's is aligned.

 - Pass new enum UnpackToDestMode (per CB) down from compiler to
   runtime to metal API
mtopalovicTT added a commit that referenced this issue Oct 28, 2024
* Uplift metal: fixed mesh device include, reshape, and memory constants

* Remove preserve_fp32_precision until it is replaced by unpack_to_dest_mode (issue #879)

* Guard saving artifacts in ttrt performance mode and temporary skip for device perf tests

* Additional fixes after uplift

* Skip failing test for uplift

* Change metal version...

* Fix format

* Remove ttrt perf tests for uplift only

---------

Co-authored-by: Jackson Nie <[email protected]>
Co-authored-by: ddilbaz <[email protected]>
Co-authored-by: Tapasvi Patel <[email protected]>
Co-authored-by: Milan Topalovic <[email protected]>
kmabeeTT added a commit that referenced this issue Oct 28, 2024
 - Most recent tt-metal uplift removed preserve_fp32_precision and
   replaced with unpack_to_dest_mode vector on ComputeConfig

 - Export NUM_CIRCULAR_BUFFERS from tt-metal into system-desc's ChipDesc
   and use it for assert in tt-mlir compiler to ensure compiler/metal
   view of max number of CB's is aligned.

 - Pass new enum UnpackToDestMode (per CB) down from compiler to
   runtime to metal API
kmabeeTT added a commit that referenced this issue Oct 28, 2024
)

* Runtime/Compiler - Support ComputeConfig::unpack_to_dest_mode #879

 - Most recent tt-metal uplift removed preserve_fp32_precision and
   replaced with unpack_to_dest_mode vector on ComputeConfig
 - Export NUM_CIRCULAR_BUFFERS from tt-metal into system-desc's ChipDesc
   and use it for assert in tt-mlir compiler to ensure compiler/metal
   view of max number of CB's is aligned.
 - Pass new enum UnpackToDestMode (per CB) down from compiler to
   runtime to metal API
 - Add num_cbs = 32 to hardcoded system_desc in test_grid_set.mlir. Looks 
    like this test is intending to run with custom soc_desc, annoying.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants