
Commit

MatmulWeightsDecompression tests extended with group decompression and nf4 precision
v-Golubev committed Oct 6, 2023
1 parent a67c106 commit 32ff879
Showing 3 changed files with 145 additions and 78 deletions.
@@ -210,7 +210,7 @@ void Transformations::PreLpt(const std::vector<ov::element::Type>& defaultPrecis
// We need to fuse Transpose to MatMul to have a simpler callback for the next transformation
CPU_REGISTER_PASS_COMMON(manager, ov::pass::TransposeMatMul);
// MarkDequantizationSubgraph is used even in non-LPT pipeline on X64 platforms
- // in order to keep compressed u8 MatMul weights with decompression operations as is
+ // in order to keep compressed MatMul weights with decompression operations as is
CPU_REGISTER_PASS_X64(manager, ov::pass::MarkDequantizationSubgraph, ov::element::TypeVector{ov::element::u8, ov::element::nf4}, true);
CPU_SET_CALLBACK_X64(manager, [](const_node_ptr &node) -> bool {
auto get_single_consumer = [](const_node_ptr &node) -> std::shared_ptr<ov::Node> {
