Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CPU] Add RMSNorm jit implementation #26147

Merged

Conversation

luo-cheng2021
Copy link
Contributor

@luo-cheng2021 luo-cheng2021 commented Aug 21, 2024

Details:

  • Jit implementation for RMSNorm, performance data on HBM with batch=2, input=1024(unit is ms):
infer w/o rms     w/ rms    
  all power+reduce updateNodes all rms updateNodes
first token 1994.582 15.591 15.563 1991.487 5.49 13.768
second token 77.507 1.477 2.806 76.326 0.577 2.48
  • ...

Tickets:

@github-actions github-actions bot added category: IE Tests OpenVINO Test: plugins and common category: CPU OpenVINO CPU plugin labels Aug 21, 2024
@luo-cheng2021 luo-cheng2021 added the WIP work in progress label Aug 21, 2024
@luo-cheng2021 luo-cheng2021 marked this pull request as ready for review August 21, 2024 07:32
@luo-cheng2021 luo-cheng2021 requested review from a team as code owners August 21, 2024 07:32
@luo-cheng2021 luo-cheng2021 requested a review from a team as a code owner August 22, 2024 07:30
@luo-cheng2021 luo-cheng2021 requested review from itikhono and removed request for a team August 22, 2024 07:30
@github-actions github-actions bot added the category: transformations OpenVINO Runtime library - Transformations label Aug 22, 2024
@luo-cheng2021 luo-cheng2021 removed the WIP work in progress label Aug 22, 2024
@peterchen-intel peterchen-intel requested a review from usstq August 23, 2024 01:42
@luo-cheng2021 luo-cheng2021 requested a review from usstq August 23, 2024 04:38
Copy link
Contributor

@usstq usstq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work!

@@ -67,8 +67,11 @@ RMSFusion::RMSFusion() {
auto gamma = wrap_type<ov::op::v0::Constant>(type_matches(element::f32));
auto mul2 = wrap_type<ov::op::v1::Multiply>({gamma, mul1});

// compress RMS result
auto comp = wrap_type<ov::op::v0::Convert>({mul2});
std::shared_ptr<ov::Node> comp = mul2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a requirement for this PR, but still worth to mention. ConvertPrecision(FP32->FP16) pass keeps normalization subgraph in higher precision to maintain the accuracy, which results in additinal Convert op (FP32->Fp16) in the end of pattern. Ideally even for CPU (with fp16 infer prec) we will need to fuse such Conversion into the RMS for better performance. Basically two possible solutions:

  1. Match two different patterns (with and w/o Convert) in boumds of RMSFusion transformation
  2. Fuse RMS+Convert in separate transformation (ideally it should be generic pass suitable for any parent op)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to implement this test as custom for CPU plugin? I don't see any plugin specific details in it.
Logically it should be shared test with device specific instances for CPU and GPU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 RMSNorm definitions: rms and rms_norm, I suppose they will be got refactored. May we change it after the refactor was done since the spec will be changed?

src/plugins/intel_cpu/src/nodes/rms_norm.cpp Outdated Show resolved Hide resolved
src/plugins/intel_cpu/src/nodes/rms_norm.cpp Show resolved Hide resolved
src/plugins/intel_cpu/src/nodes/kernels/x64/rms_kernel.hpp Outdated Show resolved Hide resolved
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.4 milestone Aug 27, 2024
@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Aug 27, 2024
Merged via the queue into openvinotoolkit:master with commit 528f85c Aug 27, 2024
142 checks passed
@ilya-lavrenov ilya-lavrenov deleted the luocheng/rms_norm branch August 28, 2024 07:16
alexsu52 pushed a commit to openvinotoolkit/nncf that referenced this pull request Sep 9, 2024
### Changes

as stated in the title
This PR introduced changes in accuracy:
openvinotoolkit/openvino#26147
but it's coming from float error:

![image](https://github.com/user-attachments/assets/e819a14b-3c74-4568-beef-7b43737128a6)

### Reason for changes

As result validation with openvino nightly failed:

![image](https://github.com/user-attachments/assets/23fe22dd-6ec2-4577-98df-f26732c7962a)

### Related tickets

150613
151260


### Tests

- [x] job/NNCF/job/manual/job/post_training_weight_compression/163/
ljaljushkin added a commit to openvinotoolkit/nncf that referenced this pull request Sep 9, 2024
### Changes

as stated in the title
This PR introduced changes in accuracy:
openvinotoolkit/openvino#26147, but it's coming
from float error:


![image](https://github.com/user-attachments/assets/e819a14b-3c74-4568-beef-7b43737128a6)

### Reason for changes

As result validation with openvino nightly failed:


![image](https://github.com/user-attachments/assets/23fe22dd-6ec2-4577-98df-f26732c7962a)

### Related tickets

150613
151260


### Tests

- [x] job/NNCF/job/manual/job/post_training_weight_compression/163/
alexsu52 pushed a commit to openvinotoolkit/nncf that referenced this pull request Nov 11, 2024
…ts (#3065)

### Changes

Revert of #2954
Use [references for the OV
2024.3](https://github.com/ljaljushkin/nncf_pytorch/blob/873fbb80ab07bd41a3b1a2e21f68bc1a7a394810/tests/post_training/data/wc_reference_data.yaml)
when run conformance tests with OV 2024.5.

### Reason for changes

According to the CPU team: 
> 1) In 24.4 time frame, RMS impl has been enabled
(openvinotoolkit/openvino#26147), which
introduce some accuracy change due to vrsqrtss which is done by
approximation.
> 2) In 24.5 time frame, fix has been applied
(openvinotoolkit/openvino#26817) to use vsqrtss
instead. With that we can see the result can be covered to that prior to
openvinotoolkit/openvino#26147
> With that (in 24.5), the accuracy result is recovered to that prior to
RMS impl openvinotoolkit/openvino#26147. Let's
say that 24.4 may has some accuracy issue in some case. 24.5 fix that
issue.

### Related tickets

156605
156776
156237
151260

### Tests

- [x] 47 build of
weekly/job/openvino-nightly/job/post_training_weight_compression

![image](https://github.com/user-attachments/assets/10271e74-4e0b-4506-917a-306481b500e3)
- [x] 48 build for xfail tests

![image](https://github.com/user-attachments/assets/dc5a9424-35fa-4015-965a-ca08fa46e4b7)
nikita-savelyevv pushed a commit to nikita-savelyevv/nncf that referenced this pull request Dec 11, 2024
…ts (openvinotoolkit#3065)

### Changes

Revert of openvinotoolkit#2954
Use [references for the OV
2024.3](https://github.com/ljaljushkin/nncf_pytorch/blob/873fbb80ab07bd41a3b1a2e21f68bc1a7a394810/tests/post_training/data/wc_reference_data.yaml)
when run conformance tests with OV 2024.5.

### Reason for changes

According to the CPU team: 
> 1) In 24.4 time frame, RMS impl has been enabled
(openvinotoolkit/openvino#26147), which
introduce some accuracy change due to vrsqrtss which is done by
approximation.
> 2) In 24.5 time frame, fix has been applied
(openvinotoolkit/openvino#26817) to use vsqrtss
instead. With that we can see the result can be covered to that prior to
openvinotoolkit/openvino#26147
> With that (in 24.5), the accuracy result is recovered to that prior to
RMS impl openvinotoolkit/openvino#26147. Let's
say that 24.4 may has some accuracy issue in some case. 24.5 fix that
issue.

### Related tickets

156605
156776
156237
151260

### Tests

- [x] 47 build of
weekly/job/openvino-nightly/job/post_training_weight_compression

![image](https://github.com/user-attachments/assets/10271e74-4e0b-4506-917a-306481b500e3)
- [x] 48 build for xfail tests

![image](https://github.com/user-attachments/assets/dc5a9424-35fa-4015-965a-ca08fa46e4b7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: CPU OpenVINO CPU plugin category: IE Tests OpenVINO Test: plugins and common category: transformations OpenVINO Runtime library - Transformations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants