Allow bfloat16 computations on compatible CPUs with Intel Extension for PyTorch #3649
base: master
Conversation
Going to chime in here since I did significant work on the XPU side of IPEX for ComfyUI. This patch basically turns on CPU mode for IPEX, doesn't it? I have been meaning to write a patch for something like this for a while, so thanks for doing the work to enable it. I had a few comments and nudges on things that could be improved, but nothing else looks terribly wrong, and I think this will improve everyone's experience with running the project. I am not sure the bar to get that speed is low enough to make it a default option for people to try, though: IPEX has a minimum requirement of AVX2 on the CPU in order to work at all. I would also suggest updating the README to note that this is available. Hopefully, when @comfyanonymous is less busy, he can take a look at the PR.
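For readers unfamiliar with the constraint mentioned above, a minimal sketch of how gating the IPEX CPU path on CPU capability could look. This is illustrative only and not the code in this PR; the helper name is made up, while `torch.backends.cpu.get_cpu_capability()` (torch >= 2.1) and `ipex.optimize()` are existing APIs:

```python
import torch

def try_ipex_cpu_optimize(model):
    # get_cpu_capability() reports the best vector ISA this PyTorch build can
    # use on the current CPU, e.g. "AVX2" or "AVX512".
    capability = torch.backends.cpu.get_cpu_capability()
    if capability not in ("AVX2", "AVX512"):
        return model  # CPU too old for IPEX; keep the plain fp32 path

    try:
        import intel_extension_for_pytorch as ipex
    except ImportError:
        return model  # IPEX not installed; nothing to do

    model.eval()
    # ipex.optimize() applies op fusion and can cast weights to bfloat16.
    return ipex.optimize(model, dtype=torch.bfloat16)
```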
Modern CPUs have native AVX512 BF16 instructions, which significantly improve matmul and conv2d operations. With bfloat16 instructions, UNet steps are 40-50% faster on both AMD and Intel CPUs. There are minor visible changes with bf16, but no avalanche effects, so this feature is enabled by default with the new `--use-cpu-bf16=auto` option. It can be disabled with `--use-cpu-bf16=no`. Signed-off-by: Sv. Lockal <[email protected]>
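As a rough sketch of the general idea behind an "auto" mode like this (the helper and function names are hypothetical and this is not necessarily how the patch implements it; CPU autocast is the standard PyTorch mechanism for running bf16 on CPU):

```python
import torch

def cpu_supports_native_bf16() -> bool:
    # On Linux, AVX512 BF16 support shows up as the "avx512_bf16" CPU flag.
    try:
        with open("/proc/cpuinfo") as f:
            return "avx512_bf16" in f.read()
    except OSError:
        return False

def run_unet_step(unet, latents, timestep, use_cpu_bf16="auto"):
    enabled = use_cpu_bf16 == "auto" and cpu_supports_native_bf16()
    # Autocast keeps the fp32 weights but executes matmul/conv in bfloat16,
    # which is where the speedup on AVX512 BF16 hardware comes from.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=enabled):
        return unet(latents, timestep)
```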
While testing with Flux, I discovered a few interesting things:
So I reworked the patch so that there is no requirement for ipex-for-cpu anymore, after checking with flux-schnell (which is already distributed in bf16 format).
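As an aside, one way to confirm that a checkpoint already ships bfloat16 weights (the file path below is hypothetical; this snippet just inspects tensor dtypes with safetensors):

```python
from collections import Counter
from safetensors import safe_open

path = "flux1-schnell.safetensors"  # hypothetical local path
dtypes = Counter()
with safe_open(path, framework="pt", device="cpu") as f:
    for name in f.keys():
        dtypes[str(f.get_tensor(name).dtype)] += 1  # loads each tensor once
print(dtypes)  # a bf16 checkpoint reports mostly torch.bfloat16 entries
```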
With the following command (note: ComfyUI never mentions this, but setting the correct environment variables is highly important; see this page), the KSampler node is almost 2 times faster, and memory usage is proportionally smaller (a sketch of typical settings follows the numbers below):
- `--use-cpu-bf16=no`: 1.68 s/it
- `--use-cpu-bf16=auto`: 1.22 it/s
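For reference, a sketch of what "correct environment variables" typically means for CPU inference. These are commonly recommended tuning knobs, not necessarily the exact settings from the linked page, and the values are machine-dependent:

```python
import os

# These must be set before PyTorch creates its thread pools, i.e. before
# "import torch" anywhere in the process.
physical_cores = max(1, (os.cpu_count() or 2) // 2)  # rough guess: 2-way SMT
os.environ.setdefault("OMP_NUM_THREADS", str(physical_cores))  # one thread per physical core
os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")  # pin threads (Intel OpenMP only)
os.environ.setdefault("KMP_BLOCKTIME", "1")  # idle threads stop spinning quickly

import torch
torch.set_num_threads(physical_cores)
```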