Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- [Nexfort](#nexfort)
- [How to use Nexfort](#how-to-use--nexfort)
  - [Case 1](#case-1)
  - [Case 2](#case-2)
- [Vae](#vae)
  - [ComfyUI Workflow](#comfyui-workflow)
  - [Result](#result)
- [Lora](#lora)
  - [ComfyUI Workflow](#comfyui-workflow-1)
  - [Result](#result-1)
- [Controlnet](#controlnet)
  - [ComfyUI Workflow](#comfyui-workflow-2)
  - [Result](#result-2)
- [IPAdapter](#ipadapter)

## Nexfort

- [x] Vae Speedup
- [x] Quick Switching Lora
- [x] Controlnet Speedup
- [x] Support compiling IPA https://github.com/cubiq/ComfyUI_IPAdapter_plus
- [x] Support compiling PuLID_ComfyUI https://github.com/cubiq/PuLID_ComfyUI
- [x] Support compiling https://github.com/cubiq/ComfyUI_InstantID
- [ ] Support compiling https://github.com/city96/ComfyUI_ExtraModels
- [x] Quick Switching checkpoint

```shell
cd ComfyUI

# For CUDA Graph
export NEXFORT_FX_CUDAGRAPHS=1

# For best performance
export TORCHINDUCTOR_MAX_AUTOTUNE=1

# Enable CUDNN benchmark
export NEXFORT_FX_CONV_BENCHMARK=1

# Faster float32 matmul
export NEXFORT_FX_MATMUL_ALLOW_TF32=1

# For graph cache to speedup compilation
export TORCHINDUCTOR_FX_GRAPH_CACHE=1

# For persistent cache dir
export TORCHINDUCTOR_CACHE_DIR=~/.torchinductor

# debug
# export TORCH_LOGS="+dynamo"
# export TORCHDYNAMO_VERBOSE=1
# export NEXFORT_DEBUG=1 NEXFORT_FX_DUMP_GRAPH=1 TORCH_COMPILE_DEBUG=1

python main.py --gpu-only --disable-cuda-malloc --port 8188 --cuda-device 6
```

- Install: https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort#install-nexfort
- torch.__version__='2.4.0.dev20240507+cu124'
- nexfort.__version__='0.1.dev215+torch240dev20240507cu121'
- commit ffc4b7c30e35eb2773ace52a0b00e0ca5c1f4362 (HEAD -> master, origin/master, origin/HEAD) Author: comfyanonymous <[email protected]> Date: Sat May 25 02:31:23 2024 -0400

## How to use Nexfort

### Case 1

```python
# Compile arbitrary models (torch.nn.Module)
import torch
import onediff.infer_compiler as infer_compiler


class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(100, 10)

    def forward(self, x):
        return torch.nn.functional.relu(self.lin(x))


mod = MyModule().to("cuda").half()
with torch.inference_mode():
    compiled_mod = infer_compiler.compile(
        mod,
        backend="nexfort",
        options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
    )
    print(compiled_mod(torch.randn(10, 100, device="cuda").half()))
```

### Case 2

```python
import torch
import onediff.infer_compiler as infer_compiler


@infer_compiler.compile(
    backend="nexfort",
    options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
)
def foo(x):
    return torch.sin(x) + torch.cos(x)


print(foo(torch.randn(10, 10, device="cuda").half()))
```

## Vae

### ComfyUI Workflow

![speedup_vae](https://github.com/siliconflow/onediff/assets/109639975/83f088ff-785b-4ae9-8850-e783703a6db2)

### Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae }

| Accelerator             | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 3.02 s                   | 2.95 s            | 2.31%                  |

First compilation time: 321.92 seconds

<img width="1024" alt="image" src="https://github.com/siliconflow/onediff/assets/109639975/af72f242-d132-4529-a8c0-b1e84d26bb08">

## Lora

### ComfyUI Workflow

![speedup_vae_unet](https://github.com/siliconflow/onediff/assets/109639975/e15f9f8c-68ee-4b35-b990-2c4be1fde7eb)

### Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae + unet }

| Accelerator             | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 3.02 s                   | 1.85 s            | 38.07%                 |

First compilation time: 878.19 seconds

<img width="960" alt="image" src="https://github.com/siliconflow/onediff/assets/109639975/1f74acb7-bd8a-451e-96c1-87885d225b0f">

## Controlnet

### ComfyUI Workflow

![cnet_speedup](https://github.com/siliconflow/onediff/assets/109639975/fe9a4524-a56d-4d0b-8cc7-d84bf2f01cdc)

### Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: controlnet }

| Accelerator             | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 4.93 s                   | 4.07 s            | 17.44%                 |

First compilation time: 437.84 seconds

<img width="1428" alt="image" src="https://github.com/siliconflow/onediff/assets/109639975/44a37ee1-2ce3-4040-8b8c-d8062f189b47">

## IPAdapter
Showing 42 changed files with 1,844 additions and 188 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,57 @@ | ||
import collections | ||
from ..modules.nexfort.booster_basic import BasicNexFortBoosterExecutor | ||
|
||
NODE_CLASS_MAPPINGS = {} | ||
NODE_DISPLAY_NAME_MAPPINGS = {} | ||
|
||
# https://github.com/siliconflow/nexfort?tab=readme-ov-file#suggested-combinations-of-compiler-modes | ||
compiler_modes = collections.OrderedDict( | ||
{ | ||
"jit:disable-runtime-fusion:low-precision": "This compiles super quickly, but the performance might not be optimized very noticeably.", | ||
"jit:benchmark:low-precision:freezing:cudagraphs": "This compiles the model very quickly, but the performance might be not as good as `TorchInductor` optimized models.", | ||
"max-autotune:low-precision": "This will deliver a good performance and adapt quickly to shape changes.", | ||
"max-autotune:benchmark:low-precision:cudagraphs": "This is the most suggested combination of compiler modes. It will deliver a good balance between performance and compilation time.", | ||
"max-optimize:max-autotune:benchmark:low-precision:freezing:cudagraphs": "This is the most aggressive combination of compiler modes. It will deliver the best performance but might slow down the compilation significantly.", | ||
} | ||
) | ||
|
||
|
||
class OneDiffNexfortBooster: | ||
|
||
@classmethod | ||
def INPUT_TYPES(s): | ||
return {} | ||
|
||
return { | ||
"required": { | ||
"fullgraph": ([False, True],), | ||
"dynamic": ([None, True, False],), | ||
"mode": ([mode for mode in compiler_modes.keys()],), | ||
"docs_link": ( | ||
"STRING", | ||
{ | ||
"multiline": True, | ||
"default": "[Note]: \nInstall-nexfort \nhttps://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort#install-nexfort", | ||
}, | ||
), | ||
} | ||
} | ||
|
||
CATEGORY = "OneDiff/Booster" | ||
RETURN_TYPES = ("TorchCompileBooster",) | ||
FUNCTION = "apply" | ||
|
||
def apply(self, *args, **kwargs): | ||
return (BasicNexFortBoosterExecutor(),) | ||
def apply( | ||
self, | ||
fullgraph=False, | ||
dynamic=None, | ||
mode="max-autotune:cudagraphs", | ||
docs_link=None, | ||
): | ||
return ( | ||
BasicNexFortBoosterExecutor( | ||
fullgraph=fullgraph, mode=f"{mode}:cache-all", dynamic=dynamic | ||
), | ||
) | ||
|
||
|
||
NODE_CLASS_MAPPINGS = { | ||
"OneDiffNexfortBooster": OneDiffNexfortBooster, | ||
} | ||
|
||
NODE_DISPLAY_NAME_MAPPINGS = {"OneDiffNexfortBooster": "Nexfort Booster - OneDiff"} |
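The new `apply` in the diff above appends `:cache-all` to whichever mode the user selects before handing it to `BasicNexFortBoosterExecutor`. Its default, `"max-autotune:cudagraphs"`, is not one of the keys offered in `compiler_modes`. The following is a minimal sketch of that composition step only. `StubExecutor`, `build_executor`, and `mode_flags` are hypothetical names introduced for illustration, and splitting the string on `:` merely shows its structure; how nexfort itself interprets the string is not shown in this commit.

```python
# Mode keys copied from compiler_modes in the diff above (descriptions omitted).
COMPILER_MODES = [
    "jit:disable-runtime-fusion:low-precision",
    "jit:benchmark:low-precision:freezing:cudagraphs",
    "max-autotune:low-precision",
    "max-autotune:benchmark:low-precision:cudagraphs",
    "max-optimize:max-autotune:benchmark:low-precision:freezing:cudagraphs",
]


class StubExecutor:
    """Hypothetical stand-in for BasicNexFortBoosterExecutor; it only records its arguments."""

    def __init__(self, fullgraph=False, mode=None, dynamic=None):
        self.fullgraph = fullgraph
        self.mode = mode
        self.dynamic = dynamic


def build_executor(fullgraph=False, dynamic=None, mode="max-autotune:cudagraphs"):
    # Mirrors apply() in the diff: the selected mode always gets ":cache-all" appended.
    return StubExecutor(fullgraph=fullgraph, mode=f"{mode}:cache-all", dynamic=dynamic)


def mode_flags(mode):
    # Assumption for illustration: the colon-separated parts are individual options.
    return mode.split(":")


executor = build_executor(mode=COMPILER_MODES[3])
print(executor.mode)
print(mode_flags(executor.mode))
```

Calling `build_executor()` with no arguments yields `"max-autotune:cudagraphs:cache-all"`, which shows that the fallback path uses a mode string the dropdown never produces.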
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
from .booster_interface import BoosterExecutor | ||
from .booster_scheduler import BoosterScheduler | ||
from .booster_interface import BoosterExecutor, BoosterSettings | ||
from .booster_scheduler import BoosterScheduler |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
import torch | ||
import traceback | ||
from collections import OrderedDict | ||
from comfy.model_patcher import ModelPatcher | ||
from comfy.sd import VAE | ||
from onediff.torch_utils.module_operations import get_sub_module | ||
from onediff.utils.import_utils import is_oneflow_available | ||
|
||
if is_oneflow_available(): | ||
from .oneflow.utils.booster_utils import is_using_oneflow_backend | ||
|
||
|
||
def switch_to_cached_model(new_model: ModelPatcher, cache_model): | ||
assert type(new_model.model) == type(cache_model) | ||
for k, v in new_model.model.state_dict().items(): | ||
cached_v: torch.Tensor = get_sub_module(cache_model, k) | ||
assert v.dtype == cached_v.dtype | ||
cached_v.copy_(v) | ||
new_model.model = cache_model | ||
return new_model | ||
|
||
|
||
class BoosterCacheService: | ||
_cache = OrderedDict() | ||
|
||
def put(self, key, model): | ||
if key is None: | ||
return | ||
# oneflow backends output image error | ||
if is_oneflow_available() and is_using_oneflow_backend(model): | ||
return | ||
self._cache[key] = model.model | ||
|
||
def get(self, key, default=None): | ||
return self._cache.get(key, default) | ||
|
||
def get_cached_model(self, key, model): | ||
cached_model = self.get(key, None) | ||
print(f"Cache lookup: Key='{key}', Cached Model Type='{type(cached_model)}'") | ||
if cached_model is not None: | ||
try: | ||
return switch_to_cached_model(model, cached_model) | ||
except Exception as e: | ||
print("An exception occurred when switching to cached model:") | ||
print(traceback.format_exc()) | ||
del self._cache[key] | ||
torch.cuda.empty_cache() | ||
|
||
return None |
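The cache service in the diff above reuses an already compiled model by copying the new model's weights into it in place, and it evicts the entry if that copy fails. The sketch below reproduces only that lookup, copy, and evict-on-failure pattern with plain Python objects so it runs without torch or ComfyUI. `FakeModel`, `switch_to_cached`, and `CacheSketch` are hypothetical stand-ins. Unlike the original, the sketch keeps its cache per instance rather than in a class attribute, returns the cached object directly instead of re-pointing `new_model.model`, and catches only `ValueError`.

```python
from collections import OrderedDict


class FakeModel:
    """Hypothetical stand-in for a compiled model: maps parameter names to lists of floats."""

    def __init__(self, weights):
        self.weights = {name: list(values) for name, values in weights.items()}


def switch_to_cached(new_model, cached_model):
    # Copy weights in place, in the spirit of cached_v.copy_(v) in the diff.
    if new_model.weights.keys() != cached_model.weights.keys():
        raise ValueError("parameter names differ")
    for name, values in new_model.weights.items():
        target = cached_model.weights[name]
        if len(target) != len(values):
            raise ValueError(f"shape mismatch for {name}")
        target[:] = values
    return cached_model


class CacheSketch:
    def __init__(self):
        self._cache = OrderedDict()

    def put(self, key, model):
        if key is None:
            return
        self._cache[key] = model

    def get_cached_model(self, key, new_model):
        cached = self._cache.get(key)
        if cached is None:
            return None
        try:
            return switch_to_cached(new_model, cached)
        except ValueError:
            # A failed switch drops the entry so the caller can compile afresh.
            del self._cache[key]
            return None


cache = CacheSketch()
compiled = FakeModel({"w": [1.0, 2.0]})
cache.put("unet", compiled)

reused = cache.get_cached_model("unet", FakeModel({"w": [3.0, 4.0]}))
print(reused is compiled, reused.weights)

bad = cache.get_cached_model("unet", FakeModel({"w": [0.0]}))
print(bad, "unet" in cache._cache)
```

The in-place copy is the point of the design: the compiled object keeps its identity, so a checkpoint or LoRA switch can reuse it instead of triggering another compilation.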