# Dev nodes nexfort booster (#911)
- [Nexfort](#nexfort)
- [How to use Nexfort](#how-to-use-nexfort)
  - [Case 1](#case-1)
  - [Case 2](#case-2)
- [VAE](#vae)
  - [ComfyUI Workflow](#comfyui-workflow)
  - [Result](#result)
- [LoRA](#lora)
  - [ComfyUI Workflow](#comfyui-workflow-1)
  - [Result](#result-1)
- [ControlNet](#controlnet)
  - [ComfyUI Workflow](#comfyui-workflow-2)
  - [Result](#result-2)
- [IPAdapter](#ipadapter)


## Nexfort
- [x] VAE speedup
- [x] Quick LoRA switching
- [x] ControlNet speedup
- [x] Support compiling IPAdapter https://github.com/cubiq/ComfyUI_IPAdapter_plus
- [x] Support compiling PuLID_ComfyUI https://github.com/cubiq/PuLID_ComfyUI
- [x] Support compiling https://github.com/cubiq/ComfyUI_InstantID
- [ ] Support compiling https://github.com/city96/ComfyUI_ExtraModels
- [x] Quick checkpoint switching

```bash
cd ComfyUI

# For CUDA Graphs
export NEXFORT_FX_CUDAGRAPHS=1

# For best performance
export TORCHINDUCTOR_MAX_AUTOTUNE=1
# Enable cuDNN benchmark
export NEXFORT_FX_CONV_BENCHMARK=1
# Faster float32 matmul
export NEXFORT_FX_MATMUL_ALLOW_TF32=1

# For graph cache to speed up compilation
export TORCHINDUCTOR_FX_GRAPH_CACHE=1

# For a persistent cache dir
export TORCHINDUCTOR_CACHE_DIR=~/.torchinductor

# Debug
# export TORCH_LOGS="+dynamo"
# export TORCHDYNAMO_VERBOSE=1
# export NEXFORT_DEBUG=1 NEXFORT_FX_DUMP_GRAPH=1 TORCH_COMPILE_DEBUG=1

python main.py --gpu-only --disable-cuda-malloc --port 8188 --cuda-device 6
```

- Install:
  https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort#install-nexfort
- `torch.__version__='2.4.0.dev20240507+cu124'`
- `nexfort.__version__='0.1.dev215+torch240dev20240507cu121'`
- ComfyUI commit ffc4b7c30e35eb2773ace52a0b00e0ca5c1f4362 (HEAD -> master, origin/master, origin/HEAD)
  Author: comfyanonymous <[email protected]>
  Date: Sat May 25 02:31:23 2024 -0400
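
A quick way to confirm the environment matches the versions above (a minimal sketch; assumes `torch` and `nexfort` are importable):

```python
# Minimal environment check for the versions listed above.
import torch
import nexfort

print(f"{torch.__version__=}")    # expect 2.4.0.dev20240507+cu124
print(f"{nexfort.__version__=}")  # expect 0.1.dev215+torch240dev20240507cu121
```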

## How to use Nexfort
### Case 1
```python
# Compile arbitrary models (torch.nn.Module)
import torch
import onediff.infer_compiler as infer_compiler

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(100, 10)

    def forward(self, x):
        return torch.nn.functional.relu(self.lin(x))

mod = MyModule().to("cuda").half()
with torch.inference_mode():
    compiled_mod = infer_compiler.compile(mod,
        backend="nexfort",
        options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
    )
    print(compiled_mod(torch.randn(10, 100, device="cuda").half()))
```
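
Here `dynamic=True` lets the compiled module handle varying input shapes without recompiling, and `fullgraph=True` requires the whole module to be captured as a single graph; these options follow `torch.compile` semantics.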

### Case 2

```python
import torch
import onediff.infer_compiler as infer_compiler
@infer_compiler.compile(
    backend="nexfort",
    options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
)
def foo(x):
    return torch.sin(x) + torch.cos(x)

print(foo(torch.randn(10, 10, device="cuda").half()))
```
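
In both cases the first call triggers compilation and is much slower than later calls; with `TORCHINDUCTOR_FX_GRAPH_CACHE=1` and a persistent `TORCHINDUCTOR_CACHE_DIR` (set above), subsequent runs can reuse compiled graphs.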

## VAE
### ComfyUI Workflow

![speedup_vae](https://github.com/siliconflow/onediff/assets/109639975/83f088ff-785b-4ae9-8850-e783703a6db2)

### Result
{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 3.02 s | 2.95 s | 2.31% |

First compilation time: 321.92 seconds

<img width="1024" alt="image"
src="https://github.com/siliconflow/onediff/assets/109639975/af72f242-d132-4529-a8c0-b1e84d26bb08">

## LoRA
### ComfyUI Workflow

![speedup_vae_unet](https://github.com/siliconflow/onediff/assets/109639975/e15f9f8c-68ee-4b35-b990-2c4be1fde7eb)
### Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae + unet }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 3.02 s | 1.85 s | 38.07% |

First compilation time: 878.19 seconds

<img width="960" alt="image" src="https://github.com/siliconflow/onediff/assets/109639975/1f74acb7-bd8a-451e-96c1-87885d225b0f">

## ControlNet
### ComfyUI Workflow

![cnet_speedup](https://github.com/siliconflow/onediff/assets/109639975/fe9a4524-a56d-4d0b-8cc7-d84bf2f01cdc)

### Result
{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: controlnet }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 4.93 s | 4.07 s | 17.44% |

First compilation time: 437.84 seconds

<img width="1428" alt="image" src="https://github.com/siliconflow/onediff/assets/109639975/44a37ee1-2ce3-4040-8b8c-d8062f189b47">

## IPAdapter
ccssu authored Jun 12, 2024
1 parent 003420a commit 323897c
Showing 42 changed files with 1,844 additions and 188 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/examples.yml
@@ -321,6 +321,8 @@ jobs:
run: docker exec -w /src/onediff/onediff_diffusers_extensions ${{ env.CONTAINER_NAME }} python3 examples/text_to_image_deep_cache_sd_sdxl_enterprise.py --model /share_nfs/stable-diffusion-xl-base-1.0-int8-deep-cache --model_type sdxl --width 512 --height 512 --saved_image output_enterprise_deepcache_sdxl.png
- if: matrix.test-suite == 'diffusers_examples' && startsWith(matrix.image, 'onediff-pro')
run: docker exec -w /src/onediff/onediff_diffusers_extensions ${{ env.CONTAINER_NAME }} python3 examples/text_to_image_deep_cache_sdxl.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --width 512 --height 512 --run_multiple_resolutions true --saved_image deepcache_sdxl.png
- if: matrix.test-suite == 'diffusers_examples'
run: docker exec -w /src/onediff ${{ env.CONTAINER_NAME }} python3 tests/test_model_inference.py
- if: matrix.test-suite == 'diffusers_examples'
run: docker exec -w /src/onediff/onediff_diffusers_extensions ${{ env.CONTAINER_NAME }} python3 examples/text_to_image.py --model_id=/share_nfs/hf_models/stable-diffusion-v1-5
- if: matrix.test-suite == 'diffusers_examples'
4 changes: 2 additions & 2 deletions onediff_comfy_nodes/__init__.py
@@ -1,4 +1,5 @@
"""OneDiff ComfyUI Speedup Module"""
from onediff.utils.import_utils import is_nexfort_available, is_oneflow_available
from ._config import is_disable_oneflow_backend
from ._nodes import (
ControlnetSpeedup,
@@ -8,7 +9,6 @@
OneDiffControlNetLoader,
VaeSpeedup,
)
from .utils.import_utils import is_nexfort_available, is_oneflow_available

NODE_CLASS_MAPPINGS = {
"ModelSpeedup": ModelSpeedup,
@@ -22,8 +22,8 @@
NODE_DISPLAY_NAME_MAPPINGS = {
"ModelSpeedup": "Model Speedup",
"VaeSpeedup": "VAE Speedup",
"OneDiffModelBooster": "Apply Model Booster - OneDiff",
"ControlnetSpeedup": "ControlNet Speedup",
"OneDiffModelBooster": "Apply Model Booster - OneDff",
"OneDiffCheckpointLoaderSimple": "Load Checkpoint - OneDiff",
}

7 changes: 7 additions & 0 deletions onediff_comfy_nodes/_config.py
@@ -1,4 +1,5 @@
import os
import sys
import folder_paths

__all__ = [
@@ -14,6 +15,12 @@
os.environ.get("ONEDIFF_COMFY_NODES_DISABLE_ONEFLOW_BACKEND", "0") == "1"
)

custom_nodes_path = os.path.join(folder_paths.base_path, "custom_nodes")

# Add paths to sys.path if not already there
if custom_nodes_path not in sys.path:
sys.path.append(custom_nodes_path)


if _default_backend not in ["oneflow", "nexfort"]:
raise ValueError(f"Invalid default backend: {_default_backend}")
86 changes: 48 additions & 38 deletions onediff_comfy_nodes/_nodes.py
@@ -1,12 +1,13 @@
from typing import Optional, Tuple
import folder_paths
import torch
import comfy
from onediff.utils.chache_utils import LRUCache
import uuid
from nodes import CheckpointLoaderSimple, ControlNetLoader
from ._config import is_disable_oneflow_backend
from .modules import BoosterScheduler, BoosterExecutor
from .utils.import_utils import is_nexfort_available # type: ignore
from .utils.import_utils import is_oneflow_available
from .modules import BoosterScheduler, BoosterExecutor, BoosterSettings
from onediff.utils.import_utils import is_nexfort_available # type: ignore
from onediff.utils.import_utils import is_oneflow_available

if is_oneflow_available() and not is_disable_oneflow_backend():
from .modules.oneflow import BasicOneFlowBoosterExecutor
@@ -31,50 +32,66 @@
]


class ModelSpeedup:
@classmethod
def INPUT_TYPES(s):
return {
"required": {"model": ("MODEL",), "inplace": ([False, True],),},
"optional": {"custom_booster": ("CUSTOM_BOOSTER",),},
}
class SpeedupMixin:
"""A mix-in class to provide speedup functionality."""

RETURN_TYPES = ("MODEL",)
FUNCTION = "speedup"
CATEGORY = "OneDiff"

@torch.no_grad()
def speedup(self, model, inplace=False, custom_booster: BoosterScheduler = None):
@torch.inference_mode()
def speedup(
self,
model,
inplace: bool = False,
custom_booster: Optional[BoosterScheduler] = None,
*args,
**kwargs
) -> Tuple:
"""
Speed up the model inference.
Args:
model: The input model to be sped up.
inplace (bool, optional): Whether to perform the operation inplace. Defaults to False.
custom_booster (BoosterScheduler, optional): Custom booster scheduler to use. Defaults to None.
*args: Additional positional arguments to be passed to the underlying functions.
**kwargs: Additional keyword arguments to be passed to the underlying functions.
Returns:
Tuple: Tuple containing the optimized model.
"""
if not hasattr(self, "booster_settings"):
self.booster_settings = BoosterSettings(tmp_cache_key=str(uuid.uuid4()))

if custom_booster:
booster = custom_booster
booster.inplace = False
booster.inplace = inplace
else:
booster = BoosterScheduler(BasicBoosterExecutor(), inplace=inplace)
booster.settings = self.booster_settings
return (booster(model, *args, **kwargs),)

return (booster(model),)


class VaeSpeedup:
class ModelSpeedup(SpeedupMixin):
@classmethod
def INPUT_TYPES(s):
return {
"required": {"vae": ("VAE",),},
"required": {"model": ("MODEL",), "inplace": ([False, True],),},
"optional": {"custom_booster": ("CUSTOM_BOOSTER",),},
}

RETURN_TYPES = ("VAE",)
FUNCTION = "speedup"
CATEGORY = "OneDiff"
RETURN_TYPES = ("MODEL",)

@torch.no_grad()
def speedup(self, vae, custom_booster=None):
if custom_booster:
booster = custom_booster
else:
booster = BoosterScheduler(BasicBoosterExecutor())

new_vae = booster(vae)
return (new_vae,)
class VaeSpeedup(SpeedupMixin):
@classmethod
def INPUT_TYPES(s):
return {
"required": {"vae": ("VAE",), "inplace": ([False, True],),},
"optional": {"custom_booster": ("CUSTOM_BOOSTER",),},
}

RETURN_TYPES = ("VAE",)


class ControlnetSpeedup:
@@ -177,8 +194,6 @@ def onediff_load_controlnet(self, control_net_name, custom_booster=None):


class OneDiffCheckpointLoaderSimple(CheckpointLoaderSimple):
_cache_map = LRUCache(1)

@classmethod
def INPUT_TYPES(s):
return {
@@ -226,11 +241,6 @@ def _load_checkpoint(
def onediff_load_checkpoint(
self, ckpt_name, vae_speedup="disable", custom_booster: BoosterScheduler = None,
):
cache_key = (ckpt_name, vae_speedup, custom_booster)
out = self._cache_map.get(cache_key, None)
if out is None:
out = self._load_checkpoint(ckpt_name, vae_speedup, custom_booster)
self._cache_map.put(cache_key, out)

out = self._load_checkpoint(ckpt_name, vae_speedup, custom_booster)
# Return the loaded checkpoint (modelpatcher, clip, vae)
return out
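
Note: the class-level `LRUCache(1)` and its lookup in `OneDiffCheckpointLoaderSimple` are removed here; checkpoint reuse appears to move to the new `BoosterCacheService` added in `onediff_comfy_nodes/modules/booster_cache.py` below.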
53 changes: 46 additions & 7 deletions onediff_comfy_nodes/extras_nodes/nodes_nexfort_booster.py
@@ -1,18 +1,57 @@
import collections
from ..modules.nexfort.booster_basic import BasicNexFortBoosterExecutor

NODE_CLASS_MAPPINGS = {}
NODE_DISPLAY_NAME_MAPPINGS = {}

# https://github.com/siliconflow/nexfort?tab=readme-ov-file#suggested-combinations-of-compiler-modes
compiler_modes = collections.OrderedDict(
{
"jit:disable-runtime-fusion:low-precision": "This compiles super quickly, but the performance might not be optimized very noticeably.",
"jit:benchmark:low-precision:freezing:cudagraphs": "This compiles the model very quickly, but the performance might be not as good as `TorchInductor` optimized models.",
"max-autotune:low-precision": "This will deliver a good performance and adapt quickly to shape changes.",
"max-autotune:benchmark:low-precision:cudagraphs": "This is the most suggested combination of compiler modes. It will deliver a good balance between performance and compilation time.",
"max-optimize:max-autotune:benchmark:low-precision:freezing:cudagraphs": "This is the most aggressive combination of compiler modes. It will deliver the best performance but might slow down the compilation significantly.",
}
)


class OneDiffNexfortBooster:

@classmethod
def INPUT_TYPES(s):
return {}

return {
"required": {
"fullgraph": ([False, True],),
"dynamic": ([None, True, False],),
"mode": ([mode for mode in compiler_modes.keys()],),
"docs_link": (
"STRING",
{
"multiline": True,
"default": "[Note]: \nInstall-nexfort \nhttps://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort#install-nexfort",
},
),
}
}

CATEGORY = "OneDiff/Booster"
RETURN_TYPES = ("TorchCompileBooster",)
FUNCTION = "apply"

def apply(self, *args, **kwargs):
return (BasicNexFortBoosterExecutor(),)
def apply(
self,
fullgraph=False,
dynamic=None,
mode="max-autotune:cudagraphs",
docs_link=None,
):
return (
BasicNexFortBoosterExecutor(
fullgraph=fullgraph, mode=f"{mode}:cache-all", dynamic=dynamic
),
)


NODE_CLASS_MAPPINGS = {
"OneDiffNexfortBooster": OneDiffNexfortBooster,
}

NODE_DISPLAY_NAME_MAPPINGS = {"OneDiffNexfortBooster": "Nexfort Booster - OneDiff"}
8 changes: 4 additions & 4 deletions onediff_comfy_nodes/extras_nodes/nodes_oneflow_booster.py
@@ -7,9 +7,9 @@
from comfy import model_management
from comfy.cli_args import args

from onediff.infer_compiler.backends.oneflow.utils.version_util import (
is_community_version,
)
from onediff.utils.import_utils import is_onediff_quant_available
from onediff.infer_compiler.backends.oneflow.utils.version_util import is_community_version


from ..modules import BoosterScheduler
from ..modules.oneflow import (
@@ -28,7 +28,7 @@
from ..modules.oneflow.hijack_utils import comfy_utils_hijack

from ..modules.oneflow.utils import OUTPUT_FOLDER, load_graph, save_graph
from ..utils.import_utils import is_onediff_quant_available
from ..modules import BoosterScheduler

if is_onediff_quant_available() and not is_community_version():
from ..modules.oneflow.booster_quantization import (
4 changes: 2 additions & 2 deletions onediff_comfy_nodes/modules/__init__.py
@@ -1,2 +1,2 @@
from .booster_interface import BoosterExecutor
from .booster_scheduler import BoosterScheduler
from .booster_interface import BoosterExecutor, BoosterSettings
from .booster_scheduler import BoosterScheduler
49 changes: 49 additions & 0 deletions onediff_comfy_nodes/modules/booster_cache.py
@@ -0,0 +1,49 @@
import torch
import traceback
from collections import OrderedDict
from comfy.model_patcher import ModelPatcher
from comfy.sd import VAE
from onediff.torch_utils.module_operations import get_sub_module
from onediff.utils.import_utils import is_oneflow_available

if is_oneflow_available():
from .oneflow.utils.booster_utils import is_using_oneflow_backend


def switch_to_cached_model(new_model: ModelPatcher, cache_model):
assert type(new_model.model) == type(cache_model)
for k, v in new_model.model.state_dict().items():
cached_v: torch.Tensor = get_sub_module(cache_model, k)
assert v.dtype == cached_v.dtype
cached_v.copy_(v)
new_model.model = cache_model
return new_model


class BoosterCacheService:
_cache = OrderedDict()

def put(self, key, model):
if key is None:
return
# oneflow backends output image error
if is_oneflow_available() and is_using_oneflow_backend(model):
return
self._cache[key] = model.model

def get(self, key, default=None):
return self._cache.get(key, default)

def get_cached_model(self, key, model):
cached_model = self.get(key, None)
print(f"Cache lookup: Key='{key}', Cached Model Type='{type(cached_model)}'")
if cached_model is not None:
try:
return switch_to_cached_model(model, cached_model)
except Exception as e:
print("An exception occurred when switching to cached model:")
print(traceback.format_exc())
del self._cache[key]
torch.cuda.empty_cache()

return None
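
As far as this file shows, the intended usage pattern is roughly the following (the cache key and the `boosted_model`/`new_model` names are illustrative, not from the commit; both are assumed to be comfy `ModelPatcher` instances):

```python
# Illustrative use of BoosterCacheService (names are made up).
cache = BoosterCacheService()

# Store a boosted model's inner module; entries from the oneflow backend are skipped.
cache.put("sdxl_base", boosted_model)

# On a later load, try to swap the fresh weights into the cached (compiled) module;
# returns None on a cache miss or if copying the state dict fails.
reused = cache.get_cached_model("sdxl_base", new_model)
if reused is None:
    pass  # fall back to boosting/compiling from scratch
```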
14 changes: 14 additions & 0 deletions onediff_comfy_nodes/modules/booster_interface.py
@@ -1,5 +1,7 @@
# import os
import uuid
from abc import ABC, abstractmethod
import dataclasses

# from functools import singledispatchmethod
# from typing import Optional
@@ -10,10 +12,22 @@
# from comfy.model_patcher import ModelPatcher
# from comfy.sd import VAE


class BoosterExecutor(ABC):
"""Interface for optimization."""

@abstractmethod
def execute(self, model, ckpt_name=None, **kwargs):
"""Apply the optimization strategy to the model."""
pass


@dataclasses.dataclass
class BoosterSettings:
tmp_cache_key: str = None


if __name__ == "__main__":
print(BoosterSettings(str(uuid.uuid4())).tmp_cache_key)
print(BoosterSettings(str(uuid.uuid4())).tmp_cache_key)
print(BoosterSettings(str(uuid.uuid4())).tmp_cache_key)