# Dev nodes nexfort booster (#911)
- [Nexfort](#nexfort)
- [How to use Nexfort](#how-to-use-nexfort)
  - [Case 1](#case-1)
  - [Case 2](#case-2)
- [VAE](#vae)
  - [ComfyUI Workflow](#comfyui-workflow)
  - [Result](#result)
- [LoRA](#lora)
  - [ComfyUI Workflow](#comfyui-workflow-1)
  - [Result](#result-1)
- [ControlNet](#controlnet)
  - [ComfyUI Workflow](#comfyui-workflow-2)
  - [Result](#result-2)
- [IPAdapter](#ipadapter)


## Nexfort
- [x] VAE speedup
- [x] Quick LoRA switching
- [x] ControlNet speedup
- [x] Support compiling IPAdapter https://github.com/cubiq/ComfyUI_IPAdapter_plus
- [x] Support compiling PuLID_ComfyUI https://github.com/cubiq/PuLID_ComfyUI
- [x] Support compiling https://github.com/cubiq/ComfyUI_InstantID
- [ ] Support compiling https://github.com/city96/ComfyUI_ExtraModels
- [x] Quick checkpoint switching

```bash
cd ComfyUI

# For CUDA Graphs
export NEXFORT_FX_CUDAGRAPHS=1

# For best performance
export TORCHINDUCTOR_MAX_AUTOTUNE=1
# Enable cuDNN benchmark
export NEXFORT_FX_CONV_BENCHMARK=1
# Faster float32 matmul
export NEXFORT_FX_MATMUL_ALLOW_TF32=1

# For graph cache to speed up compilation
export TORCHINDUCTOR_FX_GRAPH_CACHE=1

# For a persistent cache dir
export TORCHINDUCTOR_CACHE_DIR=~/.torchinductor

# Debug
# export TORCH_LOGS="+dynamo"
# export TORCHDYNAMO_VERBOSE=1
# export NEXFORT_DEBUG=1 NEXFORT_FX_DUMP_GRAPH=1 TORCH_COMPILE_DEBUG=1

python main.py --gpu-only --disable-cuda-malloc --port 8188 --cuda-device 6
```

- Install:
  https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort#install-nexfort
- `torch.__version__='2.4.0.dev20240507+cu124'`
- `nexfort.__version__='0.1.dev215+torch240dev20240507cu121'`
- ComfyUI commit ffc4b7c30e35eb2773ace52a0b00e0ca5c1f4362 (HEAD -> master, origin/master, origin/HEAD)
  Author: comfyanonymous <[email protected]>
  Date: Sat May 25 02:31:23 2024 -0400
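
A quick way to confirm the environment matches the versions above (a minimal sketch; assumes `torch` and `nexfort` are importable):

```python
# Minimal environment check for the versions listed above.
import torch
import nexfort

print(f"{torch.__version__=}")    # expect 2.4.0.dev20240507+cu124
print(f"{nexfort.__version__=}")  # expect 0.1.dev215+torch240dev20240507cu121
```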

## How to use Nexfort
### Case 1
```python
# Compile arbitrary models (torch.nn.Module)
import torch
import onediff.infer_compiler as infer_compiler

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(100, 10)

    def forward(self, x):
        return torch.nn.functional.relu(self.lin(x))

mod = MyModule().to("cuda").half()
with torch.inference_mode():
    compiled_mod = infer_compiler.compile(mod,
        backend="nexfort",
        options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
    )
    print(compiled_mod(torch.randn(10, 100, device="cuda").half()))
```
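
Here `dynamic=True` lets the compiled module handle varying input shapes without recompiling, and `fullgraph=True` requires the whole module to be captured as a single graph; these options follow `torch.compile` semantics.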

### Case 2

```python
import torch
import onediff.infer_compiler as infer_compiler
@infer_compiler.compile(
    backend="nexfort",
    options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
)
def foo(x):
    return torch.sin(x) + torch.cos(x)

print(foo(torch.randn(10, 10, device="cuda").half()))
```
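
In both cases the first call triggers compilation and is much slower than later calls; with `TORCHINDUCTOR_FX_GRAPH_CACHE=1` and a persistent `TORCHINDUCTOR_CACHE_DIR` (set above), subsequent runs can reuse compiled graphs.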

## VAE
### ComfyUI Workflow

![speedup_vae](https://github.com/siliconflow/onediff/assets/109639975/83f088ff-785b-4ae9-8850-e783703a6db2)

### Result
{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 3.02 s | 2.95 s | 2.31% |

First compilation time: 321.92 seconds

<img width="1024" alt="image"
src="https://github.com/siliconflow/onediff/assets/109639975/af72f242-d132-4529-a8c0-b1e84d26bb08">

## LoRA
### ComfyUI Workflow

![speedup_vae_unet](https://github.com/siliconflow/onediff/assets/109639975/e15f9f8c-68ee-4b35-b990-2c4be1fde7eb)
### Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae + unet }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 3.02 s | 1.85 s | 38.07% |

First compilation time: 878.19 seconds

<img width="960" alt="image" src="https://github.com/siliconflow/onediff/assets/109639975/1f74acb7-bd8a-451e-96c1-87885d225b0f">

## ControlNet
### ComfyUI Workflow

![cnet_speedup](https://github.com/siliconflow/onediff/assets/109639975/fe9a4524-a56d-4d0b-8cc7-d84bf2f01cdc)

### Result
{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: controlnet }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| ----------------------- | ------------------------ | ----------------- | ---------------------- |
| NVIDIA GeForce RTX 4090 | 4.93 s | 4.07 s | 17.44% |

First compilation time: 437.84 seconds

<img width="1428" alt="image" src="https://github.com/siliconflow/onediff/assets/109639975/44a37ee1-2ce3-4040-8b8c-d8062f189b47">

## IPAdapter
ccssu authored Jun 12, 2024
1 parent 003420a commit 323897c
Showing 42 changed files with 1,844 additions and 188 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/examples.yml
@@ -321,6 +321,8 @@ jobs:
run: docker exec -w /src/onediff/onediff_diffusers_extensions ${{ env.CONTAINER_NAME }} python3 examples/text_to_image_deep_cache_sd_sdxl_enterprise.py --model /share_nfs/stable-diffusion-xl-base-1.0-int8-deep-cache --model_type sdxl --width 512 --height 512 --saved_image output_enterprise_deepcache_sdxl.png
- if: matrix.test-suite == 'diffusers_examples' && startsWith(matrix.image, 'onediff-pro')
run: docker exec -w /src/onediff/onediff_diffusers_extensions ${{ env.CONTAINER_NAME }} python3 examples/text_to_image_deep_cache_sdxl.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --width 512 --height 512 --run_multiple_resolutions true --saved_image deepcache_sdxl.png
- if: matrix.test-suite == 'diffusers_examples'
run: docker exec -w /src/onediff ${{ env.CONTAINER_NAME }} python3 tests/test_model_inference.py
- if: matrix.test-suite == 'diffusers_examples'
run: docker exec -w /src/onediff/onediff_diffusers_extensions ${{ env.CONTAINER_NAME }} python3 examples/text_to_image.py --model_id=/share_nfs/hf_models/stable-diffusion-v1-5
- if: matrix.test-suite == 'diffusers_examples'
4 changes: 2 additions & 2 deletions onediff_comfy_nodes/__init__.py
@@ -1,4 +1,5 @@
"""OneDiff ComfyUI Speedup Module"""
from onediff.utils.import_utils import is_nexfort_available, is_oneflow_available
from ._config import is_disable_oneflow_backend
from ._nodes import (
ControlnetSpeedup,
@@ -8,7 +9,6 @@
OneDiffControlNetLoader,
VaeSpeedup,
)
from .utils.import_utils import is_nexfort_available, is_oneflow_available

NODE_CLASS_MAPPINGS = {
"ModelSpeedup": ModelSpeedup,
@@ -22,8 +22,8 @@
NODE_DISPLAY_NAME_MAPPINGS = {
"ModelSpeedup": "Model Speedup",
"VaeSpeedup": "VAE Speedup",
"OneDiffModelBooster": "Apply Model Booster - OneDiff",
"ControlnetSpeedup": "ControlNet Speedup",
"OneDiffModelBooster": "Apply Model Booster - OneDff",
"OneDiffCheckpointLoaderSimple": "Load Checkpoint - OneDiff",
}

7 changes: 7 additions & 0 deletions onediff_comfy_nodes/_config.py
@@ -1,4 +1,5 @@
import os
import sys
import folder_paths

__all__ = [
@@ -14,6 +15,12 @@
os.environ.get("ONEDIFF_COMFY_NODES_DISABLE_ONEFLOW_BACKEND", "0") == "1"
)

custom_nodes_path = os.path.join(folder_paths.base_path, "custom_nodes")

# Add paths to sys.path if not already there
if custom_nodes_path not in sys.path:
sys.path.append(custom_nodes_path)


if _default_backend not in ["oneflow", "nexfort"]:
raise ValueError(f"Invalid default backend: {_default_backend}")
86 changes: 48 additions & 38 deletions onediff_comfy_nodes/_nodes.py
@@ -1,12 +1,13 @@
from typing import Optional, Tuple
import folder_paths
import torch
import comfy
from onediff.utils.chache_utils import LRUCache
import uuid
from nodes import CheckpointLoaderSimple, ControlNetLoader
from ._config import is_disable_oneflow_backend
from .modules import BoosterScheduler, BoosterExecutor
from .utils.import_utils import is_nexfort_available # type: ignore
from .utils.import_utils import is_oneflow_available
from .modules import BoosterScheduler, BoosterExecutor, BoosterSettings
from onediff.utils.import_utils import is_nexfort_available # type: ignore
from onediff.utils.import_utils import is_oneflow_available

if is_oneflow_available() and not is_disable_oneflow_backend():
from .modules.oneflow import BasicOneFlowBoosterExecutor
@@ -31,50 +32,66 @@
]


class ModelSpeedup:
@classmethod
def INPUT_TYPES(s):
return {
"required": {"model": ("MODEL",), "inplace": ([False, True],),},
"optional": {"custom_booster": ("CUSTOM_BOOSTER",),},
}
class SpeedupMixin:
"""A mix-in class to provide speedup functionality."""

RETURN_TYPES = ("MODEL",)
FUNCTION = "speedup"
CATEGORY = "OneDiff"

@torch.no_grad()
def speedup(self, model, inplace=False, custom_booster: BoosterScheduler = None):
@torch.inference_mode()
def speedup(
self,
model,
inplace: bool = False,
custom_booster: Optional[BoosterScheduler] = None,
*args,
**kwargs
) -> Tuple:
"""
Speed up the model inference.
Args:
model: The input model to be sped up.
inplace (bool, optional): Whether to perform the operation inplace. Defaults to False.
custom_booster (BoosterScheduler, optional): Custom booster scheduler to use. Defaults to None.
*args: Additional positional arguments to be passed to the underlying functions.
**kwargs: Additional keyword arguments to be passed to the underlying functions.
Returns:
Tuple: Tuple containing the optimized model.
"""
if not hasattr(self, "booster_settings"):
self.booster_settings = BoosterSettings(tmp_cache_key=str(uuid.uuid4()))

if custom_booster:
booster = custom_booster
booster.inplace = False
booster.inplace = inplace
else:
booster = BoosterScheduler(BasicBoosterExecutor(), inplace=inplace)
booster.settings = self.booster_settings
return (booster(model, *args, **kwargs),)

return (booster(model),)


class VaeSpeedup:
class ModelSpeedup(SpeedupMixin):
@classmethod
def INPUT_TYPES(s):
return {
"required": {"vae": ("VAE",),},
"required": {"model": ("MODEL",), "inplace": ([False, True],),},
"optional": {"custom_booster": ("CUSTOM_BOOSTER",),},
}

RETURN_TYPES = ("VAE",)
FUNCTION = "speedup"
CATEGORY = "OneDiff"
RETURN_TYPES = ("MODEL",)

@torch.no_grad()
def speedup(self, vae, custom_booster=None):
if custom_booster:
booster = custom_booster
else:
booster = BoosterScheduler(BasicBoosterExecutor())

new_vae = booster(vae)
return (new_vae,)
class VaeSpeedup(SpeedupMixin):
@classmethod
def INPUT_TYPES(s):
return {
"required": {"vae": ("VAE",), "inplace": ([False, True],),},
"optional": {"custom_booster": ("CUSTOM_BOOSTER",),},
}

RETURN_TYPES = ("VAE",)


class ControlnetSpeedup:
@@ -177,8 +194,6 @@ def onediff_load_controlnet(self, control_net_name, custom_booster=None):


class OneDiffCheckpointLoaderSimple(CheckpointLoaderSimple):
_cache_map = LRUCache(1)

@classmethod
def INPUT_TYPES(s):
return {
@@ -226,11 +241,6 @@ def _load_checkpoint(
def onediff_load_checkpoint(
self, ckpt_name, vae_speedup="disable", custom_booster: BoosterScheduler = None,
):
cache_key = (ckpt_name, vae_speedup, custom_booster)
out = self._cache_map.get(cache_key, None)
if out is None:
out = self._load_checkpoint(ckpt_name, vae_speedup, custom_booster)
self._cache_map.put(cache_key, out)

out = self._load_checkpoint(ckpt_name, vae_speedup, custom_booster)
# Return the loaded checkpoint (modelpatcher, clip, vae)
return out
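
Note: the class-level `LRUCache(1)` and its lookup in `OneDiffCheckpointLoaderSimple` are removed here; checkpoint reuse appears to move to the new `BoosterCacheService` added in `onediff_comfy_nodes/modules/booster_cache.py` below.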
53 changes: 46 additions & 7 deletions onediff_comfy_nodes/extras_nodes/nodes_nexfort_booster.py
@@ -1,18 +1,57 @@
import collections
from ..modules.nexfort.booster_basic import BasicNexFortBoosterExecutor

NODE_CLASS_MAPPINGS = {}
NODE_DISPLAY_NAME_MAPPINGS = {}

# https://github.com/siliconflow/nexfort?tab=readme-ov-file#suggested-combinations-of-compiler-modes
compiler_modes = collections.OrderedDict(
{
"jit:disable-runtime-fusion:low-precision": "This compiles super quickly, but the performance might not be optimized very noticeably.",
"jit:benchmark:low-precision:freezing:cudagraphs": "This compiles the model very quickly, but the performance might be not as good as `TorchInductor` optimized models.",
"max-autotune:low-precision": "This will deliver a good performance and adapt quickly to shape changes.",
"max-autotune:benchmark:low-precision:cudagraphs": "This is the most suggested combination of compiler modes. It will deliver a good balance between performance and compilation time.",
"max-optimize:max-autotune:benchmark:low-precision:freezing:cudagraphs": "This is the most aggressive combination of compiler modes. It will deliver the best performance but might slow down the compilation significantly.",
}
)


class OneDiffNexfortBooster:

@classmethod
def INPUT_TYPES(s):
return {}

return {
"required": {
"fullgraph": ([False, True],),
"dynamic": ([None, True, False],),
"mode": ([mode for mode in compiler_modes.keys()],),
"docs_link": (
"STRING",
{
"multiline": True,
"default": "[Note]: \nInstall-nexfort \nhttps://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort#install-nexfort",
},
),
}
}

CATEGORY = "OneDiff/Booster"
RETURN_TYPES = ("TorchCompileBooster",)
FUNCTION = "apply"

def apply(self, *args, **kwargs):
return (BasicNexFortBoosterExecutor(),)
def apply(
self,
fullgraph=False,
dynamic=None,
mode="max-autotune:cudagraphs",
docs_link=None,
):
return (
BasicNexFortBoosterExecutor(
fullgraph=fullgraph, mode=f"{mode}:cache-all", dynamic=dynamic
),
)


NODE_CLASS_MAPPINGS = {
"OneDiffNexfortBooster": OneDiffNexfortBooster,
}

NODE_DISPLAY_NAME_MAPPINGS = {"OneDiffNexfortBooster": "Nexfort Booster - OneDiff"}
8 changes: 4 additions & 4 deletions onediff_comfy_nodes/extras_nodes/nodes_oneflow_booster.py
@@ -7,9 +7,9 @@
from comfy import model_management
from comfy.cli_args import args

from onediff.infer_compiler.backends.oneflow.utils.version_util import (
is_community_version,
)
from onediff.utils.import_utils import is_onediff_quant_available
from onediff.infer_compiler.backends.oneflow.utils.version_util import is_community_version


from ..modules import BoosterScheduler
from ..modules.oneflow import (
@@ -28,7 +28,7 @@
from ..modules.oneflow.hijack_utils import comfy_utils_hijack

from ..modules.oneflow.utils import OUTPUT_FOLDER, load_graph, save_graph
from ..utils.import_utils import is_onediff_quant_available
from ..modules import BoosterScheduler

if is_onediff_quant_available() and not is_community_version():
from ..modules.oneflow.booster_quantization import (
4 changes: 2 additions & 2 deletions onediff_comfy_nodes/modules/__init__.py
@@ -1,2 +1,2 @@
from .booster_interface import BoosterExecutor
from .booster_scheduler import BoosterScheduler
from .booster_interface import BoosterExecutor, BoosterSettings
from .booster_scheduler import BoosterScheduler
49 changes: 49 additions & 0 deletions onediff_comfy_nodes/modules/booster_cache.py
@@ -0,0 +1,49 @@
import torch
import traceback
from collections import OrderedDict
from comfy.model_patcher import ModelPatcher
from comfy.sd import VAE
from onediff.torch_utils.module_operations import get_sub_module
from onediff.utils.import_utils import is_oneflow_available

if is_oneflow_available():
from .oneflow.utils.booster_utils import is_using_oneflow_backend


def switch_to_cached_model(new_model: ModelPatcher, cache_model):
assert type(new_model.model) == type(cache_model)
for k, v in new_model.model.state_dict().items():
cached_v: torch.Tensor = get_sub_module(cache_model, k)
assert v.dtype == cached_v.dtype
cached_v.copy_(v)
new_model.model = cache_model
return new_model


class BoosterCacheService:
_cache = OrderedDict()

def put(self, key, model):
if key is None:
return
# oneflow backends output image error
if is_oneflow_available() and is_using_oneflow_backend(model):
return
self._cache[key] = model.model

def get(self, key, default=None):
return self._cache.get(key, default)

def get_cached_model(self, key, model):
cached_model = self.get(key, None)
print(f"Cache lookup: Key='{key}', Cached Model Type='{type(cached_model)}'")
if cached_model is not None:
try:
return switch_to_cached_model(model, cached_model)
except Exception as e:
print("An exception occurred when switching to cached model:")
print(traceback.format_exc())
del self._cache[key]
torch.cuda.empty_cache()

return None
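
As far as this file shows, the intended usage pattern is roughly the following (the cache key and the `boosted_model`/`new_model` names are illustrative, not from the commit; both are assumed to be comfy `ModelPatcher` instances):

```python
# Illustrative use of BoosterCacheService (names are made up).
cache = BoosterCacheService()

# Store a boosted model's inner module; entries from the oneflow backend are skipped.
cache.put("sdxl_base", boosted_model)

# On a later load, try to swap the fresh weights into the cached (compiled) module;
# returns None on a cache miss or if copying the state dict fails.
reused = cache.get_cached_model("sdxl_base", new_model)
if reused is None:
    pass  # fall back to boosting/compiling from scratch
```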
14 changes: 14 additions & 0 deletions onediff_comfy_nodes/modules/booster_interface.py
@@ -1,5 +1,7 @@
# import os
import uuid
from abc import ABC, abstractmethod
import dataclasses

# from functools import singledispatchmethod
# from typing import Optional
@@ -10,10 +12,22 @@
# from comfy.model_patcher import ModelPatcher
# from comfy.sd import VAE


class BoosterExecutor(ABC):
"""Interface for optimization."""

@abstractmethod
def execute(self, model, ckpt_name=None, **kwargs):
"""Apply the optimization strategy to the model."""
pass


@dataclasses.dataclass
class BoosterSettings:
tmp_cache_key: str = None


if __name__ == "__main__":
print(BoosterSettings(str(uuid.uuid4())).tmp_cache_key)
print(BoosterSettings(str(uuid.uuid4())).tmp_cache_key)
print(BoosterSettings(str(uuid.uuid4())).tmp_cache_key)