
# 4. Compatibility


Known compatibility issues between the services and models.

## Format

The format of the page is as follows:

```markdown
## Service | Model

Short description of the nature/cause of the issue

### Affected Service, Model (or combination)

Description of how the issue affects the service or combination of services, and a possible workaround.
```

## Gemma 2 - System Prompt

Gemma 2 models do not support the system prompt in their chat template.

### vllm x searxng x webui

When WebRAG is enabled, Open WebUI sends RAG requests that use the system role to the configured backend. Such requests fail with Gemma 2 (or other models without system prompt support) when running against vLLM.
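For illustration, this is the shape of a request that fails; a minimal sketch where the endpoint, port, and model name are placeholders rather than Harbor defaults:

```sh
# Hypothetical vLLM endpoint and model name. Any request carrying a
# "system" message is rejected, as Gemma 2's chat template does not
# accept the system role.
curl "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-2-9b-it",
    "messages": [
      {"role": "system", "content": "You are a helpful RAG assistant."},
      {"role": "user", "content": "Hello"}
    ]
  }'
# Typically answered with a 400 error raised by the chat template
```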

### cmdh

cmdh needs a system prompt to outline the task.

### chatui

HuggingFace ChatUI uses the system prompt for chat title generation.

### Workaround

Unfortunately, the only real fix is to switch to another model. Alternatively, disable WebRAG.

## LiteLLM - Dynamic Dependencies

When LiteLLM starts, it tries to install Node.js and some other packages at runtime. This can cause issues when running in a restricted environment or when the related CDNs are unavailable.

Diagnostic:

`harbor logs litellm` is stuck at:

```
# First candidate
litellm  | Installing Prisma CLI

# Another candidate
litellm  |  * Install prebuilt node (22.5.1) ..... done.
```

### litellm x webui

When LiteLLM is stuck in the setup phase, WebUI won't load any of the proxied models.

### Workaround

Restart the service a few times until it starts successfully.
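For example, assuming Harbor's restart subcommand (a thin wrapper over restarting the container):

```sh
# Restart only the litellm service, then watch the logs until
# the startup gets past the dependency installation step
harbor restart litellm
harbor logs litellm
```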

## WebUI - Missing models in the list

WebUI v0.3.11 fails to load models from OpenAI-compatible endpoints when the API key is specified as an empty string (or missing).

Example configuration that will not work:

```json
{
  "openai": {
    "api_base_urls": [
      "http://mistralrs:8021/v1"
    ],
    "api_keys": [
      ""
    ],
    "enabled": true
  }
}
```

### Workaround

Set up the endpoint with an actual API key, or a fake API key if supported by the service.

```json
{
  "openai": {
    "api_base_urls": [
      "http://mistralrs:8021/v1"
    ],
    "api_keys": [
      "sk-mistralrs"
    ],
    "enabled": true
  }
}
```

## WebUI - Same audio after audio config change

Open WebUI has a built-in cache for TTS. The cache sometimes operates at the sentence level, so after changing the audio model, generating audio for a previously seen sentence will return the same (cached) audio.

### Workaround

Use a new sentence when testing audio config changes, for example by re-generating the model response.

## Exllama2 - GPTQ

Exllama2 (and engines based on it, such as Aphrodite Engine and TabbyAPI) only supports GPTQ quantized to 4 bits. You can detect this problem when running, for example, a 2-bit GPTQ model and seeing logs like these:

```
RuntimeError: q_weight and gptq_qzeros have incompatible shapes
```
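To check the quantization width before loading a model, you can inspect its quantize_config.json (file name per the AutoGPTQ convention; the path below is an example):

```sh
# "bits" should be 4 for Exllama2-compatible GPTQ models
grep '"bits"' /path/to/model/quantize_config.json
```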

### Workaround

Switch to a 4-bit GPTQ model if possible. Otherwise, switch to another inference backend.

## vLLM - Out of workspace memory in AlignedAlloactor

This error occurs in the interaction between vLLM and the FlashInfer attention backend.

### Workaround

Ensure you're running the latest vLLM version.

```sh
harbor pull vllm
```

## TTS - xtts-v2

### webui

openedai-speech only starts downloading the xtts-v2 model when the first generation request is made, not on startup. There are no logs or other indications of download progress.
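Since the download is only triggered by the first request, you can start it manually with any speech request. A sketch, assuming `harbor url tts` resolves the service endpoint and the stock openedai-speech voice mapping:

```sh
# The first request against the xtts-backed tts-1-hd model kicks off the download
curl "$(harbor url tts)/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1-hd", "input": "Hello there", "voice": "alloy"}' \
  -o /dev/null
```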

### Workaround

Here are sample logs/steps for when xtts-v2 is not downloaded yet:

```sh
# Initial startup, xtts-v2 isn't downloaded yet
harbor.tts  | First startup may download 2GB of speech models. Please wait.
harbor.tts  | INFO:     Started server process [27]
harbor.tts  | INFO:     Waiting for application startup.
harbor.tts  | INFO:     Application startup complete.
harbor.tts  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
harbor.tts  | INFO:     192.168.64.3:40698 - "POST /v1/audio/speech HTTP/1.1" 200 OK

# 1. Configure Open WebUI to use tts-1-hd
# 2. Generate speech from some uncached text
# 3. openedai-speech will log this:
harbor.tts  | 2024-08-26 08:51:44.737 | INFO     | __main__:__init__:59 - Loading model xtts to cuda
# ... Takes some time to download the model

# Check the folder size to see download progress
# voices/tts/tts_models--multilingual--multi-dataset--xtts
du -h $(harbor home)/tts

# Sample output when download is complete
user@os:~/code/harbor$ du -h $(harbor home)/tts
12K	/home/user/code/harbor/tts/config
1.8G	/home/user/code/harbor/tts/voices/tts/tts_models--multilingual--multi-dataset--xtts
1.8G	/home/user/code/harbor/tts/voices/tts
1.9G	/home/user/code/harbor/tts/voices
1.9G	/home/user/code/harbor/tts
```
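To follow the download as it happens, a periodically refreshed size summary also works (a plain shell convenience, not a Harbor command):

```sh
# Re-run du every 5 seconds until the total stops growing (~1.9G)
watch -n 5 du -sh "$(harbor home)/tts"
```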

## Ollama - truncated input

When using OpenAI-compatible endpoints, there is no way to specify num_ctx (the context size) for the model. The parameter affects how the model is loaded into memory, so it must be set ahead of inference, and Ollama can't adjust it per request or dynamically through these endpoints.
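For reference, Ollama's native API carries this setting in the per-request `options` object, while the OpenAI-compatible schema has no equivalent field, hence the Modelfile workaround below. A sketch (the model name is a placeholder, and `harbor url ollama` is assumed to resolve the service endpoint):

```sh
# Native Ollama API: num_ctx travels with the request
curl "$(harbor url ollama)/api/generate" -d '{
  "model": "llama3.1",
  "prompt": "Hello",
  "options": { "num_ctx": 8192 }
}'
```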

### Workaround

Create a new Modelfile from the base model, specifying the desired num_ctx as a default parameter. Here's an example:

```sh
# 1. Export the Modelfile for the LLM:
harbor ollama show --modelfile model > Modelfile

# 2. Edit the Modelfile to include the desired num_ctx:
# FROM model
# PARAMETER num_ctx 128000
code ./Modelfile

# 3. Put the Modelfile into a folder that is shared with the ollama service:
cp Modelfile $(harbor home)/ollama/modelfiles/Modelfile

# 4. Import the Modelfile back:
# "/modelfiles" is where the shared folder from above is mounted
harbor ollama create -f /modelfiles/Modelfile model-128k

# 5. Verify the import
harbor ollama show model-128k
```

## ComfyUI - Open WebUI can't connect

Open WebUI might display connection errors when trying to reach the default ComfyUI installation in Harbor.

### Workaround

Unfortunately, Open WebUI doesn't support any kind of auth settings for ComfyUI connections, so Harbor's ComfyUI auth must be disabled for the integration to work.

```sh
harbor comfyui auth false
```