# 4. Compatibility
Known compatibility issues between the services and models.
The format of the page is as follows:
```markdown
## Service | Model

Short description of the nature/cause of the issue

### Affected Service, Model (or combination)

Description of how the issue affects the service or combination of services, and a possible workaround
```
## Gemma 2 | vLLM, Open WebUI, cmdh, ChatUI

Gemma 2 models lack a system prompt. This affects:

- Open WebUI + vLLM: when WebRAG is enabled, Open WebUI sends requests to the respective backend that use the system role (for RAG). Such requests fail with Gemma 2 (or other models without system prompt support) when running against vLLM.
- cmdh: needs a system prompt to outline the task.
- HuggingFace ChatUI: uses the system prompt for chat title generation.

Unfortunately, the workaround is to switch to another model. Alternatively (for the Open WebUI case), disable WebRAG.
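A minimal way to reproduce the failure against vLLM, assuming `harbor url vllm` prints the exposed endpoint and with `gemma-2` standing in for the actual served model name: a request carrying a `system` message is rejected by the model's chat template, while the same instructions folded into the `user` message go through.

```bash
# Fails with Gemma 2: the chat template raises on the "system" role
curl -s "$(harbor url vllm)/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'

# Works: the same instructions are merged into the "user" message
curl -s "$(harbor url vllm)/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2",
    "messages": [
      {"role": "user", "content": "You are a helpful assistant.\n\nHello!"}
    ]
  }'
```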
## LiteLLM

When LiteLLM starts, it tries to install Node.js and a few other packages at runtime. This can cause issues in restricted environments or when the related CDNs are unavailable.
Diagnostic: `harbor logs litellm` is stuck at:

```bash
# First candidate
litellm | Installing Prisma CLI
# Another candidate
litellm | * Install prebuilt node (22.5.1) ..... done.
```
When LiteLLM is stuck at the setup phase, WebUI won't load any of the proxied models.
Restart the service a few times until it starts successfully.
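A minimal sketch of that workaround, assuming your Harbor version can restart a single service by its handle (otherwise bringing the service down and up again achieves the same):

```bash
# Restart LiteLLM, then watch the logs until the setup phase completes
harbor restart litellm
harbor logs litellm
```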
## Open WebUI v0.3.11 | OpenAI-compatible endpoints

WebUI v0.3.11 fails to load models from OpenAI-compatible endpoints when the API key is specified as an empty string (or is missing).

Example configuration that will not work:
```json
{
  "openai": {
    "api_base_urls": [
      "http://mistralrs:8021/v1"
    ],
    "api_keys": [
      ""
    ],
    "enabled": true
  }
}
```
Set up the endpoint with an actual API key, or a placeholder key if the service accepts one:
```json
{
  "openai": {
    "api_base_urls": [
      "http://mistralrs:8021/v1"
    ],
    "api_keys": [
      "sk-mistralrs"
    ],
    "enabled": true
  }
}
```
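To check that the backend itself is reachable with the placeholder key, a sketch, assuming `harbor url mistralrs` prints the exposed mistral.rs endpoint and that the key is not validated server-side:

```bash
# List models via the OpenAI-compatible endpoint using the placeholder key
curl -s "$(harbor url mistralrs)/v1/models" \
  -H "Authorization: Bearer sk-mistralrs"
```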
## Open WebUI | TTS cache

Open WebUI has a built-in cache for TTS. It is sometimes applied at the sentence level, so after changing the audio model, generating audio for a previously seen sentence will return the same cached audio.

Use a new sentence when testing audio config changes, for example by re-generating the model response.
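If a stale entry gets in the way, clearing the cache also works. A sketch, assuming the default Open WebUI data directory inside the container and that cached audio lives under `cache/audio` (both are assumptions, verify the path first):

```bash
# Inspect, then remove, cached TTS audio inside the running WebUI container
harbor exec webui ls /app/backend/data/cache/audio
harbor exec webui rm -rf /app/backend/data/cache/audio
```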
## Exllama2 | GPTQ

Exllama2 (and related engines - Aphrodite Engine, TabbyAPI, etc.) only supports GPTQ in 4 bits. You can detect this problem when running, for example, a 2-bit GPTQ model and seeing logs like these:
```bash
RuntimeError: q_weight and gptq_qzeros have incompatible shapes
```
Switch to a 4-bit GPTQ model if possible. Otherwise, switch to another inference backend.
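To check a repo's bit width before downloading it, you can inspect its quantization config. A sketch with a placeholder repo name; newer GPTQ exports keep `bits` under `quantization_config` in `config.json`, while older AutoGPTQ exports use a separate `quantize_config.json`:

```bash
# Requires jq; should print 4 for a model that Exllama2 can load
curl -s "https://huggingface.co/<user>/<model>-GPTQ/raw/main/config.json" \
  | jq '.quantization_config.bits'
```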
## vLLM | FlashInfer

This is an error between vLLM and the FlashInfer attention backend.

Ensure you're running the latest vLLM version:

```bash
harbor pull vllm
```
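To confirm which vLLM version is actually running after the pull, a sketch, assuming the service is up and that `python3` is available in the vLLM container:

```bash
# Print the vLLM version from inside the running container
harbor exec vllm python3 -c 'import vllm; print(vllm.__version__)'
```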
## openedai-speech | xtts-v2

openedai-speech only starts downloading the `xtts-v2` model when the first generation request is made, not on startup. There are no logs or other indication of download progress.

Here are sample logs/steps from when `xtts-v2` is not downloaded yet:
```bash
# Initial startup, xtts-v2 isn't downloaded yet
harbor.tts | First startup may download 2GB of speech models. Please wait.
harbor.tts | INFO: Started server process [27]
harbor.tts | INFO: Waiting for application startup.
harbor.tts | INFO: Application startup complete.
harbor.tts | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
harbor.tts | INFO: 192.168.64.3:40698 - "POST /v1/audio/speech HTTP/1.1" 200 OK

# 1. Configure Open WebUI to use tts-1-hd
# 2. Generate speech from some uncached text
# 3. openedai-speech will log this:
harbor.tts | 2024-08-26 08:51:44.737 | INFO | __main__:__init__:59 - Loading model xtts to cuda
# ... Takes some time to download the model

# Check the folder size to see download progress
# voices/tts/tts_models--multilingual--multi-dataset--xtts
du -h $(harbor home)/tts

# Sample output when download is complete
user@os:~/code/harbor$ du -h $(harbor home)/tts
12K  /home/user/code/harbor/tts/config
1.8G /home/user/code/harbor/tts/voices/tts/tts_models--multilingual--multi-dataset--xtts
1.8G /home/user/code/harbor/tts/voices/tts
1.9G /home/user/code/harbor/tts/voices
1.9G /home/user/code/harbor/tts
```
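To kick off the download ahead of time instead of during the first real use, a single speech request is enough. A sketch, assuming `harbor url tts` prints the exposed openedai-speech endpoint and that `tts-1-hd` is the alias mapped to `xtts-v2`:

```bash
# Send one throwaway request so openedai-speech starts fetching xtts-v2 now
curl -s "$(harbor url tts)/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1-hd", "input": "Warm-up sentence.", "voice": "alloy"}' \
  -o /dev/null
```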
## Ollama | OpenAI-compatible API

When using Ollama's OpenAI-compatible endpoints, there's no way to specify `num_ctx` (context size) for the model. This parameter affects how the model is loaded into memory, so it must be known/set ahead of inference - Ollama can't change it per request or dynamically.

Create a new Modelfile from the base model that sets the desired `num_ctx` parameter by default. Here's an example:
```bash
# 1. Export the Modelfile for the LLM:
harbor ollama show --modelfile model > Modelfile

# 2. Edit the Modelfile to include the desired num_ctx:
# FROM model
# PARAMETER num_ctx 128000
code ./Modelfile

# 3. Put the Modelfile into a folder that is shared with the ollama service:
cp Modelfile $(harbor home)/ollama/modelfiles/Modelfile

# 4. Import the Modelfile back:
# "/modelfiles" is where the shared folder from above is mounted
harbor ollama create -f /modelfiles/Modelfile model-128k

# 5. Verify the import
harbor ollama show model-128k
```
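Once imported, point OpenAI-compatible clients at the new model name so the larger context is actually used. A sketch, assuming `harbor url ollama` prints the exposed Ollama endpoint:

```bash
# num_ctx comes from the Modelfile baked into "model-128k"
curl -s "$(harbor url ollama)/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "model-128k", "messages": [{"role": "user", "content": "Hello!"}]}'
```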
## Open WebUI | ComfyUI

Open WebUI might display connection errors when trying to reach the default ComfyUI installation in Harbor.

Unfortunately, Open WebUI doesn't support any kind of auth settings for ComfyUI connections, so Harbor's ComfyUI auth must be disabled for the integration to work:

```bash
harbor comfyui auth false
```