llama.cpp server with LLava stuck after image is uploaded on the first question #3798
Closed · 4 tasks done
Labels: bug (Something isn't working)
Comments
Same issue here: using the server API directly, this is the log.
Using the server API directly but omitting the image_data parameter works. This is the log.
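For context, here is a rough sketch of the kind of request being described, assuming the server's /completion endpoint and an image_data field of the form documented in the server README (base64 data plus an id referenced as [img-&lt;id&gt;] in the prompt); the image path, prompt, port, and n_predict value are placeholders:

```bash
# Encode the image as base64 (strip newlines so it fits into a JSON string).
IMG_B64=$(base64 < some_image.jpg | tr -d '\n')

# Request WITH image_data: this is the call that gets stuck with no output.
# The prompt references the image via its id ([img-10] here).
curl -s http://localhost:8007/completion \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg img "$IMG_B64" '{
        prompt: "USER: [img-10] Describe the image.\nASSISTANT:",
        n_predict: 128,
        image_data: [{data: $img, id: 10}]
      }')"
```

Dropping the image_data field from the same request returns a normal completion, which matches the behaviour described above.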
ggerganov added a commit that referenced this issue on Oct 26, 2023
Should be fixed now.
Yesterday I was unable to get Llava to work with the following commands and the latest build. Just wanted to say that fixed the issue for me. Thanks! Commands run:
Working here too. Thanks!
Working fine. |
mattgauf added a commit to mattgauf/llama.cpp that referenced this issue on Oct 27, 2023
* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggerganov#3812)
  llama : correctly report GGUFv3 format (ggerganov#3818)
  simple : fix batch handling (ggerganov#3803)
  cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  server : do not release slot on image input (ggerganov#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  server : do not block system prompt update (ggerganov#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  cmake : add missed dependencies (ggerganov#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  Add more tokenizer tests (ggerganov#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  issues : separate bug and enhancement template + no default title (ggerganov#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  Fix baichuan convert script not detecing model (ggerganov#3739)
  make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  ...
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this issue on Nov 23, 2023
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected llama.cpp to do.
Current Behavior
Please provide a detailed written description of what llama.cpp did instead.
Environment and Context
Running:

```bash
./server -t 4 -c 4096 -ngl 50 -m /Users/slava/Documents/Development/private/AI/Models/llava1.5/ggml-model-q5_k.gguf --host 0.0.0.0 --port 8007 --mmproj /Users/slava/Documents/Development/private/AI/Models/llava1.5/mmproj-model-f16.gguf
```
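A quick sanity check, sketched here with an illustrative prompt and n_predict: a text-only completion against the same port responds normally, so the hang is specific to requests that include image_data.

```bash
# Text-only request against the server started above; this returns output,
# while the same request with an image attached hangs.
curl -s http://localhost:8007/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, how are you?", "n_predict": 32}'
```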
Environment info:
Failure Information (for bugs)
The inference is stuck, no output.
Steps to Reproduce
rec.mp4
Failure Logs
Run log