
rpc error code #25

Open

JamborJan opened this issue Mar 8, 2024 · 4 comments

Comments

@JamborJan

I have set up the local-ai container as described and downloaded the suggested models from the main README. Whenever I run a request through Nextcloud's AI assistant or via the local command line, I get this error in the container logs:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43575: connect: connection refused"

When running a test within the container:

LOCALAI=http://localhost:8080

curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt4all-j",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 2
}'

I get this error:

{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}

I found an issue related to that in the upstream repo: mudler/LocalAI#771 (comment)

As I am not sure where the root cause lies, I am opening this issue here so that others can find it if they run into the same problem.
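
By the way, a quicker way to sanity-check the server before the chat request (a sketch, assuming LocalAI's standard OpenAI-compatible endpoints):

# Readiness probe: should return HTTP 200 once the API is listening
curl -i $LOCALAI/readyz

# List registered models; gpt4all-j should appear in the output
curl $LOCALAI/v1/models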

@JamborJan
Author

It is important to note that if you are running Docker in a VM, you have to ensure that the AVX2 CPU feature is enabled. You can check this with grep avx2 /proc/cpuinfo, as shown in the sketch below; if there is no output, the required feature is not available. To solve that, adjust the hardware settings of the VM and choose the CPU type host. After that I was able to run the test.
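
A quick check from inside the VM (a sketch; the flags come straight from /proc/cpuinfo, so any Linux guest works):

# Count logical CPUs that advertise AVX2; 0 means the feature is missing
grep -c avx2 /proc/cpuinfo

# Alternative via lscpu
lscpu | grep avx2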

But as I have a GPU installed, it would be beneficial to have the GPU used; the CPU spikes to 1600% when a prompt is sent.

According to the docs, a different image should be used in that case:

docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-12
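
To confirm the GPU is actually reachable from containers, something like this helps (a sketch; the CUDA image tag is just an example, and it assumes the NVIDIA container toolkit is installed on the host):

# Should print the GPU table if --gpus all passthrough works
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi

# Inside the running local-ai container (assuming nvidia-smi is present there)
docker exec -ti local-ai nvidia-smi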

This is discussed here: #26

@bpoulliot

Upgrading to 8.0 broke my functioning setup, and upgrading to 8.1 doesn't change anything. I can get responses from Assistant for text generation, but it appears severely limited and doesn't understand the defined tasks. Image generation is entirely non-functional. I'm running via Docker on Ubuntu.

@szaimen
Owner

szaimen commented May 31, 2024

Hi, can you check if it works now after I changed the docker tag to v2.16.0-aio-cpu with #41 and pushed a new container update?
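
For reference, testing that tag directly outside the container setup would look like this (a sketch; run flags follow the earlier example, with the tag taken from the change above):

docker run -p 8080:8080 --name local-ai -ti localai/localai:v2.16.0-aio-cpu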

@lexiconzero

For me this still does not work with the latest version of everything.

The error logs look similar to this:

11:06AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
11:07AM INF Success ip=127.0.0.1 latency="201.922µs" method=GET status=200 url=/readyz
11:07AM INF Trying to load the model 'gpt-3.5-turbo' with the backend '[llama-cpp llama-ggml gpt4all llama-cpp-fallback piper rwkv stablediffusion whisper huggingface bert-embeddings /build/backend/python/transformers/run.sh /build/backend/python/vllm/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/exllama/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/bark/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/transformers-musicgen/run.sh /build/backend/python/petals/run.sh /build/backend/python/mamba/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/coqui/run.sh]'
11:07AM INF [llama-cpp] Attempting to load
11:07AM INF Loading model 'gpt-3.5-turbo' with backend llama-cpp
11:07AM INF [llama-cpp] attempting to load with AVX2 variant
11:07AM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =

I do have AVX2 on my CPU, and the QEMU CPU config is set to 'host'.
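
For anyone else checking their hypervisor settings, the 'host' CPU mode mentioned above is set roughly like this (a sketch; VM name/ID are placeholders, pick the variant matching your stack):

# libvirt/QEMU: pass the host CPU through to the guest
virsh edit <vm-name>    # then set: <cpu mode='host-passthrough'/>

# Proxmox VE
qm set <vmid> --cpu host

# Plain QEMU command line
qemu-system-x86_64 -cpu host ...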
