
LLAMA_METAL=1 and LLAMA_MPI=1 incompatible? #2166

Closed
magnusviri opened this issue Jul 10, 2023 · 5 comments · Fixed by #2208

Comments

@magnusviri
Contributor

When following the instructions for MPI (#2099) I get a build error.

> LLAMA_METAL=1 make CC=/opt/homebrew/bin/mpicc CXX=/opt/homebrew/bin/mpicxx LLAMA_MPI=1
I llama.cpp build info:
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_MPI -Wno-cast-qual
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_MPI -Wno-cast-qual
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX:      Apple clang version 14.0.0 (clang-1400.0.29.202)

/opt/homebrew/bin/mpicc  -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_MPI -Wno-cast-qual   -c ggml.c -o ggml.o
/opt/homebrew/bin/mpicxx -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_MPI -Wno-cast-qual -c llama.cpp -o llama.o
/opt/homebrew/bin/mpicxx -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_MPI -Wno-cast-qual -c examples/common.cpp -o common.o
/opt/homebrew/bin/mpicc -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_MPI -Wno-cast-qual   -c -o k_quants.o k_quants.c
/opt/homebrew/bin/mpicc -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_MPI -Wno-cast-qual -c ggml-mpi.c -o ggml-mpi.o
CFLAGS   += -DGGML_USE_METAL -DGGML_METAL_NDEBUG
make: CFLAGS: No such file or directory
make: *** [ggml-mpi.o] Error 1

If I run make again, it finishes and produces a functional main that is capable of MPI. But the resulting binary claims it wasn't built with GPU support, so it ignores --n-gpu-layers. Example:

> ./main -m orca-mini-v2_7b.ggmlv3.q6_K.bin -n 128 --gpu-layers 1 -p "Q. What is the capital of Germany? A. Berlin. Q. What is the capital of France? A."
warning: not compiled with GPU offload support, --n-gpu-layers option will be ignored
warning: see main README.md for information on enabling GPU BLAS support
main: build = 813 (5656d10)
main: seed  = 1689022667
llama.cpp: loading model from orca-mini-v2_7b.ggmlv3.q6_K.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
...

I tried to figure this out, but I'm not that great with make (or gcc, etc.). If I build with either LLAMA_METAL or LLAMA_MPI alone, it works; it's when they're both enabled together that the build errors out.

I'm on macOS 13 at the latest commit (5656d10), and I've got mpich installed with Homebrew.

@evanmiller
Contributor

The original PR was not designed to work with both, but this would be a great addition. Probably we should have a compile-time message that you need to choose one or the other.

@magnusviri
Contributor Author

Would it be possible to have a CLI switch that toggles between the two?

@ggerganov
Owner

ggerganov commented Jul 11, 2023

> The original PR was not designed to work with both, but this would be a great addition. Probably we should have a compile-time message that you need to choose one or the other.

Actually, I fixed that with the last commit before merging.

The problem in the OP is that the Makefile is currently written incorrectly: once the flags have been used in a build command, they can no longer be modified, so all of the build commands have to come at the end, after every flag change. This will need some work to fix (see the sketch below).
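To illustrate the failure mode, a minimal sketch (not the actual llama.cpp Makefile): in GNU make, a tab-indented line that follows a rule's recipe is itself treated as a recipe line and handed to the shell, so a flag assignment placed after a build command gets executed as a shell command and fails with "CFLAGS: No such file or directory".

# WRONG (simplified): the tab-indented assignment follows a recipe, so make
# passes it to the shell instead of treating it as a variable assignment
# (recipe lines below must begin with a tab character)
ggml-mpi.o: ggml-mpi.c ggml-mpi.h
	$(CC) $(CFLAGS) -c ggml-mpi.c -o ggml-mpi.o
	CFLAGS += -DGGML_USE_METAL -DGGML_METAL_NDEBUG

# RIGHT (simplified): change the flags first, then define the rules that use them
CFLAGS += -DGGML_USE_METAL -DGGML_METAL_NDEBUG
ggml-mpi.o: ggml-mpi.c ggml-mpi.h
	$(CC) $(CFLAGS) -c ggml-mpi.c -o ggml-mpi.o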

In any case, you can simply use cmake -DLLAMA_MPI=ON -DLLAMA_METAL=ON and it will work.
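For reference, a typical out-of-source CMake build would look roughly like this (a sketch assuming a fresh build directory; everything beyond the two flags above is left at its defaults):

> mkdir build
> cd build
> cmake -DLLAMA_MPI=ON -DLLAMA_METAL=ON ..
> cmake --build . --config Release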

magnusviri added a commit to magnusviri/llama.cpp that referenced this issue Jul 13, 2023
Fixes ggerganov#2166 by moving commands after the CFLAGS are changed.
ggerganov pushed a commit that referenced this issue Jul 14, 2023
Fixes #2166 by moving commands after the CFLAGS are changed.
@AlphaSue

"Each process will use roughly an equal amount of RAM", is there any way to specify ratio of split ram? I have two mac one is 32gb and another is 16gb, wanna test on 65B model which requires about 40gb. @ggerganov

@ggerganov
Owner

You can try to launch 2 nodes on the 32 GB Mac and one node on the 16 GB Mac.
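For example, a sketch of that uneven split (the hostnames and model path are placeholders; the hostname:process-count hostfile format and the mpirun invocation follow the project's MPI instructions):

hostfile contents (one line per host, hostname:number-of-processes):
mac-32gb.local:2
mac-16gb.local:1

> mpirun -hostfile hostfile -n 3 ./main -m /path/to/65B-model.bin -n 128 -p "..."

Since each process takes roughly an equal share, the 32 GB machine runs 2 of the 3 processes and should end up holding roughly two thirds of the model.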

YuMJie pushed a commit to YuMJie/powerinfer that referenced this issue Oct 25, 2024
Fixes ggerganov/llama.cpp#2166 by moving commands after the CFLAGS are changed.