sync : ggml #2573

Merged 41 commits on Nov 20, 2024
06c86c0
scripts : update sync
ggerganov Nov 19, 2024
ce58be7
ggml : build backends as libraries (llama/10256)
slaren Nov 14, 2024
41c9065
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llam…
chaxu01 Nov 15, 2024
04d1bae
sycl: Use syclcompat::dp4a (llama/10267)
Rbiessy Nov 15, 2024
0df66d6
AVX BF16 and single scale quant optimizations (llama/10212)
netrunnereve Nov 15, 2024
1d49a2e
cmake : restore CMakeLists.txt (llama/10256)
ggerganov Nov 15, 2024
8dffd64
sync : leftovers (ggml/0)
ggerganov Nov 15, 2024
83c7739
ggml : fix some build issues
slaren Nov 15, 2024
f33c7ea
ggml : remove duplicated sources from the last sync (ggml/1017)
ggerganov Nov 15, 2024
adf81dc
ggml: new optimization interface (ggml/988)
JohannesGaessler Nov 16, 2024
4b8ddfb
Make updates to fix issues with clang-cl builds while using AVX512 fl…
Srihari-mcw Nov 15, 2024
49ca481
ggml : optimize Q4_0 into Q4_0_X_Y repack (llama/10324)
eddnjjn Nov 16, 2024
68b198b
vulkan: Optimize some mat-vec mul quant shaders (llama/10296)
jeffbolznv Nov 16, 2024
7caa6b2
llamafile : fix include path (llama/0)
ggerganov Nov 16, 2024
e726307
ggml : fix compile warnings (llama/0)
ggerganov Nov 16, 2024
600728e
ggml : adapt AMX to tensor->grad removal (llama/0)
ggerganov Nov 16, 2024
3f1a78d
ggml : inttypes.h -> cinttypes (llama/0)
ggerganov Nov 16, 2024
c96434f
ggml : fix possible buffer use after free in sched reserve (llama/9930)
slaren Nov 17, 2024
77ea626
CMake: default to -arch=native for CUDA build (llama/10320)
JohannesGaessler Nov 17, 2024
8bd8688
CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318)
JohannesGaessler Nov 17, 2024
a901ba0
ggml : fix undefined reference to 'getcpu' (llama/10354)
FirstTimeEZ Nov 17, 2024
dca00d8
metal : refactor kernel args into structs (llama/10238)
ggerganov Nov 17, 2024
6b4de57
llama : only use default buffer types for the KV cache (llama/10358)
slaren Nov 17, 2024
fcd8ea6
CMake: fix typo in comment [no ci] (llama/10360)
JohannesGaessler Nov 17, 2024
58b5fc4
CUDA: fix MMV kernel being used for FP16 src1 (llama/10357)
JohannesGaessler Nov 17, 2024
937684c
metal : add `GGML_UNARY_OP_ELU` kernel (ggml/1018)
PABannier Nov 18, 2024
748d633
Vulkan: Fix device info output format specifiers (llama/10366)
0cc4m Nov 18, 2024
c157f62
metal : fix offset integer overflows in im2col (ggml/1015)
pminev Nov 18, 2024
c4f4639
vulkan: remove use of null initializer (llama/10372)
jeffbolznv Nov 18, 2024
761d310
cuda : only use native when supported by cmake (llama/10389)
slaren Nov 18, 2024
8d6e30f
sycl: Revert MUL_MAT_OP support changes (llama/10385)
Alcpz Nov 19, 2024
29894ef
vulkan: Optimize soft_max (llama/10301)
jeffbolznv Nov 19, 2024
d2aaf9e
sycl : Add option to set the SYCL architecture for all targets (llama…
Rbiessy Nov 19, 2024
166237d
cuda : fix CUDA_FLAGS not being applied (llama/10403)
slaren Nov 19, 2024
bfaf1fc
Add required ggml-base and backend libs to cmake pkg (llama/10407)
bandoti Nov 19, 2024
52799f9
ggml : sync resolve (skip) (#0)
ggerganov Nov 19, 2024
0eddc9f
sync : ggml
ggerganov Nov 19, 2024
4e1f516
talk-llama : sync llama.cpp
ggerganov Nov 19, 2024
8c24c64
whisper : adapt to new ggml (wip)
ggerganov Nov 19, 2024
c800966
ggml/sched : do not skip views in pre-assignments
slaren Nov 20, 2024
e611417
whisper : use backend registry (#0)
ggerganov Nov 20, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,5 +1,6 @@
*.o
*.a
*.d
.cache/
.coreml/
.test/
@@ -19,6 +20,9 @@ build-*/
.swiftpm
*.metallib

ggml-metal-embed.metal
ggml-metal-embed.metal.tmp

/main
/stream
/command
257 changes: 101 additions & 156 deletions Makefile
@@ -444,17 +444,17 @@ endif
else
MK_CFLAGS += -march=rv64gcv -mabi=lp64d
MK_CXXFLAGS += -march=rv64gcv -mabi=lp64d
endif
endif # RISCV

ifndef GGML_NO_ACCELERATE
# Mac OS - include Accelerate framework.
# `-framework Accelerate` works both with Apple Silicon and Mac Intel
ifeq ($(UNAME_S),Darwin)
MK_CPPFLAGS += -DGGML_USE_ACCELERATE -DGGML_USE_BLAS
MK_CPPFLAGS += -DGGML_USE_ACCELERATE -DGGML_USE_BLAS -DGGML_BLAS_USE_ACCELERATE
MK_CPPFLAGS += -DACCELERATE_NEW_LAPACK
MK_CPPFLAGS += -DACCELERATE_LAPACK_ILP64
MK_LDFLAGS += -framework Accelerate
OBJ_GGML += ggml/src/ggml-blas.o
OBJ_GGML += ggml/src/ggml-blas/ggml-blas.o
endif
endif # GGML_NO_ACCELERATE

@@ -464,29 +464,38 @@ ifndef GGML_NO_OPENMP
MK_CXXFLAGS += -fopenmp
endif # GGML_NO_OPENMP

ifdef WHISPER_COREML
MK_CXXFLAGS += -DWHISPER_USE_COREML
LDFLAGS += -framework Foundation -framework CoreML

ifdef WHISPER_COREML_ALLOW_FALLBACK
MK_CXXFLAGS += -DWHISPER_COREML_ALLOW_FALLBACK
endif
endif # WHISPER_COREML

ifdef GGML_OPENBLAS
MK_CPPFLAGS += -DGGML_USE_BLAS $(shell pkg-config --cflags-only-I openblas)
MK_CFLAGS += $(shell pkg-config --cflags-only-other openblas)
MK_LDFLAGS += $(shell pkg-config --libs openblas)
OBJ_GGML += ggml/src/ggml-blas.o
OBJ_GGML += ggml/src/ggml-blas/ggml-blas.o
endif # GGML_OPENBLAS

ifdef GGML_OPENBLAS64
MK_CPPFLAGS += -DGGML_USE_BLAS $(shell pkg-config --cflags-only-I openblas64)
MK_CFLAGS += $(shell pkg-config --cflags-only-other openblas64)
MK_LDFLAGS += $(shell pkg-config --libs openblas64)
OBJ_GGML += ggml/src/ggml-blas.o
OBJ_GGML += ggml/src/ggml-blas/ggml-blas.o
endif # GGML_OPENBLAS64

ifdef GGML_BLIS
MK_CPPFLAGS += -DGGML_USE_BLAS -I/usr/local/include/blis -I/usr/include/blis
MK_LDFLAGS += -lblis -L/usr/local/lib
OBJ_GGML += ggml/src/ggml-blas.o
OBJ_GGML += ggml/src/ggml-blas/ggml-blas.o
endif # GGML_BLIS

ifdef GGML_RPC
MK_CPPFLAGS += -DGGML_USE_RPC
OBJ_GGML += ggml/src/ggml-rpc.o
OBJ_GGML += ggml/src/ggml-rpc/ggml-rpc.o
endif # GGML_RPC

OBJ_CUDA_TMPL = $(patsubst %.cu,%.o,$(wildcard ggml/src/ggml-cuda/template-instances/fattn-wmma*.cu))
@@ -513,7 +522,7 @@ ifdef GGML_CUDA
MK_LDFLAGS += -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L$(CUDA_PATH)/lib64 -L/usr/lib64 -L$(CUDA_PATH)/targets/$(UNAME_M)-linux/lib -L$(CUDA_PATH)/lib64/stubs -L/usr/lib/wsl/lib
MK_NVCCFLAGS += -use_fast_math

OBJ_GGML += ggml/src/ggml-cuda.o
OBJ_GGML += ggml/src/ggml-cuda/ggml-cuda.o
OBJ_GGML += $(patsubst %.cu,%.o,$(wildcard ggml/src/ggml-cuda/*.cu))
OBJ_GGML += $(OBJ_CUDA_TMPL)
ifdef WHISPER_FATAL_WARNINGS
@@ -615,11 +624,11 @@ ggml/src/ggml-cuda/%.o: \
ggml/src/ggml-cuda/common.cuh
$(NVCC_COMPILE)

ggml/src/ggml-cuda.o: \
ggml/src/ggml-cuda.cu \
ggml/src/ggml-cuda/ggml-cuda.o: \
ggml/src/ggml-cuda/ggml-cuda.cu \
ggml/include/ggml-cuda.h \
ggml/include/ggml.h \
ggml/include/ggml-backend.h \
ggml/include/ggml-cuda.h \
ggml/src/ggml-backend-impl.h \
ggml/src/ggml-common.h \
$(wildcard ggml/src/ggml-cuda/*.cuh)
@@ -742,50 +751,43 @@ endif # GGML_HIPBLAS
ifdef GGML_METAL
MK_CPPFLAGS += -DGGML_USE_METAL
MK_LDFLAGS += -framework Foundation -framework Metal -framework MetalKit
OBJ_GGML += ggml/src/ggml-metal.o
OBJ_GGML += ggml/src/ggml-metal/ggml-metal.o
ifdef GGML_METAL_NDEBUG
MK_CPPFLAGS += -DGGML_METAL_NDEBUG
endif

ifdef GGML_METAL_EMBED_LIBRARY
MK_CPPFLAGS += -DGGML_METAL_EMBED_LIBRARY
OBJ_GGML += ggml/src/ggml-metal-embed.o
OBJ_GGML += ggml/src/ggml-metal/ggml-metal-embed.o
endif
endif # GGML_METAL

ifdef WHISPER_COREML
MK_CXXFLAGS += -DWHISPER_USE_COREML
LDFLAGS += -framework Foundation -framework CoreML

ifdef WHISPER_COREML_ALLOW_FALLBACK
MK_CXXFLAGS += -DWHISPER_COREML_ALLOW_FALLBACK
endif
endif

# ===

ifdef GGML_METAL
ggml/src/ggml-metal.o: \
ggml/src/ggml-metal.m \
ggml/src/ggml-metal/ggml-metal.o: \
ggml/src/ggml-metal/ggml-metal.m \
ggml/src/ggml-metal/ggml-metal-impl.h \
ggml/include/ggml-metal.h \
ggml/include/ggml.h
$(CC) $(CFLAGS) -c $< -o $@

ifdef GGML_METAL_EMBED_LIBRARY
ggml/src/ggml-metal-embed.o: \
ggml/src/ggml-metal.metal \
ggml/src/ggml-metal/ggml-metal-embed.o: \
ggml/src/ggml-metal/ggml-metal.metal \
ggml/src/ggml-metal/ggml-metal-impl.h \
ggml/src/ggml-common.h
@echo "Embedding Metal library"
@sed -e '/#include "ggml-common.h"/r ggml/src/ggml-common.h' -e '/#include "ggml-common.h"/d' < ggml/src/ggml-metal.metal > ggml/src/ggml-metal-embed.metal
$(eval TEMP_ASSEMBLY=$(shell mktemp))
@echo ".section __DATA, __ggml_metallib" > $(TEMP_ASSEMBLY)
@echo ".globl _ggml_metallib_start" >> $(TEMP_ASSEMBLY)
@echo "_ggml_metallib_start:" >> $(TEMP_ASSEMBLY)
@echo ".incbin \"ggml/src/ggml-metal-embed.metal\"" >> $(TEMP_ASSEMBLY)
@echo ".globl _ggml_metallib_end" >> $(TEMP_ASSEMBLY)
@echo "_ggml_metallib_end:" >> $(TEMP_ASSEMBLY)
@$(AS) $(TEMP_ASSEMBLY) -o $@
@rm -f ${TEMP_ASSEMBLY}
@sed -e '/__embed_ggml-common.h__/r ggml/src/ggml-common.h' -e '/__embed_ggml-common.h__/d' < ggml/src/ggml-metal/ggml-metal.metal > ggml/src/ggml-metal/ggml-metal-embed.metal.tmp
@sed -e '/#include "ggml-metal-impl.h"/r ggml/src/ggml-metal/ggml-metal-impl.h' -e '/#include "ggml-metal-impl.h"/d' < ggml/src/ggml-metal/ggml-metal-embed.metal.tmp > ggml/src/ggml-metal/ggml-metal-embed.metal
$(eval TEMP_ASSEMBLY=$(shell mktemp -d))
@echo ".section __DATA, __ggml_metallib" > $(TEMP_ASSEMBLY)/ggml-metal-embed.s
@echo ".globl _ggml_metallib_start" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
@echo "_ggml_metallib_start:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
@echo ".incbin \"ggml/src/ggml-metal/ggml-metal-embed.metal\"" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
@echo ".globl _ggml_metallib_end" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
@echo "_ggml_metallib_end:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
$(CC) $(CFLAGS) -c $(TEMP_ASSEMBLY)/ggml-metal-embed.s -o $@
@rm -f ${TEMP_ASSEMBLY}/ggml-metal-embed.s
@rmdir ${TEMP_ASSEMBLY}
endif
endif # GGML_METAL

@@ -801,11 +803,17 @@ endif

OBJ_GGML += \
ggml/src/ggml.o \
ggml/src/ggml-cpu.o \
ggml/src/ggml-aarch64.o \
ggml/src/ggml-alloc.o \
ggml/src/ggml-backend.o \
ggml/src/ggml-backend-reg.o \
ggml/src/ggml-opt.o \
ggml/src/ggml-quants.o \
ggml/src/ggml-aarch64.o
ggml/src/ggml-threading.o \
ggml/src/ggml-cpu/ggml-cpu.o \
ggml/src/ggml-cpu/ggml-cpu-cpp.o \
ggml/src/ggml-cpu/ggml-cpu-aarch64.o \
ggml/src/ggml-cpu/ggml-cpu-quants.o

OBJ_WHISPER += \
src/whisper.o
@@ -910,114 +918,64 @@ endif
# Build libraries
#

# ggml

ggml/src/ggml.o: \
ggml/src/ggml.c \
ggml/include/ggml.h
$(CC) $(CFLAGS) -c $< -o $@

ggml/src/ggml-cpu.o: \
ggml/src/ggml-cpu.c \
ggml/include/ggml.h \
ggml/src/ggml-common.h
$(CC) $(CFLAGS) -c $< -o $@

ggml/src/ggml-alloc.o: \
ggml/src/ggml-alloc.c \
ggml/include/ggml.h \
ggml/include/ggml-alloc.h
$(CC) $(CFLAGS) -c $< -o $@

ggml/src/ggml-backend.o: \
ggml/src/ggml-backend.cpp \
ggml/include/ggml.h \
ggml/include/ggml-backend.h
$(CXX) $(CXXFLAGS) -c $< -o $@

ggml/src/ggml-quants.o: \
ggml/src/ggml-quants.c \
ggml/include/ggml.h \
ggml/src/ggml-quants.h \
ggml/src/ggml-common.h
$(CC) $(CFLAGS) -c $< -o $@

ggml/src/ggml-aarch64.o: \
ggml/src/ggml-aarch64.c \
ggml/include/ggml.h \
ggml/src/ggml-aarch64.h \
ggml/src/ggml-common.h
$(CC) $(CFLAGS) -c $< -o $@
LIB_GGML = libggml.so
LIB_GGML_S = libggml.a

ggml/src/ggml-blas.o: \
ggml/src/ggml-blas.cpp \
ggml/include/ggml-blas.h
$(CXX) $(CXXFLAGS) -c $< -o $@
LIB_LLAMA = libllama.so
LIB_LLAMA_S = libllama.a

ifdef GGML_LLAMAFILE
ggml/src/sgemm.o: \
ggml/src/sgemm.cpp \
ggml/src/sgemm.h \
ggml/include/ggml.h
$(CXX) $(CXXFLAGS) -c $< -o $@
endif # GGML_LLAMAFILE
LIB_COMMON = libcommon.so
LIB_COMMON_S = libcommon.a

ifdef GGML_RPC
ggml/src/ggml-rpc.o: \
ggml/src/ggml-rpc.cpp \
ggml/include/ggml-rpc.h
$(CXX) $(CXXFLAGS) -c $< -o $@
endif # GGML_RPC
LIB_COMMON_SDL = libcommon-sdl.so
LIB_COMMON_SDL_S = libcommon-sdl.a

$(LIB_GGML): \
$(OBJ_GGML)
$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)
# Targets
BUILD_TARGETS += $(LIB_GGML) $(LIB_GGML_S) $(LIB_LLAMA) $(LIB_LLAMA_S) $(LIB_COMMON) $(LIB_COMMON_S)

$(LIB_GGML_S): \
$(OBJ_GGML)
ar rcs $(LIB_GGML_S) $^
# Dependency files
DEP_FILES = $(OBJ_GGML:.o=.d) $(OBJ_LLAMA:.o=.d) $(OBJ_COMMON:.o=.d)

# whisper
# Default target
all: $(BUILD_TARGETS)

src/whisper.o: \
src/whisper.cpp \
include/whisper.h \
# Note: need this exception because `ggml-cpu.c` and `ggml-cpu.cpp` both produce the same obj/dep files
# g++ -M -I ./ggml/include/ -I ./ggml/src ggml/src/ggml-cpu/ggml-cpu.cpp | grep ggml
ggml/src/ggml-cpu/ggml-cpu-cpp.o: \
ggml/src/ggml-cpu/ggml-cpu.cpp \
ggml/include/ggml-backend.h \
ggml/include/ggml.h \
ggml/include/ggml-alloc.h \
ggml/include/ggml-backend.h \
ggml/include/ggml-cuda.h \
ggml/include/ggml-metal.h
$(CXX) $(CXXFLAGS) -c $< -o $@
ggml/src/ggml-backend-impl.h \
ggml/include/ggml-cpu.h \
ggml/src/ggml-impl.h
$(CXX) $(CXXFLAGS) -c $< -o $@

$(LIB_WHISPER): \
$(OBJ_WHISPER) \
$(LIB_GGML)
$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)
# Rules for building object files
ggml/%.o: ggml/%.c
$(CC) $(CFLAGS) -MMD -c $< -o $@

$(LIB_WHISPER_S): \
$(OBJ_WHISPER) \
$(OBJ_GGML)
ar rcs $(LIB_WHISPER_S) $^
ggml/%.o: ggml/%.cpp
$(CXX) $(CXXFLAGS) -MMD -c $< -o $@

# common
src/%.o: src/%.cpp
$(CXX) $(CXXFLAGS) -MMD -c $< -o $@

examples/common.o: \
examples/common.cpp \
examples/common.h
$(CXX) $(CXXFLAGS) -c $< -o $@
examples/%.o: examples/%.cpp
$(CXX) $(CXXFLAGS) -MMD -c $< -o $@

examples/common-ggml.o: \
examples/common-ggml.cpp \
examples/common-ggml.h
$(CXX) $(CXXFLAGS) -c $< -o $@
# Rules for building libraries
$(LIB_GGML): $(OBJ_GGML)
$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)

$(LIB_GGML_S): $(OBJ_GGML)
ar rcs $(LIB_GGML_S) $^

$(LIB_COMMON): \
$(OBJ_COMMON)
$(LIB_LLAMA): $(OBJ_LLAMA) $(LIB_GGML)
$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)

$(LIB_COMMON_S): \
$(OBJ_COMMON)
ar rcs $(LIB_COMMON_S) $^
$(LIB_LLAMA_S): $(OBJ_LLAMA)
ar rcs $(LIB_LLAMA_S) $^

# common-sdl

@@ -1029,34 +987,21 @@ examples/common-sdl.o: \
examples/common-sdl.h
$(CXX) $(CXXFLAGS) $(CFLAGS_SDL) -c $< -o $@

$(LIB_COMMON_SDL): \
$(OBJ_SDL)
$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS) $(LDFLAGS_SDL)
$(LIB_COMMON): $(OBJ_COMMON) $(LIB_LLAMA) $(LIB_GGML)
$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)

$(LIB_COMMON_S): $(OBJ_COMMON)
ar rcs $(LIB_COMMON_S) $^

$(LIB_COMMON_SDL_S): \
$(OBJ_SDL)
ar rcs $(LIB_COMMON_SDL_S) $^
# Include dependency files
-include $(DEP_FILES)

# Clean rule
clean:
rm -vrf *.dot $(BUILD_TARGETS) $(TEST_TARGETS)
rm -rvf src/*.o
rm -rvf src/coreml/*.o
rm -rvf tests/*.o
rm -rvf examples/*.o
rm -rvf *.a
rm -rvf *.dll
rm -rvf *.so
rm -rvf *.dot
rm -rvf ggml/*.a
rm -rvf ggml/*.dll
rm -rvf ggml/*.so
rm -vrf ggml/src/*.o
rm -vrf ggml/src/ggml-metal-embed.metal
rm -vrf ggml/src/ggml-cuda/*.o
rm -vrf ggml/src/ggml-cuda/template-instances/*.o
rm -rvf $(BUILD_TARGETS)
rm -rvf $(TEST_TARGETS)
find examples -type f -name "*.o" -delete
rm -vrf $(BUILD_TARGETS) $(TEST_TARGETS)
rm -rvf *.a *.dll *.so *.dot
find ggml src tests examples -type f -name "*.o" -delete
find ggml src tests examples -type f -name "*.d" -delete

#
# Examples