Implementations for Q4_0_8_8 quantization based functions in AVX2 SIMD architecture #8713

Srihari-mcw · 2024-07-26T15:29:59Z

The PR contains replication of Q4_0_8_8 quantization based functions for x86/x64 SIMD Architectures
The PR contains AVX2 implementations for quantize_q8_0_4x8, ggml_gemv_q4_0_8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0 functions
Good gains were observed especially with prompt processing with the above changes compared to the current default path for Q4_0 model. Currently the Q4_0 model goes through LLAMAFILE(sgemm.cpp) implementation for mul_mat operations by default
PR introduces integer variant for mul_sum_i8_pairs function for performing dot product operations and macros for conversion from half precision to full precision based on F16C intrinsics support
Performance Details

GCC Linux :

Q4_0 Model :

model	size	params	backend	threads	test	t/s	speedup	Commit id
llama 7B Q4_0	3.56 GiB	6.74 B	CPU	6	pp 512	43.28 ± 0.08		de280085
llama 7B Q4_0_8_8	3.56 GiB	6.74 B	CPU	6	pp 512	68.04 ± 0.08	57.2%	9737b2e
llama 7B Q4_0	3.56 GiB	6.74 B	CPU	6	tg 128	14.69 ± 0.00		de280085
llama 7B Q4_0_8_8	3.56 GiB	6.74 B	CPU	6	tg 128	14.90 ± 0.01	1.4%	9737b2e

The models were quantized and tested from meta-llama2 7B model - https://huggingface.co/meta-llama/Llama-2-7b

The PR was tested in AMD Raphael 7600X which supports the following flags by default :

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Srihari-mcw · 2024-07-26T15:33:48Z

ggml/src/ggml-aarch64.c

+    __m256i requiredOrder = _mm256_set_epi32(3 ,2 ,1 ,0, 7 ,6, 5, 4);
+
+    // Take group of four block_q8_0x4 structures at each pass of the loop and perform dot product operation
+    for (; y < nr / 4; y += 4) {


ne11 is processed in batches of 16 in GEMM function. Leftover ne11 is processed in batches of four. Saw higher boostup in performance while processing ne11 in batches of 16 and leftover in batches of 4 versus just processing ne11 in batches of four

bartowski1182 · 2024-07-26T19:07:25Z

this isn't a new conversion type, right? it's just a new way of calculating Q4_0?

Srihari-mcw · 2024-07-27T12:08:49Z

Hi @bartowski1182 , the Q4_0_8_8 is a format of quantization where the values are stored in the same 4 bit quantized fomat, along with the same delta values as Q4_0. The 4 bit quantized quants values across eight different blocks are interleaved with each other. This was introduced in PR #5780 . Models that needs to use this particular code path, needs to be quantized in this particular format of Q4_0_8_8. Thanks

nisten · 2024-08-01T02:31:55Z

I just tested this. It works. Albeit nowhere as drastically as it helps on ARM cpus but it helps , tps on intel cpu inference of 4bit llama 405B went from 0.78 (meta-llama-405b-Q_4_NL) to 0.89 meta-llama-405b-Q_4_0_8_8 . the q4nl and q4088 shape filesize are identital.

prompt: ./llama-cli -m ~/meta-406b-q4_0_4_4.gguf -t 4 -co -p "You are a Nasa jpl engineer.Human: How to build a city on Mars via calculating Aldrin-Cycler orbits? Assistant:" -fa -e -c 512 -n 512 -t 64 -b 128

Again both x86 and ARM cpus max out batch-size-1 inference at 64 cores or threads, any more and it slows down a bit or stays stagnant.

ggerganov · 2024-09-03T06:23:35Z

Albeit nowhere as drastically as it helps on ARM cpus but it helps , tps on intel cpu inference ..

The main benefit from these changes should be in the prompt processing speed, not the text generation. Better to use llama-bench to make a comparison

ggerganov

@Srihari-mcw Could you make a perplexity comparison before merging? For example against a Q4_0 model at PPL 32 chunks

…8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0 functions

…o be processed as multiple of 16 in MSVC

Srihari-mcw · 2024-09-04T16:00:40Z

Hi @ggerganov,

The perplexity was measured for models quantized from meta llama2 7B model with the following command :
./llama-perplexity -m <model_name> -f wikitext-2-raw/wiki.test.raw --chunk-size 32

It calculated perplexity over 655 chunks :
perplexity: calculating perplexity over 655 chunks, n_ctx=512, batch_size=2048, n_seq=4

The perplexity results are tabulated as follows :

model	perplexity (Final estimate PPL)	Commit id
llama 7B Q4_0	5.9627 +/- 0.03348	c950fc306
llama 7B Q4_0_8_8	5.9625 +/- 0.03348	c950fc306

The perplexity readings were found to be almost the same post the tests

Further, post the latest changes in master branch and in the PR, the performance readings are as follows

GCC Linux :

Q4_0 Model :

model	size	params	backend	threads	test	t/s	speedup	Commit id
llama 7B Q4_0	3.56 GiB	6.74 B	CPU	6	pp 512	58.20 ± 0.10		7605ae7da (Base Commit before changes)
llama 7B Q4_0_8_8	3.56 GiB	6.74 B	CPU	6	pp 512	68.96 ± 0.08	18.48%	c950fc306
llama 7B Q4_0	3.56 GiB	6.74 B	CPU	6	tg 128	14.48 ± 0.01		7605ae7da (Base Commit before changes)
llama 7B Q4_0_8_8	3.56 GiB	6.74 B	CPU	6	tg 128	14.87 ± 0.00	2.7%	c950fc306

GCC Version = 12.3

The PR was tested in AMD Raphael 7600X which supports the following flags by default :

Thanks

slaren · 2024-09-05T09:56:03Z

It looks like the rocm compiler is crashing when compiling this code, which is breaking the generation of docker images.

2024-09-05T09:46:55.3876908Z #20 800.8 fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
2024-09-05T09:46:55.3878203Z #20 800.8 PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
2024-09-05T09:46:55.3878349Z #20 800.8 Stack dump:
2024-09-05T09:46:55.3883113Z #20 800.8 0.	Program arguments: /opt/rocm/llvm/bin/clang -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_HIPBLAS -DGGML_USE_CUDA -std=c11 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -march=native -mtune=native -fopenmp -Wunreachable-code-break -Wunreachable-code-return -Wdouble-promotion -c ggml/src/ggml-aarch64.c -o ggml/src/ggml-aarch64.o
2024-09-05T09:46:55.3883318Z #20 800.8 1.	<eof> parser at end of file
2024-09-05T09:46:55.3883453Z #20 800.8 2.	Optimizer
2024-09-05T09:46:55.3884169Z #20 800.8  #0 0x00005579e5d7d866 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/rocm/llvm/bin/clang+0x27cf866)
2024-09-05T09:46:55.3884896Z #20 800.8  #1 0x00005579e5d7b6b4 llvm::sys::CleanupOnSignal(unsigned long) (/opt/rocm/llvm/bin/clang+0x27cd6b4)
2024-09-05T09:46:55.3885528Z #20 800.8  #2 0x00005579e5cd8877 llvm::CrashRecoveryContext::HandleExit(int) (/opt/rocm/llvm/bin/clang+0x272a877)
2024-09-05T09:46:55.3886095Z #20 800.8  #3 0x00005579e5d732c2 llvm::sys::Process::Exit(int, bool) (/opt/rocm/llvm/bin/clang+0x27c52c2)
2024-09-05T09:46:55.3886413Z #20 800.8  #4 0x00005579e44bf197 (/opt/rocm/llvm/bin/clang+0xf11197)
2024-09-05T09:46:55.3887083Z #20 800.8  #5 0x00005579e5ce1bc0 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/rocm/llvm/bin/clang+0x2733bc0)
2024-09-05T09:46:55.3889765Z #20 800.8  #6 0x00005579e58168ab combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) InstructionCombining.cpp:0:0
2024-09-05T09:46:55.3890799Z #20 800.8  #7 0x00005579e5816c6e llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/rocm/llvm/bin/clang+0x2268c6e)
2024-09-05T09:46:55.3892398Z #20 800.8  #8 0x00005579e61237c6 llvm::detail::PassModel<llvm::Function, llvm::InstCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/rocm/llvm/bin/clang+0x2b757c6)
2024-09-05T09:46:55.3894412Z #20 800.8  #9 0x00005579e4501ce1 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/rocm/llvm/bin/clang+0xf53ce1)
2024-09-05T09:46:55.3896174Z #20 800.8 #10 0x00005579e4d6c037 llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/opt/rocm/llvm/bin/clang+0x17be037)
2024-09-05T09:46:55.3899187Z #20 800.8 #11 0x00005579e44f5276 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/opt/rocm/llvm/bin/clang+0xf47276)
2024-09-05T09:46:55.3901682Z #20 800.8 #12 0x00005579e4d654b9 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/opt/rocm/llvm/bin/clang+0x17b74b9)
2024-09-05T09:46:55.3905481Z #20 800.8 #13 0x00005579e5781256 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/opt/rocm/llvm/bin/clang+0x21d3256)
2024-09-05T09:46:55.3907077Z #20 800.8 #14 0x00005579e4d689e1 llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/opt/rocm/llvm/bin/clang+0x17ba9e1)
2024-09-05T09:46:55.3910090Z #20 800.8 #15 0x00005579e5781206 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/opt/rocm/llvm/bin/clang+0x21d3206)
2024-09-05T09:46:55.3911158Z #20 800.8 #16 0x00005579e4d665f2 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/rocm/llvm/bin/clang+0x17b85f2)
2024-09-05T09:46:55.3912136Z #20 800.8 #17 0x00005579e5789bd2 llvm::ModuleInlinerWrapperPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/rocm/llvm/bin/clang+0x21dbbd2)
2024-09-05T09:46:55.3913769Z #20 800.8 #18 0x00005579e7061546 llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/rocm/llvm/bin/clang+0x3ab3546)
2024-09-05T09:46:55.3915054Z #20 800.8 #19 0x00005579e5658195 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/rocm/llvm/bin/clang+0x20aa195)
2024-09-05T09:46:55.3917062Z #20 800.8 #20 0x00005579e6134c13 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
2024-09-05T09:46:55.3919543Z #20 800.8 #21 0x00005579e6137e95 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/opt/rocm/llvm/bin/clang+0x2b89e95)
2024-09-05T09:46:55.3920464Z #20 800.8 #22 0x00005579e700507d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/rocm/llvm/bin/clang+0x3a5707d)
2024-09-05T09:46:55.3921109Z #20 800.8 #23 0x00005579e7bd6191 clang::ParseAST(clang::Sema&, bool, bool) (/opt/rocm/llvm/bin/clang+0x4628191)
2024-09-05T09:46:55.3921772Z #20 800.8 #24 0x00005579e691ac99 clang::FrontendAction::Execute() (/opt/rocm/llvm/bin/clang+0x336cc99)
2024-09-05T09:46:55.3922579Z #20 800.8 #25 0x00005579e68a3301 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/rocm/llvm/bin/clang+0x32f5301)
2024-09-05T09:46:55.3923346Z #20 800.8 #26 0x00005579e69dc160 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/rocm/llvm/bin/clang+0x342e160)
2024-09-05T09:46:55.3924065Z #20 800.8 #27 0x00005579e44c0555 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/rocm/llvm/bin/clang+0xf12555)
2024-09-05T09:46:55.3924658Z #20 800.8 #28 0x00005579e44bb84f ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
2024-09-05T09:46:55.3926850Z #20 800.8 #29 0x00005579e66e7a89 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0
2024-09-05T09:46:55.3927685Z #20 800.8 #30 0x00005579e5cd8767 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/rocm/llvm/bin/clang+0x272a767)
2024-09-05T09:46:55.3929271Z #20 800.8 #31 0x00005579e66e7e17 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0
2024-09-05T09:46:55.3930455Z #20 800.8 #32 0x00005579e66a8bc1 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/rocm/llvm/bin/clang+0x30fabc1)
2024-09-05T09:46:55.3932662Z #20 800.8 #33 0x00005579e66a95d6 std::_Function_handler<void (), clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const::'lambda'()>::_M_invoke(std::_Any_data const&) Compilation.cpp:0:0
2024-09-05T09:46:55.3934080Z #20 800.8 #34 0x00005579e66aeaf8 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/rocm/llvm/bin/clang+0x3100af8)
2024-09-05T09:46:55.3935419Z #20 800.8 #35 0x00005579e66beb7c clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/rocm/llvm/bin/clang+0x3110b7c)
2024-09-05T09:46:55.3935924Z #20 800.8 #36 0x00005579e44be4d7 clang_main(int, char**) (/opt/rocm/llvm/bin/clang+0xf104d7)
2024-09-05T09:46:55.3936517Z #20 800.8 #37 0x00007fd4e1d9fd90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
2024-09-05T09:46:55.3937138Z #20 800.8 #38 0x00007fd4e1d9fe40 call_init ./csu/../csu/libc-start.c:128:20
2024-09-05T09:46:55.3937692Z #20 800.8 #39 0x00007fd4e1d9fe40 __libc_start_main ./csu/../csu/libc-start.c:379:5
2024-09-05T09:46:55.3938083Z #20 800.8 #40 0x00005579e44b73f5 _start (/opt/rocm/llvm/bin/clang+0xf093f5)
2024-09-05T09:46:55.3938829Z #20 800.8 clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
2024-09-05T09:46:55.3939964Z #20 800.8 AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
2024-09-05T09:46:55.3940232Z #20 800.8 Target: x86_64-unknown-linux-gnu
2024-09-05T09:46:55.3940492Z #20 800.8 Thread model: posix
2024-09-05T09:46:55.3940686Z #20 800.8 InstalledDir: /opt/rocm/llvm/bin
2024-09-05T09:46:55.3940932Z #20 800.8 clang-16: note: diagnostic msg: 
2024-09-05T09:46:55.3941071Z #20 800.8 ********************
2024-09-05T09:46:55.3941213Z #20 800.8 
2024-09-05T09:46:55.3941540Z #20 800.8 PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
2024-09-05T09:46:55.3941978Z #20 800.8 Preprocessed source(s) and associated run script(s) are located at:
2024-09-05T09:46:55.3942447Z #20 800.8 clang-16: note: diagnostic msg: /tmp/ggml-aarch64-3228e7.c
2024-09-05T09:46:55.3943010Z #20 800.8 clang-16: note: diagnostic msg: /tmp/ggml-aarch64-3228e7.sh
2024-09-05T09:46:55.3943273Z #20 800.8 clang-16: note: diagnostic msg:

* Add AVX2 based implementations for quantize_q8_0_4x8, ggml_gemv_q4_0_8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0 functions * Update code to fix issues occuring due to non alignment of elements to be processed as multiple of 16 in MSVC * Update comments and indentation * Make updates to reduce number of load instructions

github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jul 26, 2024

Srihari-mcw commented Jul 26, 2024

View reviewed changes

mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Aug 1, 2024

ggerganov approved these changes Sep 3, 2024

View reviewed changes

Srihari-mcw added 4 commits September 3, 2024 23:15

Add AVX2 based implementations for quantize_q8_0_4x8, ggml_gemv_q4_0_…

0c81b7b

…8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0 functions

Update code to fix issues occuring due to non alignment of elements t…

49af3f5

…o be processed as multiple of 16 in MSVC

Update comments and indentation

364dc96

Make updates to reduce number of load instructions

c950fc3

Srihari-mcw force-pushed the block_interleaving_q4_0_8_8_avx2_implementation branch from 81d9078 to c950fc3 Compare September 4, 2024 13:44

ggerganov merged commit 581c305 into ggerganov:master Sep 4, 2024
52 checks passed

slaren mentioned this pull request Sep 6, 2024

ci : disable rocm image creation #9340

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementations for Q4_0_8_8 quantization based functions in AVX2 SIMD architecture #8713

Implementations for Q4_0_8_8 quantization based functions in AVX2 SIMD architecture #8713

Srihari-mcw commented Jul 26, 2024

Srihari-mcw Jul 26, 2024 •

edited

Loading

bartowski1182 commented Jul 26, 2024

Srihari-mcw commented Jul 27, 2024

nisten commented Aug 1, 2024 •

edited

Loading

ggerganov commented Sep 3, 2024

ggerganov left a comment

Srihari-mcw commented Sep 4, 2024 •

edited

Loading

slaren commented Sep 5, 2024

Implementations for Q4_0_8_8 quantization based functions in AVX2 SIMD architecture #8713

Implementations for Q4_0_8_8 quantization based functions in AVX2 SIMD architecture #8713

Conversation

Srihari-mcw commented Jul 26, 2024

Srihari-mcw Jul 26, 2024 • edited Loading

Choose a reason for hiding this comment

bartowski1182 commented Jul 26, 2024

Srihari-mcw commented Jul 27, 2024

nisten commented Aug 1, 2024 • edited Loading

ggerganov commented Sep 3, 2024

ggerganov left a comment

Choose a reason for hiding this comment

Srihari-mcw commented Sep 4, 2024 • edited Loading

slaren commented Sep 5, 2024

Srihari-mcw Jul 26, 2024 •

edited

Loading

nisten commented Aug 1, 2024 •

edited

Loading

Srihari-mcw commented Sep 4, 2024 •

edited

Loading