
Cuda refactor, multi GPU support #1607

Closed

Conversation

@JohannesGaessler (Collaborator) commented May 27, 2023

This PR is quite large. Its primary goal is to lay the groundwork for the implementation of further CUDA kernels for ggml operations. I am also adding multi GPU support because it's easier to integrate now than it would be at a later point.

For Users

Build instructions (Linux):

git clone https://github.com/JohannesGaessler/llama.cpp llama.cpp-johannesgaessler
cd llama.cpp-johannesgaessler                               
git fetch
git switch cuda-refactor-8
make LLAMA_CUBLAS=1

When compiling with LLAMA_CUBLAS=1 the program automatically detects the available NVIDIA devices and splits weights proportional to VRAM. There is not yet a CLI argument for setting the tensor splits. The performance increase on my test systems is relatively low (+70% t/s when going from 1x GTX TITAN X to 4x GTX TITAN X). It's possible that there is still a bug that hampers performance. Please do tell me how well (if at all) it works for you. In any case, this PR should already allow you to pool the VRAM of multiple GPUs to load larger models.

For Developers

This PR is still very much WIP. I will do a refactor to remove artifacts from bad/obsolete design decisions. You can already review the code if you want, but many of the flaws are still subject to change. Edit: should be good now.

On master there are separate functions for invoking the CUDA kernels; apart from launching the kernels themselves, they also do other work such as copying data between host and device. This PR adds a template ggml_cuda_op that manages

  1. the transfer of data between host and device,
  2. the dequantization of src0 (needed for cuBLAS matrix multiplication),
  3. the broadcasting of src1 across src0 (needed for multiplication),
  4. and multi GPU things.

The actual operations now only need to define how the data should be manipulated.

This PR also moves the entry point for invoking CUDA kernels out of the individual ggml functions such as ggml_compute_forward_mul_mat_q_f32 and instead adds a function ggml_cuda_compute_forward that is called from ggml_compute_forward. For this to work I moved ggml_task_type and ggml_compute_params from ggml.c to ggml.h.

This PR adds an int for the layer, an int for the device id, and dedicated device data pointers to ggml_tensor. I need these for bookkeeping. I also changed the backends from GGML_BACKEND_CUDA and GGML_BACKEND_OPENCL to GGML_BACKEND_GPU (tensor data on 1 GPU) and GGML_BACKEND_GPU_SPLIT (tensor data split across all GPUs). Since I think we don't want to support the simultaneous use of CUDA and OpenCL, it's simpler to use the same backend types for both implementations and to differentiate via defines.

@JohannesGaessler JohannesGaessler added the performance Speed related topics label May 27, 2023
@JohannesGaessler JohannesGaessler marked this pull request as draft May 27, 2023 09:29
@JohannesGaessler JohannesGaessler added enhancement New feature or request hardware Hardware related refactoring Refactoring labels May 27, 2023
@github-actions bot (Contributor) left a comment: clang-tidy made some suggestions

@ENjoyBlue2021 commented May 27, 2023

Thanks so much for adding multi GPU support, I was looking forward to it. You are the king, man.
This is extremely useful for increasing my VRAM for 30B.
I'm posting my stats with my 1080 Ti + 1080.
Amazing stuff, it's noticeably faster.

Stats with this version:

./main -m '/media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin' -n 128 --n-gpu-layers 39 --threads 6 --no-mmap -s 1685192470
WARNING: when using cuBLAS generation results are NOT guaranteed to be reproducible.
main: build = 596 (428c342)
main: seed  = 1685192470
ggml_init_cublas: found 2 CUDA devices:
  1. NVIDIA GeForce GTX 1080 Ti
  2. NVIDIA GeForce GTX 1080
llama.cpp: loading model from /media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 23269.21 MB
llama_model_load_internal: mem required  = 10646.42 MB (+ 3124.00 MB per state)
llama_model_load_internal: [cublas] offloading 39 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 14926 MB
....................................................................................................
llama_init_from_file: kv self size  =  780.00 MB

system_info: n_threads = 6 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


 // Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.

using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Threading;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

namespace MSBuild.ExtensionPack.Platform
{
    /// <summary>
    /// Adds support for a target to find the first non-empty environment variable, by name
llama_print_timings:        load time = 14079.62 ms
llama_print_timings:      sample time =    55.40 ms /   128 runs   (    0.43 ms per token)
llama_print_timings: prompt eval time =   722.20 ms /     2 tokens (  361.10 ms per token)
llama_print_timings:        eval time = 54367.11 ms /   127 runs   (  428.09 ms per token)
llama_print_timings:       total time = 68522.85 ms

Compare with the old version:

./main -m '/media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin' -n 128 --n-gpu-layers 17 --threads 6 --no-mmap -s 1685192470
WARNING: when using cuBLAS generation results are NOT guaranteed to be reproducible.
main: build = 583 (7e4ea5b)
main: seed  = 1685192470
llama.cpp: loading model from /media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 23269.14 MB
llama_model_load_internal: mem required  = 19066.59 MB (+ 3124.00 MB per state)
llama_model_load_internal: [cublas] offloading 17 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 6506 MB
....................................................................................................
llama_init_from_file: kv self size  =  780.00 MB

system_info: n_threads = 6 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


 // Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.

using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Threading;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

namespace MSBuild.ExtensionPack.Platform
{
    /// <summary>
    /// Adds support for a target to find the first non-empty environment variable, by name
llama_print_timings:        load time = 11117.69 ms
llama_print_timings:      sample time =    55.35 ms /   128 runs   (    0.43 ms per token)
llama_print_timings: prompt eval time =   896.98 ms /     2 tokens (  448.49 ms per token)
llama_print_timings:        eval time = 82108.70 ms /   127 runs   (  646.53 ms per token)
llama_print_timings:       total time = 93302.20 ms

@SlyEcho (Collaborator) commented May 27, 2023

Would it be too much to ask for cross-platform (Nvidia + AMD) support? LMAO.

I will try to check later; I think I can access a machine with 2x 2080 Ti.

@JohannesGaessler (Collaborator, Author) commented May 27, 2023

Performance numbers from my test machine with an i5-4570S, 16 GB of RAM @ 1600 MHz, and a GTX 1070 + a GTX 1050 ti:

| Model | GPU | ms/t | t/s |
| --- | --- | --- | --- |
| 7b q4_0 | GTX 1070 | 71.37 | 14.01 |
| 7b q4_0 | GTX 1070 + GTX 1050 ti | 68.66 | 14.56 |
| 13b q4_0 | GTX 1070 | 134.19 | 7.45 |
| 13b q4_0 | GTX 1070 + GTX 1050 ti | 128.13 | 7.80 |
| 33b q4_0 | GTX 1070 | Unusable | Unusable |
| 33b q4_0 | GTX 1070 + GTX 1050 ti | 575.12 | 1.74 |

Numbers for single GPU are obtained using the master branch.

Note: previously I was able to run 33b q4_0 with just the GTX 1070 on master; there may be something on master that has increased RAM usage since.

@SlyEcho (Collaborator) commented May 27, 2023

33b

You mean 30B? I can run 30B Q4_0 with my 8 GB card with 20 layers loaded only.


auto & layer = model.layers[i];

std::string layers_i = "layers." + std::to_string(i);

layer.attention_norm = ml->get_tensor(layers_i + ".attention_norm.weight", {n_embd}, backend);
layer.attention_norm = ml->get_tensor(layers_i + ".attention_norm.weight", {n_embd}, i, backend);
Review comment (Collaborator):

We can decide which card to put the layer on in this function (backend + device_id).

We could make backend values like CPU = 0xFFFFFFFF, CUDA = 0xEEEEEE00 (256 cards supported), etc.

Review reply (Collaborator, Author):

This PR implements two ways to utilize multiple GPUs: tensors can either be put on a single GPU or they can be split across multiple GPUs. I think that tensors that require little computation should be put on a single GPU to reduce data copying. In the function ggml_cuda_load_data single GPU tensors are distributed to GPUs based on which layer they are in. I guess this distribution could also be done here but I don't think it would make a significant difference.

@JohannesGaessler (Collaborator, Author) replied:

You mean 30B? I can run 30B Q4_0 with my 8 GB card with 20 layers loaded only.

"30B" seems to be a typo by Meta that has become dominant. In the paper they talk about a "33B" model so that is the term that I'm using.

}

struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, ggml_backend backend) {
struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, int layer, ggml_backend backend) {
Review comment (Collaborator):

What if it were like this so the tensors that are not on a layer don't have to specify -1?

Suggested change
struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, int layer, ggml_backend backend) {
struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, ggml_backend backend, int layer = -1) {

SlyEcho added a commit to SlyEcho/llama.cpp that referenced this pull request May 27, 2023
For forward compatibility ggerganov#1607
@slaren (Collaborator) commented May 27, 2023

I think ggml_cuda_mul and ggml_cuda_mul_mat can be removed from ggml-cuda.h now and made static.

@JohannesGaessler (Collaborator, Author) commented May 27, 2023

I added a comment to explain the weird device-to-host memcpy for split tensors. Since I, as the person who wrote the code, won't notice them myself: are there other parts of the code that are unintuitive or difficult to understand?

@github-actions bot (Contributor) left a comment: clang-tidy made some suggestions

@JohannesGaessler (Collaborator, Author) commented:

I added a CLI argument that lets the user set the tensor split. On my system a less VRAM efficient split of 3:1 seems to do better than 2:1 because it's more efficient in terms of compute:

| Model | GPU | ms/t | t/s |
| --- | --- | --- | --- |
| 7b q4_0 | GTX 1070 | 71.37 | 14.01 |
| 7b q4_0 | GTX 1070 + GTX 1050 ti, 2:1 split | 68.66 | 14.56 |
| 7b q4_0 | GTX 1070 + GTX 1050 ti, 3:1 split | 59.03 | 16.94 |
| 13b q4_0 | GTX 1070 | 134.19 | 7.45 |
| 13b q4_0 | GTX 1070 + GTX 1050 ti, 2:1 split | 128.13 | 7.80 |
| 13b q4_0 | GTX 1070 + GTX 1050 ti, 3:1 split | 109.14 | 9.15 |
| 33b q4_0 | GTX 1070 | Unusable | Unusable |
| 33b q4_0 | GTX 1070 + GTX 1050 ti, 2:1 split | 575.12 | 1.74 |
| 33b q4_0 | GTX 1070 + GTX 1050 ti, 3:1 split | 571.10 | 1.75 |

@KerfuffleV2 (Collaborator) commented:

-n N, --n-predict N   number of tokens to predict (default: -1, -1 = infinity)

Please create an issue or discussion topic in the Q&A section when you have general questions instead of asking in pull requests.

Comments in pull requests are primarily for discussing that specific pull.

@JohannesGaessler (Collaborator, Author) commented:

Both CUDA and OpenCL seem to be working correctly in short tests. However, right now I'm too tired to test this PR in detail. I'll do it tomorrow morning.

@JohannesGaessler (Collaborator, Author) commented:

As far as I can tell everything is working correctly.

@KerfuffleV2 (Collaborator) commented Jun 5, 2023

ggml_init_cublas: found 1 CUDA devices:
  1. NVIDIA GeForce GTX 1060 6GB

Works for me on Linux; token-generation performance with main seems the same as master, and results are identical with the same seed. Perplexity (and I'd assume cuBLAS prompt evaluation in general) is noticeably slower, though. (Output abbreviated below.)

master

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 0
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 3.66 seconds per pass - ETA 39 minutes
[1]4.4545,[2]4.9402,[3]5.8278

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 33
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 3.37 seconds per pass - ETA 36 minutes
[1]4.4547,[2]4.9404

this pull

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 0
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 5.47 seconds per pass - ETA 59 minutes
[1]4.4545,[2]4.9402

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 33
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 5.12 seconds per pass - ETA 55 minutes
[1]4.4545,[2]4.9402

@JohannesGaessler (Collaborator, Author) commented:

Thank you for pointing this out; I forgot to check it. The most likely reason: on master there are some f16 × f32 matrix multiplications where the f32 data is converted to f16 and cuBLAS then does an f16 matrix multiplication. In this PR, however, the f16 data is converted to f32 and cuBLAS does an f32 matrix multiplication. Originally I did not want to touch the f16 × f32 code at all, but it caused issues for CPU layers in combination with >1 GPU.

As a user I don't really care about prompt processing speed because it's so fast anyways. As a developer waiting longer for perplexity calculations could be an issue though. I would be fine with delaying the merging of this PR if it's desired to fix prompt processing speed beforehand; otherwise I will at a later point make another PR to fix it.

@JohannesGaessler (Collaborator, Author) commented:

I forgot to mention: the f16 × f32 matrix multiplications are mostly related to the KV cache, and they're done on permuted matrices. To use regular cuBLAS matrix multiplication you first need to copy the matrices to make them contiguous, but I'm currently trying to develop a kernel that operates directly on the permuted memory layout. Ideally that will then be faster than master anyway (for GPU layers).

@KerfuffleV2 (Collaborator) commented:

As a user I don't really care about prompt processing speed because it's so fast anyways. As a developer waiting longer for perplexity calculations could be an issue though.

This is just the opinion of a random person, but while prompt processing is certainly fast compared to inference, on relatively old hardware something like an 800-token prompt still takes a noticeable time to evaluate, especially on larger models like 33B+. Roughly doubling that time from, say, 15-20 s to 30-40 s is quite noticeable.

Also, with the advances in context-size techniques and some models being fine-tuned to allow >2048-token contexts, longer prompts are likely to become more common in the coming months.

I've also been experimenting with various things and using the perplexity tool to check the effect, so making it twice as slow (and it's already pretty slow for big models) would definitely make me sad.

@JohannesGaessler (Collaborator, Author) commented:

One workaround would be to re-add the implementation from master and enable it only when there is a single GPU.

@JohannesGaessler (Collaborator, Author) commented:

I pushed a fix that re-adds the old code for single GPU prompt processing. Prompt processing seems to still be slightly slower but only by ~20%. I'm not sure what's causing the remaining difference; investigating and optimizing the performance would probably take some time. In any case, @KerfuffleV2 , can you please check the performance on your system?

@KerfuffleV2 (Collaborator) commented:

Definitely helps.

-ngl 0

master: 3.66 seconds per pass - ETA 39 minutes
before: 5.47 seconds per pass - ETA 59 minutes
now: 4.16 seconds per pass - ETA 45 minutes

-ngl 33

master: 3.37 seconds per pass - ETA 36 minutes
before: 5.12 seconds per pass - ETA 55 minutes
now: 3.74 seconds per pass - ETA 40 minutes

@JohannesGaessler (Collaborator, Author) commented:

Alright, my current status is this: I'm working on another PR based on this one that adds GPU acceleration for the entire LLaMa model rather than just matrix multiplication and component-wise multiplication. On my hardware this WIP version is already faster than master and this PR when it comes to token generation. I would like to get a working version for that first, and then optimize performance afterwards. Optimizing the performance of this PR now seems like it would be a bad investment of my time since I think I will be able to get better performance with a fundamentally different implementation anyways. So I think the two options for this PR are to either merge it and accept a performance regression for prompt processing or to wait until I have the next PR ready which I think will be faster than master across the board.

@KerfuffleV2 (Collaborator) commented:

Sounds reasonable.

I'm working on another PR based on this one that adds GPU acceleration for the entire LLaMa model rather than just matrix multiplication and component-wise multiplication.

Feel free to ignore this if it's not the appropriate time/place to ask, but I'm curious if this is going to involve requiring significantly more VRAM?

@JohannesGaessler (Collaborator, Author) commented:

llama.cpp master currently reserves 1 GB of scratch buffer memory for 33b and smaller models. Translating this 1:1 to VRAM would be simple but inefficient, because these scratch buffers are much larger than they need to be. Also, that much memory is only needed during prompt processing, where the amount required is proportional to the batch size; for low-end cards a large batch size probably has diminishing returns anyway. You also need some VRAM for the KV cache, but I haven't worked out how much that will be exactly.

My current testing targets are an RTX 3090, a GTX 1070, and a GTX 1050 ti. If I can get universally better performance for all of these that's great, otherwise I plan to add something like a --low-vram option.

@ggerganov (Owner) commented:

So I think the two options for this PR are to either merge it and accept a performance regression for prompt processing or to wait until I have the next PR ready which I think will be faster than master across the board.

The perplexity computation speed is quite important to me, so if the fix is not trivial I prefer to wait for the other PR that will hopefully not degrade the performance. The multi-GPU addition is really great, but from a practical point of view we need the perplexity speed right now, even more so because of the #1684 efforts that are about to get merged.

I'll leave this PR open for now - if I get the time, I might try to fix the regression and merge it (if the other PR is not ready yet)

@JohannesGaessler (Collaborator, Author) commented Jun 5, 2023

I did a quick test with my WIP PR. The perplexity speed is 2.23 ms/t vs. 3.07 ms/t on master (numbers are for 7b q4_0). The PR has GPU acceleration for add, SiLU, RMS norm, cpy, reshape, view, transpose, and RoPE. As a third option I could prioritize getting that PR in a state that can be merged and then deliver kernels for the rest of the operations in a later PR.

Edit: for 33b q4_0 the speed of the WIP PR is 6.50 ms/t vs. 9.35 ms/t on master.

@ggerganov (Owner) commented:

As a third option I could prioritize getting that PR in a state that can be merged and then deliver kernels for the rest of the operations in a later PR.

Up to you - whichever way you prefer

@JohannesGaessler (Collaborator, Author) commented:

I'll do it.

@JohannesGaessler (Collaborator, Author) commented:

Obsolete due to the merging of #1703

YellowRoseCx added a commit to YellowRoseCx/koboldcpp-rocm that referenced this pull request Aug 25, 2023 (commit 3416c98, an upstream merge that includes a ROCm/hipBLAS port of this CUDA code).
Merge: cbdc1f3 94e0a06
Author: YellowRoseCx <[email protected]>
Date:   Tue Jul 25 22:25:10 2023 -0500

    Merge branch 'concedo_experimentalMAIN'

commit cbdc1f3
Merge: 5b838d4 9731682
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 16:53:21 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cde52d6
Merge: 8e8054a 84e09a7
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:22:58 2023 +0300

    Merge 'origin/master' into hipblas

commit 8e8054a
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:20:49 2023 +0300

    Add rocblas to build files

commit 1f6294d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:52:01 2023 -0500

    Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)

    * initialize rocblas

commit 5b838d4
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:10:35 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 9bfb2fd
Merge: b379f9d 66328fc
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:44 2023 -0500

    Merge branch 'concedo_experimental'

commit b379f9d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:00 2023 -0500

    Revert "amd multigpu full layer offload w/o vram scratch"

    This reverts commit 9adfc8e.

commit 9adfc8e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 02:56:40 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 05c792e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 00:18:48 2023 -0500

    initialize rocblas

commit ade68d0
Merge: 521ad6b 56995ca
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 23 20:25:05 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 521ad6b
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 21:42:33 2023 -0500

    lazy import_var error handling for saves

commit 9553e52
Merge: cac6650 f036109
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 19:59:41 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cac6650
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 17 23:05:02 2023 -0500

    Makefile fix! Allows hip/clblast build together

commit 3db70b5
Merge: 2ec4466 7568d1a
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 18 01:54:17 2023 +0300

    Merge 'origin/master' into hipblas

commit f208670
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 02:56:03 2023 -0500

    improve error handling with gpu names

commit 860e738
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 00:33:03 2023 -0500

    Show GPU names in GUI, Only show GPUs that exist

    changed the pre-set 1,2,3 and 1,2,3,all settings that the GPU selector had and replaced them with a function that grabs the GPU names and sets the names as the values for the selector boxes.

commit 2ec4466
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:44:02 2023 +0300

    Update build flags.

    GGML_CUDA_DMMV_Y is now GGML_CUDA_MMV_Y
    so update your build instructions.

    GGML_CUDA_FORCE_DMMV is always enabled.

    ---------

    Co-authored-by: YellowRoseCx <[email protected]>

commit cd36b18
Merge: afcb8fe 1cbf561
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:03:01 2023 +0300

    Merge 'origin/master' into hipblas

commit ac7ebc3
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 18:32:18 2023 -0500

    add hipBLAS name scheme to GUI and update README

commit 7f85cc5
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 17:35:54 2023 -0500

    update makefile and ggml.c

commit 6ca3499
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:43:45 2023 -0500

    ggml.c fix

commit 770e674
Merge: 2b289cd 5941514
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:24:36 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2b289cd
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:30:00 2023 -0500

    Update c-cpp.yml

commit 5dae95a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:28:51 2023 -0500

    Update c-cpp.yml

commit b37cd73
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:27:04 2023 -0500

    Create c-cpp.yml to test Actions

commit afcb8fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 18:09:27 2023 +0300

    Add new config option

commit 8c2c497
Merge: e610466 2347463
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:54 2023 +0300

    Merge 'origin/master' into hipblas

commit e610466
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:14 2023 +0300

    Expand arch list and make it overrideable

commit 80e4e54
Merge: 7735c5a 1d16309
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 10 02:09:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 8432e9d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:55:30 2023 -0500

    Update Makefile

commit b58c189
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:20:00 2023 -0500

    Add multi-gpu CuBLAS support to new GUI

commit 0c1c71b
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 07:56:57 2023 -0500

    Update Makefile

commit f864f60
Author: Johannes Gäßler <[email protected]>
Date:   Sat Jul 8 00:25:15 2023 +0200

    CUDA: add __restrict__ to mul mat vec kernels (ggerganov#2140)

commit 4539bc2
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 01:36:14 2023 -0500

    update makefile for changes

commit 912e31e
Merge: 74e2703 ddaa4f2
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 7 23:15:37 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 74e2703
Merge: cf65429 f9108ba
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 5 15:16:49 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit 7735c5a
Merge: c3e3733 7ee76e4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 4 17:09:16 2023 +0300

    Merge 'origin/master' into hipblas

commit cf65429
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:56:40 2023 -0500

    print cuda or opencl based on what's used

commit 72c16d2
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:45:39 2023 -0500

    Revert "fix my mistake that broke other arches"

    This reverts commit 777aed5.

commit 777aed5
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 15:53:32 2023 -0500

    fix my mistake that broke other arches

commit 27780a9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:03:27 2023 -0500

    rocm fixes

commit f52c7d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:02:58 2023 -0500

    Revert "rocm fixes"

    This reverts commit 2fe9927.

commit 2fe9927
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:58:21 2023 -0500

    rocm fixes

commit efe7560
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:43 2023 -0500

    Revert "move HIPBLAS definitions into ggml-cuda.h"

    This reverts commit bf49a93.

commit 4fc0181
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:36 2023 -0500

    Revert "move hipblas definitions to header files"

    This reverts commit 2741ffb.

commit 89eb576
Merge: 2741ffb 3d2907d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 14:44:13 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c3e3733
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:51:31 2023 +0300

    ROCm fixes

commit 15db19a
Merge: 04419f1 46088f7
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:39:57 2023 +0300

    Merge 'origin/master' into hipblas

commit 2741ffb
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 17:07:42 2023 -0500

    move hipblas definitions to header files

commit bf49a93
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 16:38:50 2023 -0500

    move HIPBLAS definitions into ggml-cuda.h

commit 540f4e0
Merge: 2c3b46f eda663f
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 14:58:32 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2c3b46f
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:43:43 2023 -0500

    changes to fix build

commit c9e1103
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:20:07 2023 -0500

    Update ggml_v2-cuda-legacy.cu for ROCM

commit b858fc5
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 17:49:39 2023 -0500

    changes to work with upstream

commit 69a0c25
Merge: 096f0b0 1347d3a
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 16:59:06 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 04419f1
Merge: bb16eff d3494bb
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 28 23:30:10 2023 +0300

    Merge 'origin/master' into hipblas

commit bb16eff
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:10 2023 -0500

    headers fix; add kquants_iter for hipblas and add gfx803 (#1)

    * kquants_iter for hipblas and add gfx803
    * Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16
    * remove dmmv_f16 for now

commit 096f0b0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:02 2023 -0500

    revert unnecessary hipblas conditionals

commit d81e81a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 14:48:23 2023 -0500

    Update Makefile hipblas nvcc correction

commit c8ae945
Merge: c1e5c83 0be54f7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 27 10:50:37 2023 +0300

    Merge 'origin/master' into hipblas

commit 2579ecf
Merge: abed427 d2034ce
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 25 17:50:04 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c1e5c83
Merge: 35a6031 447ccbe
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 21:40:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 35a6031
Merge: df7346c 66a2555
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 10:57:48 2023 +0300

    Merge 'origin/master' into hipblas

commit abed427
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 19:16:30 2023 -0500

    reorganize If statements to include proper headers

commit 06c3bf0
Merge: ea6d320 8342fe8
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 16:57:20 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit ea6d320
Author: YellowRoseCx <[email protected]>
Date:   Fri Jun 23 01:53:28 2023 -0500

    Update README.md

commit 4d56ad8
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 16:19:43 2023 -0500

    Update README.md

commit 21f9308
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 15:42:05 2023 -0500

    kquants_iter for hipblas and add gfx803

commit df7346c
Merge: 5dd2fbe 7487137
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 22 20:51:09 2023 +0300

    Merge 'origin/master' into hipblas

commit b6ff890
Merge: eb094f0 e6ddb15
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 12:42:09 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit eb094f0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 23:59:18 2023 -0500

    lowvram parameter description

commit 3a5dfeb
Merge: 665cc11 b1f00fa
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 16:53:03 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit 665cc11
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 01:13:19 2023 -0500

    add lowvram parameter

commit 222cbbb
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 19:03:28 2023 -0500

    add additional hipblas conditions for cublas

commit e1f9581
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 16:51:59 2023 -0500

    Add hip def for cuda v2

commit 3bff5c0
Merge: a7e74b3 266d47a
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 13:38:06 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit a7e74b3
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:04:18 2023 -0500

    Update README.md

commit 5e99b3c
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:03:42 2023 -0500

    Update Makefile

commit 9190b17
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 21:47:10 2023 -0500

    Update README.md

commit 5dd2fbe
Merge: 67e229b 20568fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 20 01:23:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2780ea2
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 15:48:00 2023 -0500

    Update Makefile

commit 04a3e64
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:33:39 2023 -0500

    remove extra line

commit cccbca9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:17 2023 -0500

    attempt adding ROCM hipblas

commit a44a1d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:01 2023 -0500

    attempt adding ROCM hipblas

commit b088184
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:30:54 2023 -0500

    attempt adding ROCM hipblas

commit 67e229b
Merge: 6f7c156 b241649
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 18 00:36:54 2023 +0300

    Merge 'origin/master' into hipblas

commit 6f7c156
Merge: 61df8e9 fc45a81
Author: Henri Vasserman <[email protected]>
Date:   Sat Jun 17 16:53:22 2023 +0300

    Merge 'origin/master' into hipblas

commit 61df8e9
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:46:10 2023 +0300

    add cudaMemset

commit a836529
Merge: 85f902d 254a7a7
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:41:55 2023 +0300

    Merge 'origin/master' into hipblas

commit 85f902d
Merge: 4362e80 b50b570
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 8 10:50:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 4362e80
Merge: fa5b3d7 17366df
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 23:14:40 2023 +0300

    Merge 'origin/master' into hipblas

commit fa5b3d7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:47:00 2023 +0300

    fix makefile.

commit 1ba4ce4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:41:08 2023 +0300

    Revert "warp size fixes"

    It seems like 32 is faster for me, at least and it won't cause so many conflicts.

    This reverts commit 5d6eb72.

commit 5d6eb72
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:32:41 2023 +0300

    warp size fixes

commit 33091a9
Merge: 9fdaa1d 2d43387
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 16:19:23 2023 +0300

    Merge  'origin/master' into hipblas

commit 9fdaa1d
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 19:17:53 2023 +0300

    Add more defs

    For forward compatibility ggerganov#1607

commit a4648c1
Merge: 4c8b3fb 0ecb1bb
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 18:22:39 2023 +0300

    Merge 'origin/master' into hipblas

commit 4c8b3fb
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:08:53 2023 +0300

    add configurable vars

commit 30d921a
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:03:56 2023 +0300

    and makefile

commit a593a4f
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:55:28 2023 +0300

    Add missing parameters

commit 174bf6a
Merge: f80ce7a 1fcdcc2
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:44:23 2023 +0300

    Merge 'origin/master' into hipblas

commit f80ce7a
Merge: 600ace3 ac7876a
Author: Henri Vasserman <[email protected]>
Date:   Thu May 25 00:02:50 2023 +0300

    Merge branch 'origin/master' into hipblas

commit 600ace3
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:42:20 2023 +0300

    update warp size

commit b19fefe
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:28:08 2023 +0300

    Forwardcompat

commit c66115b
Merge: a0b2d5f b8ee340
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 18:29:31 2023 +0300

    Merge 'origin/master' into hipblas

commit a0b2d5f
Merge: 8bab456 2a5ee02
Author: Henri Vasserman <[email protected]>
Date:   Tue May 16 17:08:29 2023 +0300

    Merge 'origin/master' into hipblas

commit 8bab456
Merge: 2956630 b5c9295
Author: Henri Vasserman <[email protected]>
Date:   Mon May 15 00:01:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2956630
Merge: 0fe6384 f048af0
Author: Henri Vasserman <[email protected]>
Date:   Sat May 13 13:12:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 0fe6384
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 17:22:11 2023 +0300

    fix makefile

commit 605560d
Merge: 127f68e 089b1c9
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 16:12:53 2023 +0300

    Merge 'origin/master' into hipblas

commit 127f68e
Merge: 070cbcc b608b55
Author: Henri Vasserman <[email protected]>
Date:   Thu May 11 20:21:27 2023 +0300

    Merge 'origin/master' into hipblas

commit 070cbcc
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:10:56 2023 +0300

    occupanct function

commit a3296d5
Merge: 0aefa6a e129551
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:06:04 2023 +0300

    Merge 'origin/master' into hipblas

commit 0aefa6a
Merge: baeb482 1b0fd45
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:41 2023 +0300

    Merge 'origin/master' into hipblas

commit baeb482
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:12 2023 +0300

    Revert to default copy

commit 289073a
Merge: 1107194 173d0e6
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 19:59:41 2023 +0300

    Merge 'origin/master' into hipblas

commit 1107194
Merge: 04c0d48 a3b85b2
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 00:38:20 2023 +0300

    Merge 'origin/master' into hipblas

commit 04c0d48
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 12:31:16 2023 +0300

    Move all HIP stuff to ggml-cuda.cu

commit d83cfba
Merge: b67cc50 799fdc1
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 11:31:16 2023 +0300

    Merge 'origin/master' into hipblas

commit b67cc50
Merge: fcbc262 e216aa0
Author: Henri Vasserman <[email protected]>
Date:   Wed May 3 15:04:51 2023 +0300

    Merge 'origin/master' into hipblas

commit fcbc262
Merge: c73def1 f4cef87
Author: Henri Vasserman <[email protected]>
Date:   Mon May 1 22:45:29 2023 +0300

    Merge 'origin/master' into hipblas

commit c73def1
Merge: d8ea75e f0d70f1
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 30 18:40:42 2023 +0300

    Merge 'origin/master' into hipblas

commit d8ea75e
Merge: d194586 334637e
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 29 11:25:51 2023 +0300

    Merge 'origin/master' into hipblas

commit d194586
Merge: 2ab9d11 7f15c5c
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 23:03:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 2ab9d11
Merge: 3b4a531 04aaae1
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 16:30:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 3b4a531
Merge: a1caa48 0b2da20
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:41 2023 +0300

    Merge 'origin/master' into hipblas

commit a1caa48
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:21 2023 +0300

    add more cuda defines

    This is so 'slaren/cuda-f16f32' would merge.

commit ecc0565
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 01:58:27 2023 +0300

    only .cu file needs to be complied as device

commit ef51e9e
Merge: d571d16 4afcc37
Author: Henri Vasserman <[email protected]>
Date:   Wed Apr 26 12:46:26 2023 +0300

    Merge branch 'ggerganov:master' into hipblas

commit d571d16
Merge: 608aa33 dd0eabc
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:33 2023 +0300

    Merge 'origin/master' into hipblas

commit 608aa33
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:04 2023 +0300

    change default GPU arch to match CMake

commit 3a004b2
Author: Henri Vasserman <[email protected]>
Date:   Mon Apr 24 02:24:54 2023 +0300

    add rpath

commit db7a012
Merge: 3677235 284685f
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 23 21:49:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 3677235
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 22 23:28:00 2023 +0300

    More build file changes

commit d3e1984
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 03:32:06 2023 +0300

    add rpath

commit 0e005f7
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 02:13:00 2023 +0300

    Build file changes

    Now HIP Clang is not required, the CMake scripts will configure the
    needed compiler, which can be system clang++. Also other code can
    still use GCC, but CMake will force the clang to link.

commit 54a63c1
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 22:19:22 2023 +0300

    Update Makefile for the Cuda kernels

commit 0fd8363
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 02:04:00 2023 +0300

    use hipblas based on cublas
LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Aug 28, 2023
* koboldcpp-ROCm Port

commit 3416c98
Merge: 5eb17f0 4c4e435
Author: YellowRoseCx <[email protected]>
Date:   Fri Aug 25 13:46:56 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 5eb17f0
Author: YellowRoseCx <[email protected]>
Date:   Fri Aug 25 13:38:21 2023 -0500

    ROCm Port update

    * use hipblas based on cublas
    * Update Makefile for the Cuda kernels
    * Expand arch list and make it overrideable
    * Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)
    * add hipBLAS to README
    * new build arg LLAMA_CUDA_MMQ_Y
    * fix half2 decomposition
    * Add intrinsics polyfills for AMD
    * AMD assembly optimized __dp4a
    * Allow overriding CC_TURING
    * use "ROCm" instead of "CUDA"
    * ignore all build dirs
    * Add Dockerfiles
    * fix llama-bench
    * fix -nommq help for non CUDA/HIP

    ---------

    Co-Authored-By: YellowRoseCx <[email protected]>
    Co-Authored-By: ardfork <[email protected]>
    Co-Authored-By: funnbot <[email protected]>
    Co-Authored-By: Engininja2 <[email protected]>
    Co-Authored-By: Kerfuffle <[email protected]>
    Co-Authored-By: jammm <[email protected]>
    Co-Authored-By: jdecourval <[email protected]>

commit b34f4bd
Author: YellowRoseCx <[email protected]>
Date:   Sat Aug 19 17:12:52 2023 -0500

    Update README.md

commit 7d11961
Author: YellowRoseCx <[email protected]>
Date:   Mon Aug 14 23:03:12 2023 -0500

    remove force DMMV

commit cd61aa0
Author: YellowRoseCx <[email protected]>
Date:   Sat Aug 12 17:24:31 2023 -0500

    restore main_gpu parameter

commit 4a042f3
Author: Henri Vasserman <[email protected]>
Date:   Sat Aug 12 10:51:46 2023 +0300

    gfx1100 support

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: jammm <[email protected]>
    Co-authored-by: jdecourval <[email protected]>

commit 8913bc6
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 10:16:02 2023 +0300

    Allow overriding CC_TURING

commit e77a4c3
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 10:00:07 2023 +0300

    Merge 'origin/master' into hipblas

commit cc4c4e3
Author: Engininja2 <[email protected]>
Date:   Fri Aug 11 09:43:14 2023 +0300

    New __dp4a assembly

    Now compatible with gfx900 and faster as well.

commit 1a03b70
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 09:30:28 2023 +0300

    Undo mess

    ---------

    Co-authored-by: ardfork <[email protected]>

commit 4366ff9
Author: DannyDaemonic <[email protected]>
Date:   Thu Aug 10 13:11:36 2023 -0700

    Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows.

commit 811ff85
Author: Christian Demsar <[email protected]>
Date:   Thu Aug 10 10:28:27 2023 -0400

    Add --n-predict -2 for stopping generation on full context (ggerganov#2565)

commit 37c9717
Author: Martin Krasser <[email protected]>
Date:   Thu Aug 10 12:16:38 2023 +0200

    Fix grammar-based sampling issue in server (ggerganov#2566)

commit d18ecd5
Author: YellowRoseCx <[email protected]>
Date:   Thu Aug 10 13:19:41 2023 -0500

    make mmq gen faster for amd

commit 243894a
Author: Henri Vasserman <[email protected]>
Date:   Thu Aug 10 12:14:40 2023 +0300

    ws fix

commit ac2f14d
Author: Engininja2 <[email protected]>
Date:   Thu Aug 10 12:11:27 2023 +0300

    AMD assembly optimized __dp4a

    Doesn't seem to work for gfx900, so commented out.

commit 9dba0c9
Author: Henri Vasserman <[email protected]>
Date:   Thu Aug 10 12:09:28 2023 +0300

    Fix merge

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: Kerfuffle <[email protected]>

commit f570b5c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 22:11:20 2023 -0500

    Revert "revert cuda changes as they are bugggy"

    This reverts commit 1541bf8.

commit 1541bf8
Author: Concedo <[email protected]>
Date:   Wed Aug 9 22:36:41 2023 +0800

    revert cuda changes as they are bugggy

commit bacc202
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 20:37:17 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit b7cb4cf
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 20:00:52 2023 -0500

    additional fixes

commit fadae72
Merge: 518eb2a 8f8ab6c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:45:50 2023 -0500

    Merge branch 'hipblas' into develop4Main

commit 518eb2a
Merge: bda0215 cae6a84
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:32:10 2023 -0500

    Merge remote-tracking branch 'upstream/concedo' into develop2Main

commit bda0215
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:17:54 2023 -0500

    update makefile to multisystem path

commit 8f8ab6c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:05:03 2023 -0500

    hipLDFLAG Path change Unix to multisystem in Makefile

    changed the hardcoded linux distro hipblas LD path from -L/opt/rocm/lib to use the defined ROCM_PATH variable to be flexible with ROCm on non-Linux OS

commit 610ba4c
Merge: 4024f91 25d43e0
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 23:54:58 2023 +0300

    Merge 'origin/master' into hipblas

commit 4024f91
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 01:56:44 2023 +0300

    Add intrinsics polyfills for AMD

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: funnbot <[email protected]>
    Co-authored-by: Engininja2 <[email protected]>

commit ab62128
Merge: d91456a f5bfea0
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 00:37:01 2023 +0300

    Merge 'origin/master' into hipblas

commit ee9fa2a
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 2 01:53:58 2023 -0500

    Update Makefile

commit d91456a
Author: ardfork <[email protected]>
Date:   Mon Jul 31 20:35:00 2023 +0300

    fix half2 decomposition

commit c1cb70d
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 31 19:56:44 2023 +0300

    new build arg LLAMA_CUDA_MMQ_Y

commit c1664a0
Merge: 4336231 0728c5a
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 31 19:32:27 2023 +0300

    Merge 'origin/master' into hipblas

commit 848558d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 30 20:02:52 2023 -0500

    import vars logic fix

commit b650b84
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 30 00:21:36 2023 -0500

    Update easy_KCPP-ROCm_install.sh

commit 8573a67
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 21:31:12 2023 -0500

    remove duplicate code and fix typo

    remove duplicate tooltip

commit 430986e
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 21:07:34 2023 -0500

    hide "missing" if all are built

    move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
    " if len(runopts)==6 else + "

commit dd0db72
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 20:52:31 2023 -0500

    hide "missing" if all are built

    move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available

commit 43fffb6
Merge: 0ed65a4 b40550c
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 19:13:15 2023 -0500

    Merge branch 'concedo'

commit 0ed65a4
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 18:34:21 2023 -0500

    Hide unavailable backends & Add tooltip over backend count

    Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command

    Add tooltip when hovering over backend count label

    hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built

commit 2a26398
Merge: cee2e9d 31486eb
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 15:16:33 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 4336231
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 18:35:56 2023 +0300

    add hipBLAS to README

    ---------

    Co-authored-by: ardfork <[email protected]>

commit f8e3fc6
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 14:16:46 2023 +0300

    rocblas init stuff

commit d2ade63
Merge: cde52d6 8a88e58
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 12:59:48 2023 +0300

    Merge 'origin/master' into hipblas

commit cee2e9d
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 26 23:36:55 2023 -0500

    Only Show Available Backends in GUI

    Hides unavailable backends from the user; if the program is launched without any backends built, it shows an error message stating that no backends were found and that they can be built using the 'make' command

commit 7863610
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 26 13:27:22 2023 -0500

    Update easy_KCPP-ROCm_install.sh

commit 731cd6e
Author: YellowRoseCx <[email protected]>
Date:   Tue Jul 25 22:39:50 2023 -0500

    Create easy_rocm_install.sh

commit f154685
Merge: cbdc1f3 94e0a06
Author: YellowRoseCx <[email protected]>
Date:   Tue Jul 25 22:25:10 2023 -0500

    Merge branch 'concedo_experimentalMAIN'

commit cbdc1f3
Merge: 5b838d4 9731682
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 16:53:21 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cde52d6
Merge: 8e8054a 84e09a7
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:22:58 2023 +0300

    Merge 'origin/master' into hipblas

commit 8e8054a
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:20:49 2023 +0300

    Add rocblas to build files

commit 1f6294d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:52:01 2023 -0500

    Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)

    * initialize rocblas

commit 5b838d4
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:10:35 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 9bfb2fd
Merge: b379f9d 66328fc
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:44 2023 -0500

    Merge branch 'concedo_experimental'

commit b379f9d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:00 2023 -0500

    Revert "amd multigpu full layer offload w/o vram scratch"

    This reverts commit 9adfc8e.

commit 9adfc8e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 02:56:40 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 05c792e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 00:18:48 2023 -0500

    initialize rocblas

commit ade68d0
Merge: 521ad6b 56995ca
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 23 20:25:05 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 521ad6b
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 21:42:33 2023 -0500

    lazy import_var error handling for saves

commit 9553e52
Merge: cac6650 f036109
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 19:59:41 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cac6650
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 17 23:05:02 2023 -0500

    Makefile fix! Allows hip/clblast build together

commit 3db70b5
Merge: 2ec4466 7568d1a
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 18 01:54:17 2023 +0300

    Merge 'origin/master' into hipblas

commit f208670
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 02:56:03 2023 -0500

    improve error handling with gpu names

commit 860e738
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 00:33:03 2023 -0500

    Show GPU names in GUI, Only show GPUs that exist

    Replaced the GPU selector's preset "1,2,3" and "1,2,3,all" settings with a function that grabs the GPU names and uses those names as the values for the selector boxes.
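    A minimal sketch of the selector change described above, assuming the GPU names have already been collected (the function name and list format here are hypothetical, not the actual koboldcpp code):

```python
# Illustrative sketch: build selector values from detected GPU names,
# replacing the fixed "1,2,3" / "1,2,3,all" presets.
def selector_values(gpu_names):
    # one entry per detected GPU, plus an "All" option when there are several
    values = [f"{i + 1}: {name}" for i, name in enumerate(gpu_names)]
    if len(gpu_names) > 1:
        values.append("All")
    return values
```

    Only GPUs that actually exist produce entries, so a single-GPU system sees no multi-GPU options.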

commit 2ec4466
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:44:02 2023 +0300

    Update build flags.

    GGML_CUDA_DMMV_Y is now GGML_CUDA_MMV_Y
    so update your build instructions.

    GGML_CUDA_FORCE_DMMV is always enabled.

    ---------

    Co-authored-by: YellowRoseCx <[email protected]>

commit cd36b18
Merge: afcb8fe 1cbf561
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:03:01 2023 +0300

    Merge 'origin/master' into hipblas

commit ac7ebc3
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 18:32:18 2023 -0500

    add hipBLAS name scheme to GUI and update README

commit 7f85cc5
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 17:35:54 2023 -0500

    update makefile and ggml.c

commit 6ca3499
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:43:45 2023 -0500

    ggml.c fix

commit 770e674
Merge: 2b289cd 5941514
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:24:36 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2b289cd
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:30:00 2023 -0500

    Update c-cpp.yml

commit 5dae95a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:28:51 2023 -0500

    Update c-cpp.yml

commit b37cd73
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:27:04 2023 -0500

    Create c-cpp.yml to test Actions

commit afcb8fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 18:09:27 2023 +0300

    Add new config option

commit 8c2c497
Merge: e610466 2347463
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:54 2023 +0300

    Merge 'origin/master' into hipblas

commit e610466
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:14 2023 +0300

    Expand arch list and make it overrideable

commit 80e4e54
Merge: 7735c5a 1d16309
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 10 02:09:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 8432e9d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:55:30 2023 -0500

    Update Makefile

commit b58c189
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:20:00 2023 -0500

    Add multi-gpu CuBLAS support to new GUI

commit 0c1c71b
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 07:56:57 2023 -0500

    Update Makefile

commit f864f60
Author: Johannes Gäßler <[email protected]>
Date:   Sat Jul 8 00:25:15 2023 +0200

    CUDA: add __restrict__ to mul mat vec kernels (ggerganov#2140)

commit 4539bc2
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 01:36:14 2023 -0500

    update makefile for changes

commit 912e31e
Merge: 74e2703 ddaa4f2
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 7 23:15:37 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 74e2703
Merge: cf65429 f9108ba
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 5 15:16:49 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit 7735c5a
Merge: c3e3733 7ee76e4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 4 17:09:16 2023 +0300

    Merge 'origin/master' into hipblas

commit cf65429
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:56:40 2023 -0500

    print cuda or opencl based on what's used

commit 72c16d2
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:45:39 2023 -0500

    Revert "fix my mistake that broke other arches"

    This reverts commit 777aed5.

commit 777aed5
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 15:53:32 2023 -0500

    fix my mistake that broke other arches

commit 27780a9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:03:27 2023 -0500

    rocm fixes

commit f52c7d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:02:58 2023 -0500

    Revert "rocm fixes"

    This reverts commit 2fe9927.

commit 2fe9927
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:58:21 2023 -0500

    rocm fixes

commit efe7560
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:43 2023 -0500

    Revert "move HIPBLAS definitions into ggml-cuda.h"

    This reverts commit bf49a93.

commit 4fc0181
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:36 2023 -0500

    Revert "move hipblas definitions to header files"

    This reverts commit 2741ffb.

commit 89eb576
Merge: 2741ffb 3d2907d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 14:44:13 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c3e3733
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:51:31 2023 +0300

    ROCm fixes

commit 15db19a
Merge: 04419f1 46088f7
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:39:57 2023 +0300

    Merge 'origin/master' into hipblas

commit 2741ffb
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 17:07:42 2023 -0500

    move hipblas definitions to header files

commit bf49a93
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 16:38:50 2023 -0500

    move HIPBLAS definitions into ggml-cuda.h

commit 540f4e0
Merge: 2c3b46f eda663f
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 14:58:32 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2c3b46f
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:43:43 2023 -0500

    changes to fix build

commit c9e1103
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:20:07 2023 -0500

    Update ggml_v2-cuda-legacy.cu for ROCM

commit b858fc5
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 17:49:39 2023 -0500

    changes to work with upstream

commit 69a0c25
Merge: 096f0b0 1347d3a
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 16:59:06 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 04419f1
Merge: bb16eff d3494bb
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 28 23:30:10 2023 +0300

    Merge 'origin/master' into hipblas

commit bb16eff
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:10 2023 -0500

    headers fix; add kquants_iter for hipblas and add gfx803 (#1)

    * kquants_iter for hipblas and add gfx803
    * Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16
    * remove dmmv_f16 for now

commit 096f0b0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:02 2023 -0500

    revert unnecessary hipblas conditionals

commit d81e81a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 14:48:23 2023 -0500

    Update Makefile hipblas nvcc correction

commit c8ae945
Merge: c1e5c83 0be54f7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 27 10:50:37 2023 +0300

    Merge 'origin/master' into hipblas

commit 2579ecf
Merge: abed427 d2034ce
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 25 17:50:04 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c1e5c83
Merge: 35a6031 447ccbe
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 21:40:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 35a6031
Merge: df7346c 66a2555
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 10:57:48 2023 +0300

    Merge 'origin/master' into hipblas

commit abed427
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 19:16:30 2023 -0500

    reorganize If statements to include proper headers

commit 06c3bf0
Merge: ea6d320 8342fe8
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 16:57:20 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit ea6d320
Author: YellowRoseCx <[email protected]>
Date:   Fri Jun 23 01:53:28 2023 -0500

    Update README.md

commit 4d56ad8
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 16:19:43 2023 -0500

    Update README.md

commit 21f9308
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 15:42:05 2023 -0500

    kquants_iter for hipblas and add gfx803

commit df7346c
Merge: 5dd2fbe 7487137
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 22 20:51:09 2023 +0300

    Merge 'origin/master' into hipblas

commit b6ff890
Merge: eb094f0 e6ddb15
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 12:42:09 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit eb094f0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 23:59:18 2023 -0500

    lowvram parameter description

commit 3a5dfeb
Merge: 665cc11 b1f00fa
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 16:53:03 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit 665cc11
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 01:13:19 2023 -0500

    add lowvram parameter

commit 222cbbb
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 19:03:28 2023 -0500

    add additional hipblas conditions for cublas

commit e1f9581
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 16:51:59 2023 -0500

    Add hip def for cuda v2

commit 3bff5c0
Merge: a7e74b3 266d47a
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 13:38:06 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit a7e74b3
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:04:18 2023 -0500

    Update README.md

commit 5e99b3c
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:03:42 2023 -0500

    Update Makefile

commit 9190b17
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 21:47:10 2023 -0500

    Update README.md

commit 5dd2fbe
Merge: 67e229b 20568fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 20 01:23:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2780ea2
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 15:48:00 2023 -0500

    Update Makefile

commit 04a3e64
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:33:39 2023 -0500

    remove extra line

commit cccbca9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:17 2023 -0500

    attempt adding ROCM hipblas

commit a44a1d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:01 2023 -0500

    attempt adding ROCM hipblas

commit b088184
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:30:54 2023 -0500

    attempt adding ROCM hipblas

commit 67e229b
Merge: 6f7c156 b241649
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 18 00:36:54 2023 +0300

    Merge 'origin/master' into hipblas

commit 6f7c156
Merge: 61df8e9 fc45a81
Author: Henri Vasserman <[email protected]>
Date:   Sat Jun 17 16:53:22 2023 +0300

    Merge 'origin/master' into hipblas

commit 61df8e9
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:46:10 2023 +0300

    add cudaMemset

commit a836529
Merge: 85f902d 254a7a7
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:41:55 2023 +0300

    Merge 'origin/master' into hipblas

commit 85f902d
Merge: 4362e80 b50b570
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 8 10:50:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 4362e80
Merge: fa5b3d7 17366df
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 23:14:40 2023 +0300

    Merge 'origin/master' into hipblas

commit fa5b3d7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:47:00 2023 +0300

    fix makefile.

commit 1ba4ce4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:41:08 2023 +0300

    Revert "warp size fixes"

    It seems like 32 is faster for me, at least, and it won't cause so many conflicts.

    This reverts commit 5d6eb72.

commit 5d6eb72
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:32:41 2023 +0300

    warp size fixes

commit 33091a9
Merge: 9fdaa1d 2d43387
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 16:19:23 2023 +0300

    Merge  'origin/master' into hipblas

commit 9fdaa1d
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 19:17:53 2023 +0300

    Add more defs

    For forward compatibility ggerganov#1607

commit a4648c1
Merge: 4c8b3fb 0ecb1bb
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 18:22:39 2023 +0300

    Merge 'origin/master' into hipblas

commit 4c8b3fb
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:08:53 2023 +0300

    add configurable vars

commit 30d921a
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:03:56 2023 +0300

    and makefile

commit a593a4f
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:55:28 2023 +0300

    Add missing parameters

commit 174bf6a
Merge: f80ce7a 1fcdcc2
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:44:23 2023 +0300

    Merge 'origin/master' into hipblas

commit f80ce7a
Merge: 600ace3 ac7876a
Author: Henri Vasserman <[email protected]>
Date:   Thu May 25 00:02:50 2023 +0300

    Merge branch 'origin/master' into hipblas

commit 600ace3
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:42:20 2023 +0300

    update warp size

commit b19fefe
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:28:08 2023 +0300

    Forwardcompat

commit c66115b
Merge: a0b2d5f b8ee340
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 18:29:31 2023 +0300

    Merge 'origin/master' into hipblas

commit a0b2d5f
Merge: 8bab456 2a5ee02
Author: Henri Vasserman <[email protected]>
Date:   Tue May 16 17:08:29 2023 +0300

    Merge 'origin/master' into hipblas

commit 8bab456
Merge: 2956630 b5c9295
Author: Henri Vasserman <[email protected]>
Date:   Mon May 15 00:01:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2956630
Merge: 0fe6384 f048af0
Author: Henri Vasserman <[email protected]>
Date:   Sat May 13 13:12:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 0fe6384
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 17:22:11 2023 +0300

    fix makefile

commit 605560d
Merge: 127f68e 089b1c9
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 16:12:53 2023 +0300

    Merge 'origin/master' into hipblas

commit 127f68e
Merge: 070cbcc b608b55
Author: Henri Vasserman <[email protected]>
Date:   Thu May 11 20:21:27 2023 +0300

    Merge 'origin/master' into hipblas

commit 070cbcc
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:10:56 2023 +0300

    occupancy function

commit a3296d5
Merge: 0aefa6a e129551
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:06:04 2023 +0300

    Merge 'origin/master' into hipblas

commit 0aefa6a
Merge: baeb482 1b0fd45
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:41 2023 +0300

    Merge 'origin/master' into hipblas

commit baeb482
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:12 2023 +0300

    Revert to default copy

commit 289073a
Merge: 1107194 173d0e6
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 19:59:41 2023 +0300

    Merge 'origin/master' into hipblas

commit 1107194
Merge: 04c0d48 a3b85b2
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 00:38:20 2023 +0300

    Merge 'origin/master' into hipblas

commit 04c0d48
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 12:31:16 2023 +0300

    Move all HIP stuff to ggml-cuda.cu

commit d83cfba
Merge: b67cc50 799fdc1
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 11:31:16 2023 +0300

    Merge 'origin/master' into hipblas

commit b67cc50
Merge: fcbc262 e216aa0
Author: Henri Vasserman <[email protected]>
Date:   Wed May 3 15:04:51 2023 +0300

    Merge 'origin/master' into hipblas

commit fcbc262
Merge: c73def1 f4cef87
Author: Henri Vasserman <[email protected]>
Date:   Mon May 1 22:45:29 2023 +0300

    Merge 'origin/master' into hipblas

commit c73def1
Merge: d8ea75e f0d70f1
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 30 18:40:42 2023 +0300

    Merge 'origin/master' into hipblas

commit d8ea75e
Merge: d194586 334637e
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 29 11:25:51 2023 +0300

    Merge 'origin/master' into hipblas

commit d194586
Merge: 2ab9d11 7f15c5c
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 23:03:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 2ab9d11
Merge: 3b4a531 04aaae1
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 16:30:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 3b4a531
Merge: a1caa48 0b2da20
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:41 2023 +0300

    Merge 'origin/master' into hipblas

commit a1caa48
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:21 2023 +0300

    add more cuda defines

    This is so 'slaren/cuda-f16f32' would merge.

commit ecc0565
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 01:58:27 2023 +0300

    only the .cu file needs to be compiled as device code

commit ef51e9e
Merge: d571d16 4afcc37
Author: Henri Vasserman <[email protected]>
Date:   Wed Apr 26 12:46:26 2023 +0300

    Merge branch 'ggerganov:master' into hipblas

commit d571d16
Merge: 608aa33 dd0eabc
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:33 2023 +0300

    Merge 'origin/master' into hipblas

commit 608aa33
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:04 2023 +0300

    change default GPU arch to match CMake

commit 3a004b2
Author: Henri Vasserman <[email protected]>
Date:   Mon Apr 24 02:24:54 2023 +0300

    add rpath

commit db7a012
Merge: 3677235 284685f
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 23 21:49:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 3677235
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 22 23:28:00 2023 +0300

    More build file changes

commit d3e1984
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 03:32:06 2023 +0300

    add rpath

commit 0e005f7
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 02:13:00 2023 +0300

    Build file changes

    Now HIP Clang is not required, the CMake scripts will configure the
    needed compiler, which can be system clang++. Also other code can
    still use GCC, but CMake will force the clang to link.

commit 54a63c1
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 22:19:22 2023 +0300

    Update Makefile for the Cuda kernels

commit 0fd8363
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 02:04:00 2023 +0300

    use hipblas based on cublas

* Merge Fixes

* readme merge fix

* remove old ggmlv2 changes

* bring ggml v2_cuda up to date with AMD changes

* Revert ggml v2_cuda changes because they weren't needed

This reverts commit 3385dd4.

* avoid launching subprocesses to get device names for now; other than that it seems to be working

---------

Co-authored-by: Concedo <[email protected]>
Labels: enhancement, hardware, high priority, performance, refactoring
Successfully merging this pull request may close these issues.

[User] How to specify which cuda device to use programmably