[Feature] Prefix sharing. #53

Duyi-Wang · 2023-11-14T08:52:03Z

Supports Llama, chatGLM2, Baichuan, and OPT. The chatGLM (v1) model is not supported.

Signed-off-by: Duyi-Wang <[email protected]>

pujiang2018 · 2023-11-23T05:43:21Z

src/models/kvcache_manager.h

@@ -24,27 +24,41 @@ class KVCacheManager {
        this->layers = layers;
        this->cachedKeys = new KVCacheTensor<KVCacheT>[layers];
        this->cachedValues = new KVCacheTensor<KVCacheT>[layers];
+        this->cachedPrefixKeys = new KVCacheTensor<KVCacheT>[layers];


If prefix_sharing=false, there is no need to allocate it (even though the memory is small).
I suggest allocating it only when it is actually needed.
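A minimal sketch of the lazy allocation suggested above, assuming the prefix caches only need to exist once a prefix has actually been stored. `KVCacheTensorSketch`, `KVCacheManagerSketch`, and the `cachedPrefixValues_` member are hypothetical stand-ins (the diff only shows `cachedPrefixKeys`), and `std::vector` replaces the raw `new[]` arrays purely for brevity. This is not the merged implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for KVCacheTensor<KVCacheT>.
template <typename T>
struct KVCacheTensorSketch {
    std::vector<T> data;
};

template <typename T>
class KVCacheManagerSketch {
public:
    explicit KVCacheManagerSketch(int layers)
        : layers_(layers), cachedKeys_(layers), cachedValues_(layers) {
        assert(layers > 0);
    }

    KVCacheTensorSketch<T> &key(int layer) { return cachedKeys_[layer]; }
    KVCacheTensorSketch<T> &value(int layer) { return cachedValues_[layer]; }

    // Prefix caches are created on first access, so a model that never
    // enables prefix sharing never allocates them.
    KVCacheTensorSketch<T> &prefixKey(int layer) {
        assert(layer >= 0 && layer < layers_);
        ensurePrefixCache();
        return cachedPrefixKeys_[layer];
    }

    KVCacheTensorSketch<T> &prefixValue(int layer) {
        assert(layer >= 0 && layer < layers_);
        ensurePrefixCache();
        return cachedPrefixValues_[layer];
    }

    bool hasPrefixCache() const { return !cachedPrefixKeys_.empty(); }

private:
    void ensurePrefixCache() {
        if (cachedPrefixKeys_.empty()) {
            cachedPrefixKeys_.resize(layers_);
            cachedPrefixValues_.resize(layers_);
        }
    }

    int layers_;
    std::vector<KVCacheTensorSketch<T>> cachedKeys_;
    std::vector<KVCacheTensorSketch<T>> cachedValues_;
    std::vector<KVCacheTensorSketch<T>> cachedPrefixKeys_;    // empty until first use
    std::vector<KVCacheTensorSketch<T>> cachedPrefixValues_;  // empty until first use
};
```

With this shape, callers on the non-prefix path never touch `prefixKey`/`prefixValue`, so `hasPrefixCache()` stays false and no extra per-layer objects are created.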

pujiang2018 · 2023-11-23T09:42:03Z

src/models/common_decoder.h

+                this->getPositionIds(prefixIDs, batchSize, pastSeqLen, 0);
+
+                free(prefixIDs);
+                ids = newIDs;


Is there a chance to free these IDs later, since they are dynamically allocated?
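One way to address this, sketched under the assumption that the temporary ID buffers come from `malloc`/`calloc` (the diff releases `prefixIDs` with `free`), is to hold the buffer in a `std::unique_ptr` with a `free`-based deleter so the previous buffer is released automatically whenever it is replaced or its owner is destroyed. `IdBuffer`, `allocIds`, and `DecoderIdsSketch` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter that releases malloc/calloc memory with std::free.
struct FreeDeleter {
    void operator()(int *p) const { std::free(p); }
};
using IdBuffer = std::unique_ptr<int[], FreeDeleter>;

// Hypothetical helper: allocate a zero-initialized buffer of `count` token IDs.
inline IdBuffer allocIds(std::size_t count) {
    int *raw = static_cast<int *>(std::calloc(count, sizeof(int)));
    assert(raw != nullptr || count == 0);
    return IdBuffer(raw);
}

// Hypothetical owner of the temporary IDs produced on the prefix path.
class DecoderIdsSketch {
public:
    // Replaces the previous buffer; the old one is freed by the deleter.
    int *resetTmpIds(std::size_t count) {
        tmpIds_ = allocIds(count);
        return tmpIds_.get();
    }

private:
    IdBuffer tmpIds_;  // freed automatically when DecoderIdsSketch is destroyed
};
```

The raw pointer returned by `resetTmpIds` can still be assigned to a plain `ids` variable as in the diff; ownership simply stays with the decoder object.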

pujiang2018 · 2023-11-23T09:46:46Z

src/models/common_decoder.h

+
+                this->prepareAttnMask(prefixIDs, 0);
+
+                this->getPositionIds(prefixIDs, batchSize, pastSeqLen, 0);


Do we really need to call getPositionIds?
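For context on this question: once a prefix of `pastSeqLen` tokens is already in the KV cache, the remaining tokens of each sample must not restart at position 0. The sketch below assumes a plain 1-D position scheme in which the new tokens simply continue after the cached ones; `positionIdsAfterPast` is a hypothetical helper, not the project's `getPositionIds`, and models with other position layouts would need their own rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: position IDs for `inputSeqLen` new tokens per sample when the
// first `pastSeqLen` tokens (e.g. a shared prefix) are already cached.
// Output layout is row-major [batchSize, inputSeqLen].
std::vector<int> positionIdsAfterPast(int batchSize, int inputSeqLen, int pastSeqLen) {
    assert(batchSize >= 0 && inputSeqLen >= 0 && pastSeqLen >= 0);
    std::vector<int> pos(static_cast<std::size_t>(batchSize) * inputSeqLen);
    for (int b = 0; b < batchSize; ++b) {
        for (int i = 0; i < inputSeqLen; ++i) {
            // Continue numbering right after the cached prefix.
            pos[static_cast<std::size_t>(b) * inputSeqLen + i] = pastSeqLen + i;
        }
    }
    return pos;
}
```

For example, with a 5-token prefix and 3 new tokens per sample, every row gets positions 5, 6, 7.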

pujiang2018 · 2023-11-23T09:49:56Z

src/layers/attention.h

-                                p[keyLen - 1] * ctx->attFactor);
+                                p[2] * ctx->attFactor, p[strideC - 3] * ctx->attFactor, p[strideC - 2] * ctx->attFactor,
+                                p[strideC - 1] * ctx->attFactor);
+                        // for (int qki = 0; qki < queryLen; qki++) {


If it is not needed, please remove such commented-out code.

This is used to print the whole QK score and attention mask matrix.
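If the dump is worth keeping, an alternative to commented-out code is to gate it behind a compile-time switch. A minimal sketch, assuming a hypothetical `XFT_DEBUG_ATTN` macro and hypothetical `formatMatrix`/`maybeDumpMatrix` helpers (none of these exist in the PR); the matrix is read row-major with an explicit row stride, mirroring the `strideC` used in the diff.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical switch: compile with -DXFT_DEBUG_ATTN to enable the dump.
#ifdef XFT_DEBUG_ATTN
constexpr bool kDumpAttention = true;
#else
constexpr bool kDumpAttention = false;
#endif

// Formats a rows x cols block of a row-major matrix whose rows are `stride`
// elements apart. Names longer than the local buffer are truncated.
inline std::string formatMatrix(const char *name, const float *m, int rows, int cols, int stride) {
    assert(rows >= 0 && cols >= 0 && stride >= cols);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%s [%d x %d]\n", name, rows, cols);
    std::string out = buf;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            std::snprintf(buf, sizeof(buf), "%.4f ", m[r * stride + c]);
            out += buf;
        }
        out += '\n';
    }
    return out;
}

// Prints to stderr only when the debug switch is on.
inline void maybeDumpMatrix(const char *name, const float *m, int rows, int cols, int stride) {
    if (kDumpAttention) std::fputs(formatMatrix(name, m, rows, cols, stride).c_str(), stderr);
}
```

The QK scores and the attention mask could then be dumped with two `maybeDumpMatrix` calls that compile to no output in normal builds.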

pujiang2018 · 2023-11-23T11:58:25Z

src/models/common_decoder.h

+                    memcpy(newIDs + inputSeqLen * bs, ids + seqLen * bs + pastSeqLen, inputSeqLen * sizeof(int));
+                }
+
+                this->prepareAttnMask(prefixIDs, 0);


What is the purpose of this step?
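The `memcpy` above appears to split a row-major `[batchSize, seqLen]` ID matrix into one shared prefix of `pastSeqLen` tokens and a per-sample remainder of `inputSeqLen = seqLen - pastSeqLen` tokens. The sketch below reproduces that split under the assumption that every row starts with the same prefix; `SplitIds`, `rowsSharePrefix`, and `splitSharedPrefix` are hypothetical helpers, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct SplitIds {
    std::vector<int> prefix;  // [prefixLen], shared by all samples
    std::vector<int> suffix;  // [batchSize, seqLen - prefixLen], row-major
};

// Returns true if every row of ids starts with the same prefixLen tokens as row 0.
bool rowsSharePrefix(const int *ids, int batchSize, int seqLen, int prefixLen) {
    for (int bs = 1; bs < batchSize; ++bs) {
        if (std::memcmp(ids, ids + static_cast<std::size_t>(seqLen) * bs,
                        static_cast<std::size_t>(prefixLen) * sizeof(int)) != 0)
            return false;
    }
    return true;
}

// Same offsets as the diff: destination row bs starts at inputSeqLen * bs,
// source row bs starts at seqLen * bs, skipping the first prefixLen tokens.
SplitIds splitSharedPrefix(const int *ids, int batchSize, int seqLen, int prefixLen) {
    assert(0 <= prefixLen && prefixLen <= seqLen && batchSize > 0);
    assert(rowsSharePrefix(ids, batchSize, seqLen, prefixLen));
    const int inputSeqLen = seqLen - prefixLen;
    SplitIds out;
    out.prefix.assign(ids, ids + prefixLen);
    out.suffix.resize(static_cast<std::size_t>(batchSize) * inputSeqLen);
    if (inputSeqLen == 0) return out;  // nothing after the prefix
    for (int bs = 0; bs < batchSize; ++bs) {
        std::memcpy(out.suffix.data() + static_cast<std::size_t>(inputSeqLen) * bs,
                    ids + static_cast<std::size_t>(seqLen) * bs + prefixLen,
                    static_cast<std::size_t>(inputSeqLen) * sizeof(int));
    }
    return out;
}
```

With IDs `{1, 2, 3, 4, 1, 2, 5, 6}` (two samples, length 4, prefix length 2), the prefix is `{1, 2}` and the remainder is `{3, 4, 5, 6}`.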

Duyi-Wang marked this pull request as draft November 14, 2023 08:52

Duyi-Wang added the enhancement New feature or request label Nov 15, 2023

Duyi-Wang added 5 commits November 16, 2023 09:38

add perfix sharing.

6b7a806

Signed-off-by: Duyi-Wang <[email protected]>

Fix seqlen error.

be5a956

fix

60008a9

fix position id for llama

0a4a14d

format attention.h

20237ba

Duyi-Wang force-pushed the prefix branch from 1d2b8ce to 20237ba Compare November 16, 2023 01:47

Duyi-Wang added 11 commits November 17, 2023 15:03

Fix attnmask error

2e3e105

fix chatglm2 attnmask.

e82f564

add torch api

1b65dbf

update debug output

bd7cb84

fix chatGLM2 error

bd94227

add perfix sharing demo.

6c63714

add distribute support.

3ea284e

python script.

f075c27

baichuan support.

bd81dc5

chatglm not support

0e7c83e

opt support

4a21cb8

Duyi-Wang requested a review from pujiang2018 November 22, 2023 08:20

Duyi-Wang marked this pull request as ready for review November 22, 2023 08:20

fix pastseqlen uninitlization

3df9df7

pujiang2018 reviewed Nov 23, 2023


Duyi-Wang requested a review from pujiang2018 November 24, 2023 01:59

lazy prefix kvcache and free tmp ids.

d4f6b8d

pujiang2018 approved these changes Nov 28, 2023


pujiang2018 merged commit 637eb49 into intel:main Nov 28, 2023
1 check passed

Duyi-Wang deleted the prefix branch November 29, 2023 08:02
