Align the tokenized result between deepseek coder python model and gguf model #3986
Conversation
I'm still working on transferring the whole pre-tokenizer pipeline to C++; hopefully we can read the pre-tokenization process directly from tokenizer.json instead of using the current hard-coded solution.
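A minimal sketch of what reading the pre-tokenizer configuration could look like, assuming nlohmann::json (not part of this PR) and the Hugging Face tokenizers layout for tokenizer.json; the file path is hypothetical:

```cpp
// Sketch: read the pre-tokenizer spec from tokenizer.json instead of
// hard-coding it. Assumes nlohmann::json and the HF tokenizers format,
// where "pre_tokenizer" holds a "type" and, for "Sequence", nested steps.
#include <fstream>
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
    std::ifstream f("tokenizer.json"); // hypothetical path
    nlohmann::json j = nlohmann::json::parse(f);

    const auto & pre = j["pre_tokenizer"];
    std::cout << "pre_tokenizer type: " << pre["type"] << "\n";

    // A "Sequence" pre-tokenizer nests its steps under "pretokenizers".
    if (pre["type"] == "Sequence") {
        for (const auto & step : pre["pretokenizers"]) {
            std::cout << "  step: " << step["type"] << "\n";
        }
    }
}
```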
llama.cpp (Outdated)
// std::cout<<utf_char<<std::endl;
// forward backward lookups
const std::string & utf_char_next = (i + 1 < (int)text_utf.size()) ? text_utf[i + 1] : "";
const std::string & utf_char_next_next = (i + 2 < (int)text_utf.size()) ? text_utf[i + 2] : "";
I don't think C++ allows you to bind a reference to a temporary std::string (converted from "") like this, and then store it in a variable.
edit: I guess we're doing that too, so it must work well enough. There's quite a lot of copy-pasting going on here...
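For reference, a self-contained sketch of why the pattern in the diff works: binding a const reference to a temporary extends the temporary's lifetime to that of the reference. The subtle cost is that when the ternary mixes a std::string lvalue and a string literal, the common type is a std::string prvalue, so even the in-bounds branch materializes a copy:

```cpp
// Lifetime-extension sketch (names mirror the diff above).
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> text_utf = {"a", "b"};
    int i = 1;

    // OK: the ternary yields a std::string temporary (either a copy of
    // text_utf[i + 1] or one constructed from ""); binding it to a const
    // reference extends its lifetime to match utf_char_next.
    const std::string & utf_char_next =
        (i + 1 < (int)text_utf.size()) ? text_utf[i + 1] : "";

    std::cout << "next: '" << utf_char_next << "'\n"; // prints: next: ''
}
```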
convert-deepseek-coder-hf-to-gguf.py (Outdated)
I'm sure you don't need to copy-paste the whole convert.py for this. What version of convert.py did you base this on? It's not a very recent one.
I referenced #3633 as the base for the convert script.
This is way too much extra (mostly duplicate) code to maintain for a single model (with only tokenizer changes). You need to find a better way to do this.
I'd suggest you take a look at #3838, which introduces …
Thanks for your efforts in #3633 to support converting the deepseek coder model.
I found that the tokenized result of the quantized model is slightly different from the original, because deepseek coder uses a different pre-tokenizer from the GPT-2 style preprocessing implemented in your code. So I added the deepseek coder pre-tokenization pipeline on top of your work.
The convert script is modified from strutive07's work.
I ran the above commands and did some tests to make sure the tokenized results are the same.
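A sketch of the kind of check described above: tokenize a sample with the GGUF model and print the token ids, to diff against the Python tokenizer's output. The llama.h signatures have shifted between versions, so this follows roughly the API of this PR's era and is an assumption, not the PR's actual test code; the model path is hypothetical:

```cpp
// Dump token ids from a GGUF model for comparison with the HF tokenizer.
#include <cstdio>
#include <string>
#include <vector>
#include "llama.h"

int main(int argc, char ** argv) {
    const char * model_path = argc > 1 ? argv[1] : "deepseek-coder.gguf"; // hypothetical
    const std::string text  = "def hello_world():\n    print(\"hi\")";

    llama_backend_init(false);
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(model_path, mparams);

    // Over-allocate: the id count can never exceed the byte count plus BOS.
    std::vector<llama_token> toks(text.size() + 8);
    const int n = llama_tokenize(model, text.c_str(), (int)text.size(),
                                 toks.data(), (int)toks.size(),
                                 /*add_bos*/ true, /*special*/ false);

    for (int i = 0; i < n; ++i) {
        printf("%d ", toks[i]);
    }
    printf("\n");

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```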