Model always responds in Chinese, ignores system prompts stating to only reply in English #12
We believe this issue is due to quantization, as we did not see this problem in our bf16 checkpoint.
This is interesting! Is there anything we can do when quantizing the model to prevent this from occurring? I know that for models used heavily for embeddings, keeping the token-embedding type at f16 while quantizing the rest of the model as usual can help.
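For the token-embedding idea above, a minimal sketch using llama.cpp's quantize tool (the binary name and flags here reflect recent llama.cpp; the filenames are placeholders):

```bash
# Quantize to Q6_K while keeping the token embeddings at f16
./llama-quantize --token-embedding-type f16 \
    DeepSeek-Coder-V2-Lite-Instruct.f16.gguf \
    DeepSeek-Coder-V2-Lite-Instruct.Q6_K.gguf \
    Q6_K
```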
I recommend this quantization method: https://github.com/spcl/QuaRot. Alternatively, do not quantize the attention layers or the shared experts.
Another approach is to change the calibration data used during quantization. That data should be organized in our chat template; by default, it is likely organized in the base-model format.
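If this refers to llama.cpp's importance-matrix quantization, a hedged sketch of what that could look like (tool names from recent llama.cpp; `calibration.txt` is a hypothetical file whose samples are wrapped in the chat template rather than left as raw base-model text):

```bash
# Compute an importance matrix from chat-template-formatted calibration text
./llama-imatrix -m DeepSeek-Coder-V2-Lite-Instruct.f16.gguf \
    -f calibration.txt -o imatrix.dat

# Quantize using that importance matrix
./llama-quantize --imatrix imatrix.dat \
    DeepSeek-Coder-V2-Lite-Instruct.f16.gguf \
    DeepSeek-Coder-V2-Lite-Instruct.Q6_K.gguf \
    Q6_K
```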
According to your README, the best chat template to use is:
I believe the template I'm using is the same; here it is in Ollama's Modelfile template format (note that the BOS/EOS tokens are added automatically):
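(Purely as an illustration of the Modelfile format, a hypothetical template in the README's User/Assistant style might look like the sketch below; the exact whitespace and field usage are assumptions.)

```
TEMPLATE """{{ if .System }}{{ .System }}

{{ end }}User: {{ .Prompt }}

Assistant: {{ .Response }}"""
```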
Here's an example of a popular quantized version of the model: https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF?show_file_info=DeepSeek-Coder-V2-Lite-Instruct-Q6_K.gguf. Here we see:
and a template of:
However, if the template is provided in the Ollama Modelfile (as in my previous comment), it will be used instead, as long as those BOS/EOS/pad token IDs are correct.
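If you want to double-check those token IDs in a local GGUF file, one way (assuming the `gguf` Python package that ships with llama.cpp is installed, which provides a `gguf-dump` helper) is:

```bash
# Dump only the metadata and filter for the tokenizer's special-token IDs
gguf-dump --no-tensors DeepSeek-Coder-V2-Lite-Instruct-Q6_K.gguf | grep tokenizer.ggml
```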
Ollama's chat template is correct. I mean that during the quantization process, it might be better to organize the calibration data in the chat-template format.
We did not encounter this issue during our internal FP16 testing. We will re-test this prompt for you.
Thank you, that would be very interesting to see. Here are my conversion steps, if it helps:

```bash
#!/usr/bin/env bash
# 0. Clone llama.cpp and install the requirements for convert-hf-to-gguf.py
# 1. Download the model from Hugging Face
# 2. Update the paths below
# 3. Run this script
/path/to/llama.cpp/convert-hf-to-gguf.py \
    /models/DeepSeek-Coder-V2-Lite-Instruct \
    --outtype f16 \
    --outfile DeepSeek-Coder-V2-Lite-Instruct.f16.gguf
```
I used this code to test your prompt. It generated almost the same output, except that the English text was translated into Chinese. I need some time to confirm what caused this issue; it might be the GGUF conversion process.
Can you try using `--outtype bf16`? This will help us analyze the issue. Thank you.
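For reference, a bf16 conversion under the same assumptions as the script above would only swap the output type:

```bash
/path/to/llama.cpp/convert-hf-to-gguf.py \
    /models/DeepSeek-Coder-V2-Lite-Instruct \
    --outtype bf16 \
    --outfile DeepSeek-Coder-V2-Lite-Instruct.bf16.gguf
```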
It seems Ollama doesn't support BF16. I can try with llama.cpp, but it's getting late here, so I'll give it a go tomorrow. Also, FYI: the HF→GGUF conversion logs, if they're useful:
The model crashes llama.cpp when run in bf16:
However, it works in fp16, but it outputs Chinese at least part of the time:
The more-or-less correct template for Ollama is:
This way everything is in English. The correct Continue.dev template is:
@goodov thank you, you nailed it! After using your updated template, it works flawlessly, following the instruction to output only English every time.
@jmorganca, I can't see where to submit this as a PR to Ollama, but I believe the official Ollama templates for DeepSeek Coder v2 need to be updated based on goodov's findings. Lastly, @guoday, thank you for bearing with me while we figured out what's going on. I had several friends who really wanted to use DS Coder v2 but gave up, thinking it wasn't good for English; now they're excited to give it a go.
Closing this off as resolved. I've dropped a post on r/LocalLLaMA and updated a conversation thread on the Ollama Discord with the updated template.
DeepSeek Coder V2 seems to respond only in Chinese; this occurs even when the system prompt explicitly states to respond only in English:
Still results in Chinese rather than English: