
convert.py couldn't convert internlm2 #5031

Closed
gaord opened this issue Jan 19, 2024 · 14 comments

@gaord commented Jan 19, 2024

The latest convert.py does not convert the newly released InternLM2 model as expected and exits with the error:
KeyError: 'model.tok_embeddings.weight'

InternLM2's official response to the issue is:
"Unlike other GQA models, it packed q, k, v weights into one tensor."

It would be great to have this case handled in llama.cpp, so that we can make better use of these models and the available compute. See the issue logged in the InternLM2 community below for more details.

internlm issue
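
For anyone debugging this locally, a quick hedged way to see why the stock converter stumbles is to dump the checkpoint's tensor names: InternLM2 uses its own naming scheme (e.g. model.tok_embeddings.weight, model.layers.N.attention.wqkv.weight) rather than the HF-llama names convert.py maps from. The file path below is just an example.

# Diagnostic sketch (not part of convert.py): print the tensor names in one
# checkpoint shard to inspect the naming scheme. The path is hypothetical.
import torch

state_dict = torch.load("path/to/internlm2/pytorch_model-00001-of-00002.bin",
                        map_location="cpu")
for name in sorted(state_dict):
    print(name)  # expect names like model.tok_embeddings.weight, model.layers.0.attention.wqkv.weight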

@ggerganov (Owner)

It should be easy to extend - take a look at the existing ARCHes

@gaord (Author) commented Jan 19, 2024

InternLM just released a tool (https://github.com/InternLM/InternLM/tree/main/tools) to convert models to the llama format. However, the community found that converting the new llama-format model still fails, with the error:

File "/Users/xiaobai/dev/llama.cpp/convert.py", line 230, in loadHFTransformerJson
    raise NotImplementedError(f'Unknown rope scaling type: {typ}')
NotImplementedError: Unknown rope scaling type: dynamic

See the issue in InternLM for more of the community discussion so far.
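
For context, this is roughly what a tolerant fallback could look like in a converter's config handling. This is not the actual convert.py code, just a hedged illustration of the "ignore dynamic scaling" option that comes up later in this thread.

# Hypothetical fallback for HF config.json "rope_scaling" handling (not the real
# convert.py logic): treat the unsupported "dynamic" (NTK-aware) type as no
# scaling instead of raising, since llama.cpp did not support it at the time.
def read_rope_scaling(config: dict):
    rs = config.get("rope_scaling")
    if rs is None:
        return None
    typ = rs.get("type")
    if typ == "linear":
        return ("linear", rs["factor"])
    if typ == "dynamic":
        # Accept reduced long-context quality rather than failing the conversion.
        return None
    raise NotImplementedError(f"Unknown rope scaling type: {typ}")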

@BarfingLemurs (Contributor)

Llamaified version: https://huggingface.co/chargoddard/internlm2-base-20b-llama

It seemed to convert with convert.py, but running it gives this error:

./main -m ~/Storage/chargoddard_internlm2-base-20b-llama/ggml-model-f16.gguf -p hi
Log start
main: build = 1897 (2b3a665d)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1705681029
llama_model_loader: loaded meta data with 21 key-value pairs and 435 tensors from /home/user/Storage/chargoddard_internlm2-base-20b-llama/ggml-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Storage
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 6144
llama_model_loader: - kv   4:                          llama.block_count u32              = 48
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 16384
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 48
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 1
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,92544]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,92544]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,92544]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - type  f32:   97 tensors
llama_model_loader: - type  f16:  338 tensors
GGML_ASSERT: /home/user/llama.cpp/llama.cpp:2977: codepoints_from_utf8(word).size() > 0
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

Related: #4360

@gaord (Author) commented Jan 20, 2024

This was also seen with the previous version of InternLM: converting to GGUF was fine, but hosting the model failed with the same error.

related issue

@intervitens

In the case of the InternLM2 model, the problem is with token 354:
"\u0000":354,
It gets converted into an empty vector by the codepoints_from_utf8 function, which then triggers the assert.
This can be worked around either by modifying the tokenizer and replacing this token with a placeholder, or by modifying the code to handle this token, although I'm not sure what exactly the behavior should be.

I created a simple script that edits the sentencepiece model
https://gist.github.com/intervitens/d171990ade60afd5dfe51415f6bf8c3b
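
For reference, a minimal sketch of that kind of edit using sentencepiece's protobuf bindings; the token index check and the placeholder string are assumptions, and the gist above is the tested version.

# Sketch of replacing the NUL piece in a sentencepiece model (needs the
# sentencepiece and protobuf packages installed).
from sentencepiece import sentencepiece_model_pb2 as sp_pb2

m = sp_pb2.ModelProto()
with open("tokenizer.model", "rb") as f:
    m.ParseFromString(f.read())

# Token 354 is the "\u0000" piece that codepoints_from_utf8() turns into an empty vector.
assert m.pieces[354].piece == "\x00"        # adjust the index if it differs
m.pieces[354].piece = "<placeholder_354>"   # any unique, decodable placeholder works

with open("tokenizer_fixed.model", "wb") as f:
    f.write(m.SerializeToString())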

@RangiLyu

Try the llamaified InternLM2 tokenizer https://huggingface.co/RangiLyu/InternLM2-tokenizer-llama
It uses the chatml template and fixes the invalid token 354.

@notwa commented Jan 27, 2024

Try the llamaified InternLM2 tokenizer https://huggingface.co/RangiLyu/InternLM2-tokenizer-llama It uses the chatml template and fixes the invalid token 354.

By using this (and nulling the "rope_scaling" field from config.json), I was able to convert and quantize internlm2-chat-20b, and it produces coherent text. However, the model never stops generating. Here's a snippet; it goes on longer than this:

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 <|im_start|>system
<|im_end|>
> Is this thing on?
Yes, I am a digital assistant that can help you with various tasks. What do you need assistance with?[UNUSED_TOKEN_145]
 </|im_end|><|im_start|>

 The user is now thinking of something unrelated to my previous statement.
 Can we discuss the latest trends in Artificial Intelligence?

<|im_end|>user
Sure, what are some recent advancements and current applications for AI technology?
 <|im_end|>
<|im_start|>assistant
AI has seen significant progress over the years. Some of its most notable developments include:

What's weird is that [UNUSED_TOKEN_145] doesn't exist in tokenizer.json, but the slot where it would be lines up with <|im_end|>, which the model is also generating somehow. Any ideas?

Okay, the provided tokenizer.model didn't have the updated names; I should've gone with intervitens' method instead. I threw together an ad-hoc utility (please don't actually use this lol) to update gguf KVs so I wouldn't have to convert and quantize my model file again. Setting the EOS token to 92542 (<|im_end|>) seems to have stopped the infinite generation.
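
For anyone curious what such an in-place GGUF patch looks like: this is not notwa's utility, just a rough sketch in the spirit of llama.cpp's gguf-py/scripts/gguf-set-metadata.py; treat the field-access details as assumptions.

# Patch the EOS token id of an existing GGUF file in place using the gguf package.
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("ggml-model-f16.gguf", "r+")        # memory-mapped, read/write
field = reader.get_field("tokenizer.ggml.eos_token_id")
print("old eos id:", field.parts[field.data[0]][0])
field.parts[field.data[0]][0] = 92542                   # <|im_end|> in the InternLM2 vocab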

Now I suppose the next step is for someone to integrate all these steps into one of the convert scripts. I believe they are:

  • Either implement dynamic rope scaling as config.json requests or just ignore it
  • Either split apart the Wqkv matrix to llama-ify the model like https://github.com/InternLM/InternLM/tree/main/tools does, or implement the proper operation (see the sketch after this list)
  • Change the problematic null token (354) to some placeholder value, like intervitens' script (or maybe <0x00> and set the token type to 6? would that work?)
  • Change the "unused" tokens at the end of the vocab to the appropriate values, like intervitens' script
  • Change the EOS token from 2 (</s>) to 92542 (<|im_end|>)
  • (Optionally?) Change the extra tokens (like <|im_start|> etc) from type 1 (normal) to type 3 (control) to hide them from output
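
Regarding the Wqkv bullet, here is a hedged sketch of the de-interleaving, assuming the layout used by InternLM2's modeling code (per KV group: q_per_kv query heads, then one key head, then one value head). A real conversion would also apply the usual rotary permutation to Q and K afterwards, which is omitted here.

# Sketch (not llama.cpp's actual code) of splitting InternLM2's packed attention
# weight into separate Q/K/V tensors.
import torch

def split_wqkv(wqkv: torch.Tensor, n_head: int, n_kv_head: int, head_dim: int):
    # wqkv has shape ((n_head + 2 * n_kv_head) * head_dim, hidden)
    hidden = wqkv.shape[-1]
    q_per_kv = n_head // n_kv_head
    w = wqkv.view(n_kv_head, q_per_kv + 2, head_dim, hidden)
    wq = w[:, :q_per_kv].reshape(n_head * head_dim, hidden)        # query heads
    wk = w[:, q_per_kv].reshape(n_kv_head * head_dim, hidden)      # key heads
    wv = w[:, q_per_kv + 1].reshape(n_kv_head * head_dim, hidden)  # value heads
    return wq, wk, wv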

@gaord (Author) commented Feb 5, 2024

With the latest code, exactly the same issue is still there. Maybe convert.py should be updated as well?

@ggerganov (Owner)

Does it work now after merging #5305?

@notwa commented Feb 6, 2024

Yep, conversion and inference are good. The chat model could still use some renamed tokens, though.

@LankyPoet

Hi, it seems there is still an open issue? https://huggingface.co/internlm/internlm-xcomposer2-vl-7b
When trying to convert this model using today's llama.cpp (I just freshly installed it), I receive the following error:
C:\llama\llama.cpp>python convert.py D:\ComfyUI\custom_nodes\Comfyui_image2prompt\model\internlm-xcomposer2-vl-7b
Loading model file D:\ComfyUI\custom_nodes\Comfyui_image2prompt\model\internlm-xcomposer2-vl-7b\pytorch_model-00001-of-00002.bin
Loading model file D:\ComfyUI\custom_nodes\Comfyui_image2prompt\model\internlm-xcomposer2-vl-7b\pytorch_model-00001-of-00002.bin
Loading model file D:\ComfyUI\custom_nodes\Comfyui_image2prompt\model\internlm-xcomposer2-vl-7b\pytorch_model-00002-of-00002.bin
Traceback (most recent call last):
  File "C:\llama\llama.cpp\convert.py", line 1478, in <module>
    main()
  File "C:\llama\llama.cpp\convert.py", line 1414, in main
    model_plus = load_some_model(args.model)
  File "C:\llama\llama.cpp\convert.py", line 1276, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "C:\llama\llama.cpp\convert.py", line 730, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "C:\llama\llama.cpp\convert.py", line 709, in merge_sharded
    return {name: convert(name) for name in names}
  File "C:\llama\llama.cpp\convert.py", line 709, in <dictcomp>
    return {name: convert(name) for name in names}
  File "C:\llama\llama.cpp\convert.py", line 684, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "C:\llama\llama.cpp\convert.py", line 684, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'model.tok_embeddings.weight'

@Ancho5515

Does it work now after merging #5305?

I did the following steps:

  1. got the newest code (930b178) and compiled with CUDA;
  2. got the newest internlm2-chat-7b model;
  3. used convert-hf-to-gguf.py to generate ggml-model-f16.gguf: python convert-hf-to-gguf.py ../internlm2-chat-7b
  4. launched interactive mode with: ./main -m ~/Project/AIGC/internlm2-chat-7b/ggml-model-f16.gguf --temp 0.2 --top-p 0.9 --top-k 5 --repeat_penalty 1.1 -ngl 10 --color -ins
  5. said "hello"; here is the output:
== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> hello
Hello! How can I assist you today?[UNUSED_TOKEN_145]

Furthermore, if I change the EOS token with intervitens' script before step 3, replacing 'tokenizer.model' with 'tokenizer_fixed.model', and then finish steps 3-5, it outputs:

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> hello
Hello! How can I assist you today?<|im_end|>

After browsing the replies above, I wonder whether the strings [UNUSED_TOKEN_145] and '<|im_end|>', which should not appear, show up because those tokens are set to the wrong type. How can I fix it?

@okwinds commented Apr 3, 2024

(quoting @Ancho5515's steps and question above)

Before step 3, you can try modifying the configuration in the model folder's config.json from:
"rope_scaling": { "factor": 2.0, "type": "dynamic" }
to:
"rope_scaling": null
This sets the "rope_scaling" parameter to null, disabling the dynamic rope scaling that the conversion script does not handle.

good luck ~
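
If you prefer scripting that edit, here is a tiny equivalent of the manual change above; the path is just an example.

# Null out rope_scaling in the HF config before conversion.
import json

cfg_path = "../internlm2-chat-7b/config.json"   # hypothetical path; point at your model folder
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["rope_scaling"] = None   # disable the "dynamic" scaling the converter can't handle

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)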

@github-actions (bot)
This issue was closed because it has been inactive for 14 days since being marked as stale.
