
Fix missing bos token #6050

Merged: 2 commits merged on May 27, 2024
Conversation

belladoreai (Contributor) commented May 24, 2024

Expected behavior

When the UI checkbox to add the bos token is checked, the user expects the bos token to be added.

Actual behavior

When the UI checkbox to add the bos token is checked, some models add the bos token and some don't. Whether the bos token gets added depends on the (mis)configuration of the model's tokenizer config file.

Scope

Affects both UI users and API users (the API case was tested with the API parameter to add the bos token rather than the UI checkbox).

Affects at least the model loaders that use the Hugging Face transformers tokenizer (e.g. ExllamaV2_HF).

I checked various models I had on disk, and about half of the Llama 3 models were affected by this issue. I also tried some Llama 2 models, and none of them were affected.

A few examples of affected models:

I don't know if it makes a difference, but I didn't use the downloader script to download the models; I downloaded them manually.

Cause

My understanding is that Meta distributed some misconfigured tokenizer.json files and silently updated them later. Copies of the misconfigured tokenizer.json files are now circulating among fine-tuners and quantizers.

How are the tokenizer.json files misconfigured? They define the bos token in some places, but not in all the places where it is needed. I don't know exactly what these files are supposed to contain, but I looked at one Llama 3 tokenizer file that appeared to work correctly, and it had the following definition, which was missing from the misconfigured tokenizer files:

"special_tokens": {
          "<|begin_of_text|>": {
            "id": "<|begin_of_text|>",
            "ids": [
              128000
            ],
            "tokens": [
              "<|begin_of_text|>"
            ]
          }
        }

The way the bos token is implemented in TGW is that the transformers tokenizer is called with add_special_tokens=True, so if the bos token is not defined under special_tokens, it won't be added.
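
For illustration, here is a minimal way to check whether a given tokenizer actually prepends the bos token when asked. The model path is a placeholder, not a specific affected model:

from transformers import AutoTokenizer

# Hypothetical local model path; substitute any model you want to check.
tokenizer = AutoTokenizer.from_pretrained("models/some-llama-3-finetune")

ids = tokenizer.encode("Hello", add_special_tokens=True)

# With a correctly configured tokenizer.json, the first id is the bos token.
if tokenizer.bos_token_id is not None and ids and ids[0] == tokenizer.bos_token_id:
    print("bos token was added")
else:
    print("bos token is missing despite add_special_tokens=True")

With an affected model, the second branch is what prints.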

Do we need a fix in TGW if it's a model issue?

We need a fix in TGW because we have a UI checkbox and an API parameter that set user expectations.

Also, the quality of many models is severely degraded in TGW if this is not fixed (even if it is technically somebody else's fault).

How to fix this

After the transformers tokenizer encodes the prompt, we check whether the bos token was added and manually prepend it if it was not.
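
A minimal sketch of that check (illustrative only; ensure_bos is a made-up helper name, and the exact code in this PR may differ):

import torch

def ensure_bos(input_ids: torch.Tensor, tokenizer) -> torch.Tensor:
    # input_ids has shape (1, seq_len), as returned by the tokenizer with
    # return_tensors='pt'. If it does not start with the bos token, prepend it.
    if input_ids.shape[-1] == 0 or input_ids[0][0] != tokenizer.bos_token_id:
        bos_tensor = torch.tensor([[tokenizer.bos_token_id]])
        input_ids = torch.cat((bos_tensor, input_ids), dim=1)
    return input_ids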


Ph0rk0z (Contributor) commented May 25, 2024

Oh shit... is this why this was happening to me? I also observed it in some models and not others, and resorted to running in verbose mode to double-check. The missing BOS made me think verbose mode wasn't outputting the entire prompt when enabled.

When I manually added the BOS token, it would often get removed by the code above your PR.

oobabooga (Owner) commented

That's a very important fix, thank you.

oobabooga merged commit a363cdf into oobabooga:dev on May 27, 2024
TheLounger (Contributor) commented May 28, 2024

This breaks everything in some situations (e.g. using alfred-40B-1023-GGUF with llamacpp_HF); checkbox on or off, it doesn't matter. I'm sure it's something small...

Traceback (most recent call last):
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/queueing.py", line 566, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1786, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1350, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 583, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 576, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 559, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 742, in gen_wrapper
    response = next(iterator)
               ^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/modules/chat.py", line 406, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
  File "/home/lounger/ai/text/webui/modules/chat.py", line 374, in generate_chat_reply
    for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
  File "/home/lounger/ai/text/webui/modules/chat.py", line 318, in chatbot_wrapper
    prompt = generate_chat_prompt(text, state, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/modules/chat.py", line 187, in generate_chat_prompt
    encoded_length = get_encoded_length(prompt)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/modules/text_generation.py", line 189, in get_encoded_length
    return len(encode(prompt)[0])
               ^^^^^^^^^^^^^^
  File "/home/lounger/ai/text/webui/modules/text_generation.py", line 146, in encode
    bos_tensor = torch.tensor([[shared.tokenizer.bos_token_id]])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Could not infer dtype of NoneType

belladoreai (Contributor, Author) commented

Thanks for reporting, @TheLounger!

Based on your error log it seems that there are situations where bos_token_id exists but is None? I added a quick fix here: #6061
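
The traceback suggests the new code builds a BOS tensor even when the tokenizer has no bos token id, so the guard presumably just skips that step. A sketch under that assumption (not the actual diff from #6061):

import torch

def ensure_bos(input_ids: torch.Tensor, tokenizer) -> torch.Tensor:
    # Skip BOS handling entirely when the tokenizer defines no bos token id;
    # torch.tensor([[None]]) is what raises "Could not infer dtype of NoneType".
    if tokenizer.bos_token_id is None:
        return input_ids
    if input_ids.shape[-1] == 0 or input_ids[0][0] != tokenizer.bos_token_id:
        bos_tensor = torch.tensor([[tokenizer.bos_token_id]])
        input_ids = torch.cat((bos_tensor, input_ids), dim=1)
    return input_ids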

PoetOnTheRun pushed a commit to PoetOnTheRun/text-generation-webui that referenced this pull request Oct 22, 2024