The UI does not set the truncation length on the OpenAI API server. #3153

Closed
atisharma opened this issue Jul 15, 2023 · 9 comments

@atisharma

atisharma commented Jul 15, 2023

Problem

The UI does not set the truncation length on the server.
Right now the OpenAI API extension and its clients have no way to know the model's context size.

Expected behaviour

The truncation_length should:

  • be set correctly at loading and then be respected (currently the case)
  • be specified as an API parameter in the request
  • ideally be returned in the model information so the client can plan appropriately, perhaps at /v1/models?

However, the original OpenAI API does not have this parameter, so I'm not sure setting it as a request parameter is the correct way forward. If not, please close this issue.
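
To make the /v1/models idea concrete, here is a rough sketch of what a client could do if that endpoint ever exposed a context-length field (it does not today, in either the official OpenAI API or the extension; the field name and the port below are assumptions for illustration):

import requests

# Hypothetical: query the local openai extension's /v1/models endpoint and look
# for a context-length field. "truncation_length" as a field name and port 5001
# are assumptions, not existing behaviour.
models = requests.get("http://127.0.0.1:5001/v1/models").json()
first = models["data"][0]
context_length = first.get("truncation_length", 2048)  # fall back to a safe default
print(f"Model {first['id']} -> planning around a {context_length}-token window")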

Workaround

According to @matatonic in #3049 (comment), you can configure your models/config-user.yaml to include a line setting the truncation length for the model (or a pattern).

For example:

.*superhot-8k:
  truncation_length: 8192
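
For what it's worth, the keys in config-user.yaml are patterns matched against the model name; here is a rough illustration (not the webui's actual code) of how such a pattern would apply, assuming case-insensitive matching (which is the only way the example above matches the mixed-case model folder names):

import re

# Rough illustration: match a config-user.yaml pattern against a model name.
pattern = r".*superhot-8k"
model_name = "Guanaco-33B-SuperHOT-8K-GPTQ"

if re.match(pattern, model_name, flags=re.IGNORECASE):
    model_settings = {"truncation_length": 8192}
    print(f"{model_name}: {model_settings}")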

Testing

I loaded Lazarus-30b-SuperHOT-8k-GPTQ and Guanaco-33B-SuperHOT-8K-GPTQ with

python server.py \
    --extensions openai \
    --listen \
    --notebook \
    --model_type LLaMA \
    --loader exllama \
    --gpu-split 10,24 \
    --max_seq_len 8192 \
    --alpha_value 4 \
    --model-menu \
    "$@"

and tested in the web UI and via API.

Using the UI, the model was able to remember a secret password after about 3k tokens. Using the API, requests with over 2k of context were handled fine by Lazarus, but I got deranged garbage from Guanaco. That is presumably an issue with the model, not the API.
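
For reference, the API side of that test looks roughly like this (a sketch assuming the openai extension's default port at the time, 5001; adjust the URL for your setup):

import requests

# Build a prompt long enough to push past the default 2048-token truncation.
long_context = "The secret password is 'swordfish'. " + "Filler sentence about nothing in particular. " * 400
prompt = long_context + "\nWhat is the secret password?"

resp = requests.post(
    "http://127.0.0.1:5001/v1/completions",
    json={"prompt": prompt, "max_tokens": 64, "temperature": 0.7},
    timeout=300,
)
print(resp.json()["choices"][0]["text"])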

@matatonic
Contributor

Just to provide some additional info for anyone else finding this issue, here is an example from my own config-user.yaml:

# I always load this with the kaiokendev_superhot-30b-8k-no-rlhf-test Lora
# and set compress_pos_emb: 2, max_seq_len: 3072
CalderaAI_30B-Lazarus-GPTQ4bit:
  truncation_length: 3072
.*(7|13)b-.*-superhot-8k:
  truncation_length: 6144
.*(30|33)b-.*-superhot-8k:
  truncation_length: 3072

The major issue is that the built-in API and web UI both pass truncation_length as a request parameter, so it's not a problem for them, but the OpenAI API has no such parameter, so we need to rely on the server-side shared.settings['truncation_length'] value, which is only updated when the model settings are loaded. Changing the truncation_length value in the UI has no effect on this server setting.
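
To spell out that difference, compare the two request shapes (a sketch assuming the default ports of the time, 5000 for the blocking API and 5001 for the openai extension):

import requests

PROMPT = "Once upon a time"

# Built-in blocking API: the request itself can carry truncation_length,
# so each client controls truncation per request.
requests.post("http://127.0.0.1:5000/api/v1/generate", json={
    "prompt": PROMPT,
    "max_new_tokens": 200,
    "truncation_length": 8192,
})

# OpenAI-compatible API: the schema has no truncation field at all, so the
# server falls back on shared.settings['truncation_length'].
requests.post("http://127.0.0.1:5001/v1/completions", json={
    "prompt": PROMPT,
    "max_tokens": 200,
})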

@Ph0rk0z
Contributor

Ph0rk0z commented Jul 15, 2023

I'm confused. OpenAI is its own thing. Max sequence length already sets the length. It is up to the client to manage context, is it not?

Previously I had issues with --api not allowing longer context and always cutting it to 2048. Now models work fine through silly tavern and the built in chat/notebook.

@matatonic
Contributor

Neither context length nor truncation length is available in the OpenAI API; there is only max_tokens to control how many tokens to generate.

It's a rather poor setup, because as a user of the OpenAI API you can't even query the API to determine the context length; you need to look it up on the model's web page and just 'know' it when you select the model in your code.
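
In practice that means the client has to budget context on its own, along these lines (an illustrative sketch; the window size is looked up out-of-band and the token count is only approximated):

CONTEXT_LENGTH = 8192   # known from the model card, not discoverable via the API
MAX_NEW_TOKENS = 512

def rough_token_count(text: str) -> int:
    # Crude approximation (~4 characters per token); a real client would use
    # the model's own tokenizer.
    return len(text) // 4

def trim_history(messages: list[str]) -> list[str]:
    """Keep the most recent messages that fit in the remaining context budget."""
    budget = CONTEXT_LENGTH - MAX_NEW_TOKENS
    kept, used = [], 0
    for msg in reversed(messages):
        used += rough_token_count(msg)
        if used > budget:
            break
        kept.append(msg)
    return list(reversed(kept))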

The text-generation-webui has a value loaded from the model, shared.settings['truncation_length'], but it doesn't get set any higher if you use compression and a larger max_seq_len (it probably should), and because the blocking API and web UI use a client-side truncation_length, you don't notice that it isn't updated on the server.

This is exclusively an OpenAI API problem at this point, but the workaround is pretty simple for now: just update your models/config-user.yaml with the truncation length you want.

If this doesn't get fixed another way, I'll try another PR soon; the last fix sat for a few weeks and is no longer mergeable, but I think there is probably a better fix now anyway.

@github-actions

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

@matatonic
Contributor

I'm reopening this; it's still an issue, with only a workaround so far.

@teddybear082

It would be great if this could be fixed. I implemented textgenwebui as a potential option for users of a Skyrim AI NPC mod, and this inability to get the proper truncation length through the openai API is causing issues for them. Only hard-coding the value 4096 in completions.py, instead of referring back to shared.settings['truncation_length'], fixed the issue. Thanks for all your work on this, matatonic; hope you or someone else can figure this out and it gets merged.
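
For context, the hard-coded workaround described above amounts to something like this inside the extension's completion handler (a hypothetical illustration, not the actual contents of completions.py):

# Instead of reading the server-side setting, which may still hold the old default:
#   truncation_length = shared.settings['truncation_length']
# pin it to the model's known window size:
truncation_length = 4096  # hard-coded workaround; has to be edited per model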

@spreck

spreck commented Oct 2, 2023

This workaround no longer works. I set the models/config-user.yaml to be:

TheBloke_WizardCoder-Python-13B-V1.0-GPTQ_gptq-4bit-32g-actorder_True$:
  loader: ExLlama_HF
  use_fast: true
  cfg_cache: false
  gpu_split: '12'
  max_seq_len: 16384
  compress_pos_emb: 2
  alpha_value: 1
  rope_freq_base: 1000000
  truncation_length: 8192

But I get this message in the console:

Warning: $This model maximum context length is 8192 tokens. However, your messages resulted in over 784 tokens and max_tokens is 8192.

@pythonjohan

> This workaround no longer works. I set the models/config-user.yaml to be: [...]
>
> But I get this message in the console:
>
> Warning: $This model maximum context length is 8192 tokens. However, your messages resulted in over 784 tokens and max_tokens is 8192.

I couldn't get the config-yaml solution to work, but hardcoding the token limit into the openai extension worked for me: #4152 (comment)

@matatonic
Contributor

> This workaround no longer works. I set the models/config-user.yaml to be: [...]
>
> I couldn't get the config-yaml solution to work, but hardcoding the token limit into the openai extension worked for me: #4152 (comment)

Yeah, it seems kind of busted now. Context length and instruction format are both broken right now, AFAIK.
