
openai api extension can use long context model: if the model context is 16k, the api can also use this 16k context #3668

Closed
elven2016 opened this issue Aug 24, 2023 · 8 comments
Labels
enhancement New feature or request stale

Comments

@elven2016

Description
When using a long-context model, the API only supports 4k max tokens; 16k max tokens is needed. Please add this feature.
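For reference, the kind of request involved looks roughly like the sketch below, using the legacy openai-python 0.27 bindings pointed at the extension's local endpoint (the port, model name, and token counts are illustrative, taken loosely from the logs later in this thread):

```python
import openai

# Point the legacy openai-python (0.27.x) client at the local
# text-generation-webui openai extension instead of api.openai.com.
openai.api_key = "sk-111111111111111111111111111111111111111111111111"  # dummy key
openai.api_base = "http://127.0.0.1:5001/v1"

# With a 16k-context model loaded, a long prompt plus a large max_tokens
# should be usable end to end rather than being capped around 4k.
response = openai.ChatCompletion.create(
    model="chatglm2-6b",                            # illustrative model name
    messages=[{"role": "user", "content": "..."}],  # long retrieval-augmented prompt
    max_tokens=8192,
)
print(response["choices"][0]["message"]["content"])
```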



elven2016 added the enhancement (New feature or request) label on Aug 24, 2023
@matatonic
Contributor

See #3153 for a workaround with openai

@elven2016
Author

> See #3153 for a workaround with openai

Thanks for your reply. I changed config.yml and set truncate_length=8192; the longer context seems to be activated, but the completion call gets an error:
[screenshot of the error]

@matatonic
Contributor

matatonic commented Aug 28, 2023 via email

Can you include the server logs for this error? It should have a full stack trace. Ideally, please enable the OPENEDAI_DEBUG=1 environment variable too.

@elven2016
Author

The log prints the following:
Traceback (most recent call last):
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "/home/elven/finreport/finapp/app.py", line 182, in <module>
main()
File "/home/elven/finreport/finapp/app.py", line 120, in main
handle_userinput(user_question)
File "/home/elven/finreport/finapp/app.py", line 82, in handle_userinput
response = st.session_state.conversation({'question': user_question})
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/base.py", line 282, in call
raise e
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/base.py", line 276, in call
self._call(inputs, run_manager=run_manager)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py", line 141, in _call
answer = self.combine_docs_chain.run(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/base.py", line 480, in run
return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/base.py", line 282, in call
raise e
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/base.py", line 276, in call
self._call(inputs, run_manager=run_manager)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 105, in _call
output, extra_return_dict = self.combine_docs(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 171, in combine_docs
return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/llm.py", line 255, in predict
return self(kwargs, callbacks=callbacks)[self.output_key]
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/base.py", line 282, in call
raise e
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/base.py", line 276, in call
self._call(inputs, run_manager=run_manager)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/llm.py", line 91, in _call
response = self.generate([inputs], run_manager=run_manager)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chains/llm.py", line 101, in generate
return self.llm.generate_prompt(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chat_models/base.py", line 414, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chat_models/base.py", line 309, in generate
raise e
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chat_models/base.py", line 299, in generate
self._generate_with_cache(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chat_models/base.py", line 446, in _generate_with_cache
return self._generate(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chat_models/openai.py", line 345, in _generate
response = self.completion_with_retry(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chat_models/openai.py", line 278, in completion_with_retry
return _completion_with_retry(**kwargs)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/tenacity/init.py", line 379, in call
do = self.iter(retry_state=retry_state)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/tenacity/init.py", line 325, in iter
raise retry_exc.reraise()
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/tenacity/init.py", line 158, in reraise
raise self.last_attempt.result()
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/tenacity/init.py", line 382, in call
result = fn(*args, **kwargs)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/langchain/chat_models/openai.py", line 276, in _completion_with_retry
return self.client.create(**kwargs)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/openai/api_requestor.py", line 298, in request
resp, got_stream = self._interpret_response(result, stream)
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/openai/api_requestor.py", line 700, in _interpret_response
self._interpret_response_line(
File "/home/elven/miniconda3/envs/finreport/lib/python3.10/site-packages/openai/api_requestor.py", line 765, in _interpret_response_line
raise self.handle_error_response(
openai.error.APIError: UnboundLocalError("local variable 'tokens' referenced before assignment") {"error": {"message": "UnboundLocalError("local variable 'tokens' referenced before assignment")", "code": 500, "type": "OpenAIError", "param": ""}} 500 {'error': {'message': 'UnboundLocalError("local variable 'tokens' referenced before assignment")', 'code': 500, 'type': 'OpenAIError', 'param': ''}} {'Connection': 'close', 'Content-Length': '150', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'Origin, Accept, X-Requested-With, Content-Type, Access-Control-Request-Method, Access-Control-Request-Headers, Authorization', 'Access-Control-Allow-Methods': 'GET,HEAD,OPTIONS,POST,PUT', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'application/json', 'Date': 'Mon, 28 Aug 2023 08:14:40 GMT', 'Server': 'BaseHTTP/0.6 Python/3.10.10'}

@matatonic
Contributor

matatonic commented Aug 30, 2023 via email

@elven2016
Author

elven2016 commented Aug 31, 2023

Ignore the message output; from the error printout it seems the issue is that the context is too long and CUDA runs out of memory, even though the GPU appears to have enough free memory.

127.0.0.1 - - [31/Aug/2023 15:03:02] "POST /v1/chat/completions HTTP/1.1" 500 -
b'{"error": {"message": "UnboundLocalError(\"local variable 'tokens' referenced before assignment\")", "code": 500, "type": "OpenAIError", "param": ""}}'
POST /v1/chat/completions HTTP/1.1
Host: 127.0.0.1:5001
User-Agent: OpenAI/v1 PythonBindings/0.27.9
Content-Length: 27513
Accept: */*
Accept-Encoding: gzip, deflate
Authorization: Bearer sk-111111111111111111111111111111111111111111111111
Content-Type: application/json
X-Openai-Client-User-Agent: {"bindings_version": "0.27.9", "httplib": "requests", "lang": "python", "lang_version": "3.10.10", "platform": "Linux-5.19.0-1010-nvidia-lowlatency-x86_64-with-glibc2.35", "publisher": "openai", "uname": "Linux 5.19.0-1010-nvidia-lowlatency #10-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 26 00:40:27 UTC 2023 x86_64 x86_64"}

{'messages': [{'role': 'system', 'content': "Use the following pieces of context to answer the users question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n XXXXXX(igonre the long content)'}], 'model': 'chatglm2-6b', 'max_tokens': None, 'stream': False, 'n': 1, 'temperature': 0.0}
Loaded instruction role format: Vicuna-v1.1
Warning: $This model maximum context length is 16384 tokens. However, your messages resulted in over 7420 tokens and max_tokens is 16384.
{'prompt': "Use the following pieces of context to answer the users question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n XXXXXX(igonre the long content)
。\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n\nUSER: 开山集团在地热能源方面做了哪些工作?\nASSISTANT:", 'req_params': {'max_new_tokens': 8964, 'auto_max_new_tokens': False, 'max_tokens_second': 0, 'temperature': 0.01, 'top_p': 1.0, 'top_k': 20, 'repetition_penalty': 1.18, 'repetition_penalty_range': 0, 'encoder_repetition_penalty': 1.0, 'suffix': None, 'stream': False, 'echo': False, 'seed': -1, 'truncation_length': 16384, 'add_bos_token': True, 'do_sample': True, 'typical_p': 1.0, 'epsilon_cutoff': 0.0, 'eta_cutoff': 0.0, 'tfs': 1.0, 'top_a': 0.0, 'min_length': 0, 'no_repeat_ngram_size': 0, 'num_beams': 1, 'penalty_alpha': 0.0, 'length_penalty': 1.0, 'early_stopping': False, 'mirostat_mode': 0, 'mirostat_tau': 5.0, 'mirostat_eta': 0.1, 'guidance_scale': 1, 'negative_prompt': '', 'ban_eos_token': False, 'skip_special_tokens': True, 'custom_stopping_strings': ''}}
Traceback (most recent call last):
File "/ssd_data01/text-generation-webui/modules/callbacks.py", line 56, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
File "/ssd_data01/text-generation-webui/modules/text_generation.py", line 321, in generate_with_callback
shared.model.generate(**kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/generation/utils.py", line 1642, in generate
return self.sample(
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/generation/utils.py", line 2724, in sample
outputs = self(
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
outputs = self.model(
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 697, in forward
layer_outputs = decoder_layer(
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 413, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 335, in forward
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.61 GiB (GPU 0; 23.65 GiB total capacity; 5.44 GiB already allocated; 8.93 GiB free; 6.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 0.25 seconds (0.00 tokens/s, 0 tokens, context 15109, seed 867992534)
OpenAIError UnboundLocalError("local variable 'tokens' referenced before assignment")
Traceback (most recent call last):
File "/ssd_data01/text-generation-webui/extensions/openai/script.py", line 101, in wrapper
func(self)
File "/ssd_data01/text-generation-webui/extensions/openai/script.py", line 172, in do_POST
response = OAIcompletions.chat_completions(body, is_legacy=is_legacy)
File "/ssd_data01/text-generation-webui/extensions/openai/completions.py", line 295, in chat_completions
completion_token_count = len(encode(answer)[0])
File "/ssd_data01/text-generation-webui/modules/text_generation.py", line 113, in encode
input_ids = shared.tokenizer.encode(str(prompt), return_tensors='pt', add_special_tokens=add_special_tokens)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2373, in encode
encoded_inputs = self.encode_plus(
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2781, in encode_plus
return self._encode_plus(
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 656, in _encode_plus
first_ids = get_input_ids(text)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 623, in get_input_ids
tokens = self.tokenize(text, **kwargs)
File "/home/elven/miniconda3/envs/tgweb/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 208, in tokenize
if tokens[0] == SPIECE_UNDERLINE and tokens[1] in self.all_special_tokens:
UnboundLocalError: local variable 'tokens' referenced before assignment

127.0.0.1 - - [31/Aug/2023 11:48:15] "POST /v1/chat/completions HTTP/1.1" 500 -
b'{"error": {"message": "UnboundLocalError(\"local variable 'tokens' referenced before assignment\")", "code": 500, "type": "OpenAIError", "param": ""}}

Another thing is that the GPU still seems to have enough free resources when the API returns the error:
[screenshot of GPU memory usage]
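For what it's worth, the 13.61 GiB allocation in the CUDA OOM message above is about what vanilla (non-FlashAttention) attention needs to materialise a full seq × seq score matrix at a context of 15109 tokens. A rough back-of-the-envelope check, assuming a LLaMA-style model with 32 attention heads running in fp16 (the head count and dtype are assumptions, not taken from this issue):

```python
# Rough size of the attention-score buffer that the
# torch.matmul(query_states, key_states.transpose(2, 3)) call in
# modeling_llama.py has to allocate for one layer's forward pass.
heads = 32            # assumption: typical for a 7B LLaMA-style model
seq = 15109           # "context 15109" from the server log above
bytes_per_elem = 2    # fp16

attn_scores_bytes = heads * seq * seq * bytes_per_elem
print(f"{attn_scores_bytes / 2**30:.2f} GiB")  # ~13.61 GiB, matching the OOM message
```

So even with roughly 9 GiB reported free, a single attention layer at that context length cannot fit, which would explain why the GPU looks mostly idle right up until the request fails.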

@matatonic
Contributor

matatonic commented Aug 31, 2023 via email

github-actions bot added the stale label on Oct 12, 2023
@github-actions

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
