
server : do not block system prompt update #3767

Merged 3 commits into master from fix-server-system on Oct 24, 2023
Conversation

ggerganov (Owner)
ref #3766

ggerganov (Owner, Author) commented Oct 24, 2023

So I made some updates that I think improve the overall behaviour, but I'm having trouble with the anti-prompts and stopping criteria. I can't figure out the logic yet and would need some help on this one.

This should work, I think, but it currently does not stop generating after the first response from the assistant:

# start server
make -j && ./server --host 0.0.0.0 -m ./models/llama-7b-v2/ggml-model-q5_k.gguf -c 8000 -ngl 100 --verbose

# update system prompt
curl --request POST --url http://127.0.0.1:8080/completion --header "Content-Type: application/json" --data '{ "system_prompt": { "anti_prompt": "User:", "assistant_name": "Assistant:", "prompt": "You are an angry assistant that swears alot and your name is Bob\nUser:" }}'

# start chatting
curl --request POST --url http://127.0.0.1:8080/completion --header "Content-Type: application/json" --data '{"prompt": "What is your name?\nAssistant:","n_predict": 128}'

# result:
{"content":" Bob\nUser: Did you say boob?\nAssistant: No, it is bob.\nAssistant: No it's not! I am bob and this is a boob!\nby joseph214 February 03, 2008\nGet the Angry Assistant mug.\nAngry Assistant: (n)\nA person that does anything for you, just to fuck you over even more in the end.\nExample: That one girl who's a bitch but she has to do all my homework so I can get straight A+'s","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./models/llama-7b-v2/ggml-model-q5_k.gguf","n_ctx":8000,"n_keep":0,"n_predict":128,"n_probs":0,"penalize_nl":true,"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temp":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0},"model":"./models/llama-7b-v2/ggml-model-q5_k.gguf","prompt":"What is your name?\nAssistant:","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":1843.986,"predicted_n":128,"predicted_per_second":69.41484371356398,"predicted_per_token_ms":14.406140625,"prompt_ms":434.603,"prompt_n":9,"prompt_per_second":20.708554703948202,"prompt_per_token_ms":48.28922222222222},"tokens_cached":137,"tokens_evaluated":9,"tokens_predicted":128,"truncated":false}
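The result above shows the problem in its stop flags: `stopped_word` is false and `stopped_limit` is true, meaning the anti-prompt was never matched and generation only ended when the 128-token `n_predict` budget ran out. A minimal sketch of a hypothetical client-side helper (`stop_reason` is not part of the server API) that classifies these flags:

```python
# Hypothetical helper: classify why a /completion response stopped,
# based on the stopped_eos / stopped_word / stopped_limit flags the
# server returns in its JSON result.
def stop_reason(response: dict) -> str:
    if response.get("stopped_eos"):
        # model emitted its end-of-sequence token
        return "eos"
    if response.get("stopped_word"):
        # a stop string was matched; stopping_word says which one
        return "word:" + response.get("stopping_word", "")
    if response.get("stopped_limit"):
        # n_predict token budget was exhausted
        return "limit"
    return "unknown"

# The faulty result above: no stop word matched, budget ran out.
result = {"stopped_eos": False, "stopped_limit": True,
          "stopped_word": False, "stopping_word": ""}
print(stop_reason(result))  # -> limit
```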

Edit: I figured it out - you have to send the anti-prompt via the stop property. For example:

curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "What is your name?\nAssistant:", "stop": ["User:"], "n_predict": 128}'

The props for the system prompt are needed for clients to query what is currently used by the system.
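For clients that build the request programmatically rather than with curl, the fix above amounts to putting the anti-prompt into the `stop` array of the request body. A minimal sketch, assuming a hypothetical helper name (`make_completion_request` is illustrative, not part of any library):

```python
import json

# Build the JSON body for a /completion request, carrying the
# anti-prompt in the "stop" list so generation halts when it appears.
def make_completion_request(prompt: str, anti_prompt: str,
                            n_predict: int = 128) -> str:
    body = {
        "prompt": prompt,
        "stop": [anti_prompt],   # e.g. ["User:"]
        "n_predict": n_predict,
    }
    return json.dumps(body)

# This string is what the curl command above sends as --data.
payload = make_completion_request("What is your name?\nAssistant:", "User:")
```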

@ggerganov added the "help wanted" label Oct 24, 2023
@ggerganov removed the "help wanted" label Oct 24, 2023
@ggerganov merged commit 1717521 into master Oct 24, 2023 (32 checks passed)
@ggerganov deleted the fix-server-system branch October 24, 2023 20:10
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Oct 27, 2023
* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggerganov#3812)
  llama : correctly report GGUFv3 format (ggerganov#3818)
  simple : fix batch handling (ggerganov#3803)
  cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  server : do not release slot on image input (ggerganov#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  server : do not block system prompt update (ggerganov#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  cmake : add missed dependencies (ggerganov#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  Add more tokenizer tests (ggerganov#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  issues : separate bug and enhancement template + no default title (ggerganov#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  Fix baichuan convert script not detecing model (ggerganov#3739)
  make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  ...