Newlines in generation when using grammar #637

Open
rlancemartin opened this issue Aug 24, 2023 · 4 comments

Labels: model (Model specific issue) · quality (Quality of model output)

rlancemartin commented Aug 24, 2023

Using llama-cpp-python with the LangChain integration and this PR to support grammars.

Test w/o grammar_path:

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
llm(question)

The result is as expected.


Test w/ grammar_path:

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/json.gbnf",
)
question = "Request: schedule a call at 8pm; Command:"
llm(question)

The result has a large number of newlines:

'{"schedule": {"date": "2018-09-14T20:00:00.000Z", "duration": 60}}\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'

Has anyone seen / resolved similar behavior?

abetlen (Owner) commented Aug 24, 2023

@rlancemartin The grammar only specifies the syntax of the output, not necessarily the stopping condition. If the model doesn't generate an EOS token and no other stopping criterion is met, "\n" is the only valid character at the end of the generation. You either need to pass something to the stop list or use a StoppingCriteria that checks whether the output is parseable with json.loads.
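
A minimal sketch of that kind of check (the json_complete helper is hypothetical, not part of llama-cpp-python; how you wire it into the stopping criteria depends on your version, since the low-level callback sees token ids rather than decoded text):

import json

def json_complete(text: str) -> bool:
    # Hypothetical helper: return True once the text generated so far
    # parses as a complete JSON document, i.e. a sensible point to stop.
    try:
        json.loads(text.strip())
        return True
    except json.JSONDecodeError:
        return False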

rlancemartin (Author) commented Aug 24, 2023

> @rlancemartin The grammar only specifies the syntax of the output, not necessarily the stopping condition. If the model doesn't generate an EOS token and no other stopping criterion is met, "\n" is the only valid character at the end of the generation. You either need to pass something to the stop list or use a StoppingCriteria that checks whether the output is parseable with json.loads.

Thanks. Yes, this works.

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/json.gbnf",
    stop=["STOP"]
)

Prompt w/ STOP token specified:

template = """Print 'STOP' when you are finished answering the question. Question: {question}"""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "Request: schedule a call at 8pm; Command:"
llm_chain.run(question)

Result, as expected:

'{"type": "request", "message": "Hello! I would like to request your availability for a call tonight at 8pm. Would you be available?", "tones": [{"id": "polite", "name": "Polite"}]}'

Is there a best practice for this? (I'm just using STOP as a test case.)
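
One alternative I may also try (my own assumption, not something confirmed here): since the grammar only allows whitespace after the closing brace, stopping on the first newline should end generation right after the object, as long as the grammar isn't emitting newlines inside the JSON itself.

# Sketch only: same setup as above, but stopping on the first newline
# instead of a custom STOP marker. This truncates output if the grammar
# ever places a newline inside the object, so it assumes single-line JSON.
llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/json.gbnf",
    stop=["\n"],
)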

rlancemartin (Author) commented

OK, I think I've gotten a bit further:

The problem seems to be with the json.gbnf specifically.

I'm working on modifying that file.
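
To illustrate the kind of change I'm experimenting with (a sketch, assuming the stock llama.cpp json.gbnf, whose trailing whitespace rule is roughly ws ::= ([ \t\n] ws)?): tightening ws so it can neither recurse nor match a newline leaves nothing valid to append after the closing brace.

# Hypothetical edit to json.gbnf: allow at most one space or tab between
# tokens, and no newlines, so the grammar cannot pad the end of the output.
ws ::= [ \t]?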

AndreaRiboni commented

Any update?
