
feat: grammar-based sampling in llama-cpp #9712

Merged: 13 commits into langchain-ai:master on Aug 28, 2023

Conversation

@eryk-dsai (Contributor) commented Aug 24, 2023

Description

This PR enables grammar-based sampling in the llama-cpp LLM.

In short, loading a file with a formal grammar definition constrains the model's output. For instance, one can force the model to generate only valid JSON or only Python lists.
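
Here, "grammar" means a .gbnf file in llama.cpp's GBNF notation. As a minimal illustrative sketch (hypothetical, far simpler than the json.gbnf shipped with llama.cpp), a grammar constraining output to a flat JSON object of string keys and string values could look like:

root   ::= "{" ws pair ("," ws pair)* "}" ws
pair   ::= string ":" ws string
string ::= "\"" [^"]* "\"" ws
ws     ::= [ \t\n]*

During sampling, any token that would take the partial output outside the language this grammar defines is masked out.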

In a follow-up PR we will add:

  • docs explaining why this is useful and how it works
  • possibly a code sample for a task, similar to those in the llama.cpp repo


@dosubot added the Ɑ: models (Related to LLMs or chat model modules) and 🤖:enhancement (A large net-new component, integration, or chain) labels on Aug 24, 2023
@rlancemartin (Collaborator) left a comment

We should add some usage examples to the notebook:
https://python.langchain.com/docs/integrations/llms/llamacpp

Edit: oh, I see you want to add them in a follow-up PR? If it's easy, maybe just consolidate here?

A collaborator left a comment

@baskaryan feel free to weigh in here, but I think we should make this a classmethod or handle it in the init:

from_grammar_path()

It's not great to hit the file system every time we call the model, and I'd assume people may want to provide the grammar directly as a string rather than exclusively via a file.
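
A minimal sketch of that idea (the grammar kwarg and load_grammar helper are hypothetical, following the proposal above, not the merged API):

from pathlib import Path

def load_grammar(grammar_path: str) -> str:
    # Read the .gbnf file once up front instead of hitting the file system per call
    return Path(grammar_path).read_text()

# llm = LlamaCpp(..., grammar=load_grammar("/path/to/json.gbnf"))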

@rlancemartin (Collaborator) commented Aug 24, 2023

I'm testing now:

  • Using grammars here

Example init:

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 1  # Metal set to 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx; consider the amount of RAM of your Apple Silicon chip.

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/json.gbnf",
)

Result:

question = "Request: schedule a call at 8pm; Command:"
llm(question)
root ::= object 
object ::= [{] ws object_11 [}] ws 
value ::= object | array | string | number | value_6 ws 
array ::= [[] ws array_15 []] ws 
string ::= ["] string_18 ["] ws 
number ::= number_19 number_25 number_29 ws 
value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] 
ws ::= ws_31 
object_8 ::= string [:] ws value object_10 
object_9 ::= [,] ws string [:] ws value 
object_10 ::= object_9 object_10 | 
object_11 ::= object_8 | 
array_12 ::= value array_14 
array_13 ::= [,] ws value 
array_14 ::= array_13 array_14 | 
array_15 ::= array_12 | 
string_16 ::= [^"\] | [\] string_17 
string_17 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] 
string_18 ::= string_16 string_18 | 
number_19 ::= number_20 number_21 
number_20 ::= [-] | 
number_21 ::= [0-9] | [1-9] number_22 
number_22 ::= [0-9] number_22 | 
number_23 ::= [.] number_24 
number_24 ::= [0-9] number_24 | [0-9] 
number_25 ::= number_23 | 
number_26 ::= [eE] number_27 number_28 
number_27 ::= [-+] | 
number_28 ::= [0-9] number_28 | [0-9] 
number_29 ::= number_26 | 
ws_30 ::= [ <U+0009><U+000A>] ws 
ws_31 ::= ws_30 | 
{ "scheduleCall": { "date": "2018-09-25", "time": "20:00" } }

But many newlines are appended after the JSON:

'{"schedule": {"date": "2018-09-14T20:00:00.000Z", "duration": 60}}\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'

@rlancemartin (Collaborator) commented Aug 24, 2023

Possible bug / issue in generations w/ excessive newlines.

Also seen in the LangSmith trace.

Seems specific to when I supply the grammar_path.

Separately, the Ollama folks report similar behavior w/ the Code Llama model (cc @jmorganca).

Possibly a common issue w/ incorrect LLaMA prompting in both cases?
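
A client-side stopgap is simply trimming the output (a workaround sketch, not a fix):

result = llm(question)
result = result.rstrip()  # drop the run of trailing "\n" that the grammar permits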

@rlancemartin (Collaborator)

This is fixed if we manually supply a STOP token in the prompt. See here.

@rlancemartin (Collaborator)

Full demo:

n_gpu_layers = 1 
n_batch = 512 
llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/json.gbnf",
    stop=["STOP"]
)

Run w/ STOP token specified:

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

template = """Print 'STOP' when you are finished answering the question. Question: {question}"""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "Request: schedule a call at 8pm; Command:"
llm_chain.run(question)

Result:

'{"type": "request", "message": "Hello! I would like to request your availability for a call tonight at 8pm. Would you be available?", "tones": [{"id": "polite", "name": "Polite"}]}'

@rlancemartin (Collaborator)

@eryk-dsai overall, this is promising but quite finicky / tricky.

We will definitely need a good / crisp prompting guide.

In addition, we should check in some hardened .gbnf files that "just work."

I started here, but there's still work to be done since reliability isn't there yet.

In particular, the default .gbnf files shipped w/ llama.cpp (here) spew excessive newlines (at least the JSON one did).

I tried to fix it in the linked PR.

@eryk-dsai (Contributor, Author) commented Aug 25, 2023

Hi @rlancemartin,

I played with the list grammar a little more and I think I've got something that works reasonably well for lists consisting of multi-word strings. Obviously, Python lists can be much more diverse: empty lists, lists with mixed types, lists of lists... but I think the example grammars for list and JSON that we've provided should be enough.

Feel free to update the model and grammar path in the list example if you want.
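
For reference, a grammar along those lines could look like this sketch (hypothetical; the list.gbnf actually checked in may differ): a Python-style list of double-quoted, multi-word strings.

root ::= "[" ws (item ("," ws item)*)? "]"
item ::= "\"" word (" " word)* "\"" ws
word ::= [a-zA-Z0-9'-]+
ws   ::= [ ]*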

@rlancemartin (Collaborator)

> I played with the list grammar a little more and I think I've got something that works reasonably well for lists consisting of multi-word strings. [...]

Thanks for adding! I'll have a look now.

@rlancemartin (Collaborator)

> I played with the list grammar a little more and I think I've got something that works reasonably well for lists consisting of multi-word strings. [...]

Pretty cool! Just tested w/ -

llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin

Note: the gguf format is needed for llama-cpp-python v0.1.79.

Converted my model (openorca-platypus2-13b) via -

llama.cpp % python ./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 --input models/openorca-platypus2-13b.ggmlv3.q4_0.bin --output models/openorca-platypus2-13b.gguf.q4_0.bin

Init -

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin",
    n_gpu_layers=1,
    n_batch=512,
    f16_kv=True, 
    callback_manager=callback_manager,
    verbose=True,
    grammar_path="/Users/rlm/Desktop/Code/langchain-main/langchain/libs/langchain/langchain/llms/grammars/json.gbnf",
)

Ran llm("Describe a person in JSON format:") -

{
  "name": "John Doe",
  "age": 30,
  "": "Engineer",
  "country": {
    "name": "United States"
  },
  "likes": [
    "Sports",
    "Music",
    "Movies"
  ]}

Lists still need some work (at least for me).

result=llm("List of top-3 my favourite books:")
{"data":[{"book_name":"The Alchemist","author":"Paul Coelho"},{"book_name":"The Secret","author":"Rhonda Byrne"},{"book_name":"Think and Grow Rich","author":"Napoleon Hill"}]}

@eryk-dsai (Contributor, Author)

@rlancemartin thank you for mentioning gguf.

Your most recent output appears to be correct JSON. Is it possible that you used the path json.gbnf rather than list.gbnf? I tested the list grammar again locally, and it always produced a correct Python list.

@rlancemartin (Collaborator)

> Your most recent output appears to be correct JSON. Is it possible that you used the path json.gbnf rather than list.gbnf? [...]

I was able to reproduce! Nice. Going to merge this.

Copy link
Collaborator

@rlancemartin rlancemartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce this:

# Path to LLaMA
llm_path = "/Users/rlm/Desktop/Code/llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin"
# Path to Langchain repo
langchain_path = "/Users/rlm/Desktop/Code/langchain-main"

JSON -

n_gpu_layers = 1
n_batch = 512
llm = LlamaCpp(
    model_path=llm_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problem after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path=langchain_path+"/langchain/libs/langchain/langchain/llms/grammars/json.gbnf",
)
result=llm("Describe a person in JSON format:")

Result -

{"name":"John Smith", "age":32, "":"Software Engineer"}

Results are also as expected using list.gbnf.

@rlancemartin rlancemartin merged commit 7f5713b into langchain-ai:master Aug 28, 2023