feat: grammar-based sampling in llama-cpp #9712
Conversation
We should add some usage examples to the notebook:
https://python.langchain.com/docs/integrations/llms/llamacpp
Edit: oh, I see you want to add it in a follow-up PR? If it's easy, maybe just consolidate here?
@baskaryan feel free to weigh in here, but I think we should make this a classmethod or handle it in the init:
from_grammar_path()
It's not great to be hitting the file system every time we call the model, and I'd assume people may want to provide the grammar directly as a string rather than exclusively via a file.
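For reference, a rough sketch of what that might look like (a hypothetical helper, not the API in this PR), accepting either a raw grammar string or a path and building the grammar once up front rather than re-reading the file on every call:

# Rough sketch (hypothetical helper, not this PR's API): build the grammar once,
# instead of hitting the file system on every model call.
from pathlib import Path
from typing import Optional

from llama_cpp import LlamaGrammar


def load_grammar(grammar: Optional[str] = None, grammar_path: Optional[str] = None) -> LlamaGrammar:
    """Accept either a raw GBNF grammar string or a path to a .gbnf file."""
    if grammar is not None:
        return LlamaGrammar.from_string(grammar)
    if grammar_path is not None:
        return LlamaGrammar.from_string(Path(grammar_path).read_text())
    raise ValueError("Provide either `grammar` or `grammar_path`.")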
I'm testing now:
Example init:
Result:
But many newlines are added following the JSON:
Possible bug / issue in generations w/ excessive newlines. Also see this in the LangSmith trace. Seems specific to when I supply the grammar. Separately, Ollama folks report similar behavior w/ the code-LLaMA model @jmorganca. Possibly a common issue w/ incorrect LLaMA prompting in both cases?
This is fixed if we manually supply a stop token, roughly as sketched below.
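A minimal sketch of what that call might look like (the exact stop sequence below is an assumption):

# Sketch: pass a stop sequence so generation halts after the JSON object
# instead of emitting trailing newlines. `llm` is the LlamaCpp instance
# from the init above; the stop token used here is an assumption.
result = llm(
    "Describe a person in JSON format:",
    stop=["\n\n"],
)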
Full demo:
Run w/ STOP token specified:
Result:
@eryk-dsai overall, this is promising but quite finicky / tricky. We will definitely need a good / crisp prompting guide. In addition, we should check in some hardened grammars. I started here, but there is still work to be done since reliability is not there yet. In particular, the default list grammar is not reliable; I tried to fix it in the linked PR.
Hi @rlancemartin, I played with the list grammar a little bit more and I think I've got something that works reasonably well for lists consisting of multi-word strings. Obviously, Python lists can be much more diverse: empty lists, lists with different types, lists of lists... but I think the example grammars for list and JSON that we've provided should be enough. Feel free to update the model and grammar path in the list example if you want.
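For illustration, a grammar along those lines might look roughly like this (a hypothetical sketch, not necessarily the exact list.gbnf checked into this PR):

# Hypothetical GBNF sketch for a list of multi-word strings,
# validated via llama-cpp-python (not the exact file in the PR).
from llama_cpp import LlamaGrammar

LIST_GBNF = r'''
root ::= "[" ws item ("," ws item)* ws "]"
item ::= "\"" [a-zA-Z0-9 ]+ "\""
ws   ::= [ \t\n]*
'''

grammar = LlamaGrammar.from_string(LIST_GBNF)  # raises if the grammar is malformed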
Thanks for adding! I'll have a look now.
Pretty cool! Just tested w/ gguf.
Note: converted my model (openorca-platypus2-13b) to gguf.
Init:
Ran:
Lists still need some work (at least for me).
@rlancemartin thank you for mentioning gguf. Your most recent output appears to be the correct JSON. Is it possible that you used the path
I was able to reproduce! Nice. Going to merge this.
I can reproduce this:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

# Path to LLaMA
llm_path = "/Users/rlm/Desktop/Code/llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin"
# Path to LangChain repo
langchain_path = "/Users/rlm/Desktop/Code/langchain-main"
JSON -
n_gpu_layers = 1
n_batch = 512
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])  # stream tokens to stdout

llm = LlamaCpp(
    model_path=llm_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
    grammar_path=langchain_path + "/langchain/libs/langchain/langchain/llms/grammars/json.gbnf",
)

result = llm("Describe a person in JSON format:")
Result -
{"name":"John Smith", "age":32, "":"Software Engineer"}
Results are also as expected using the list grammar.
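A sketch of the equivalent list run, reusing the setup above (the list.gbnf path and the prompt below are assumptions):

# Same setup as above, but pointed at the list grammar (path and prompt are assumptions).
llm_list = LlamaCpp(
    model_path=llm_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,
    callback_manager=callback_manager,
    verbose=True,
    grammar_path=langchain_path + "/langchain/libs/langchain/langchain/llms/grammars/list.gbnf",
)
result = llm_list("List the planets of the solar system:")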
Description
The following PR enables grammar-based sampling in the llama-cpp LLM.
In short, loading a file with a formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or only Python lists.
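A minimal sketch of the intended usage (the paths below are placeholders):

# Minimal usage sketch (placeholder paths).
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/path/to/model.gguf.q4_0.bin",
    grammar_path="/path/to/json.gbnf",  # formal grammar that constrains sampling
)
print(llm("Describe a person in JSON format:"))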
In the follow-up PR we will add: