Added multiple model prompt formats to llama_cpp chat streaming. #3588

Closed
nigel-daniels wants to merge 2 commits

Conversation

nigel-daniels
Contributor

This adds multiple prompt formats to the llama_cpp chat model to support models other than Llama2 running locally.
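For illustration, this is the kind of per-format template the change introduces, based on the diff hunk quoted in the review below. A minimal standalone sketch; the formatHumanMessage and promptFormat names here are assumptions for illustration, not the PR's actual API.

// Illustrative sketch only: standalone version of the human-message formatting switch.
// `formatHumanMessage` and `promptFormat` are assumed names, not the PR's API.
function formatHumanMessage(content: string, promptFormat: string): string {
  switch (promptFormat) {
    case "chatML":
      return `<|im_start|>user\n${content}<|im_end|>`;
    case "falcon":
      return `User: ${content}`;
    case "general":
      return `### Human\n${content}`;
    default:
      // Llama 2 instruction format (the default case in the PR).
      return `[INST] ${content} [/INST]`;
  }
}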


Comment on lines +321 to +332
case "chatML":
messageText = `<|im_start|>user\n${message.content}<|im_end|>`;
break;
case "falcon":
messageText = `User: ${message.content}`;
break;
case "general":
messageText = `### Human\n${message.content}`;
break;
default:
messageText = `[INST] ${message.content} [/INST]`;
break;
@giladgd
@nigel-daniels Why aren't you using a LlamaChatSession with the relevant ChatPromptWrapper for it?
For example, you have a ChatPromptWrapper for ChatML under ChatMLChatPromptWrapper.
It's better to have only one centralized place for handling chat formats, especially as more will be supported in node-llama-cpp over time.

Furthermore, in the next major version of node-llama-cpp it will figure out the right chat wrapper by default, so it won't require manual configuration most of the time.
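A minimal sketch of what that could look like (illustrative only; the promptWrapper option name is assumed from node-llama-cpp v2, so double-check it against the version in use):

import { LlamaModel, LlamaContext, LlamaChatSession, ChatMLChatPromptWrapper } from "node-llama-cpp";

// Load a local model and create a context for it.
const model = new LlamaModel({ modelPath: "path/to/model.gguf" });
const context = new LlamaContext({ model });

// Let the session apply the ChatML wrapping instead of formatting messages by hand.
// `promptWrapper` is assumed to be the option name in node-llama-cpp v2.
const session = new LlamaChatSession({
  context,
  promptWrapper: new ChatMLChatPromptWrapper(),
});

const answer = await session.prompt("Hello!");
console.log(answer);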

@nigel-daniels
Contributor Author

This is because the call for streaming uses the context.evaluate() method:

const stream = await this.caller.call(async () =>
      this._context.evaluate(this._context.encode(prompt), promptOptions)
    );

This returns an async generator, so I can use yield to pass back the tokens as they arrive.
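A simplified, illustrative sketch of that pattern (not the PR's exact code): evaluate() yields tokens as they are generated, which can be decoded and re-yielded.

import { LlamaContext } from "node-llama-cpp";

// Illustrative only: stream text out of the low-level context by consuming
// the async generator returned by evaluate().
async function* streamCompletion(context: LlamaContext, prompt: string) {
  for await (const token of context.evaluate(context.encode(prompt))) {
    // Decode each token back to text as soon as it is produced.
    yield context.decode([token]);
  }
}

Note that decoding token-by-token can split multi-byte characters; batching tokens before decoding, as in the suggestion later in this thread, avoids that.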

default:
messageText = `[INST] ${message.content} [/INST]`;
break;
}
} else if (message._getType() === "ai") {
@giladgd
The next version will support the generation of a response based on chat histories that can have "system", "user" and "model" roles, so I think most of this code could be replaced by the node-llama-cpp functionality for this.

I expect the next beta that includes this functionality to be ready in about two weeks, if you're willing to wait for it; but in any case, I think this is better handled by node-llama-cpp, as it is its responsibility to handle the interface against a model.

@jacoblee93
Collaborator

Thanks for the review @giladgd - @nigel-daniels should we wait a bit then?

@nigel-daniels
Contributor Author

@jacoblee93 As I understood it, this would only work if I were using the session.prompt() method, but in order to get hold of a generator I was calling context.evaluate() and bypassing the session. It's the session that applies the different wrapper types.
@giladgd Do you have any thoughts on a better way to do this using the session?
AFAIK, if I use the context directly then there is no point in waiting.

@giladgd

giladgd commented Dec 13, 2023

@nigel-daniels If I understood the intention of _streamResponseChunks correctly, it is meant to stream the model's response, not necessarily to control the generation of the response itself.

A good solution would be to start generating a response using a LlamaChatSession and stream it without controlling the generation process.
This approach will also improve generation performance.

This is a general idea of how it can be implemented:

async *_streamResponseChunks(
    input: BaseMessage[],
    _options: this["ParsedCallOptions"],
    runManager?: CallbackManagerForLLMRun
): AsyncGenerator<ChatGenerationChunk> {
    // `this._parseInput` converts a `BaseMessage[]` into the `conversationHistory` format that `LlamaChatSession` expects,
    // returns the last human prompt text as `lastHumanPrompt`,
    // and returns the system message as `sysMessage`, or `undefined` if there is none, so the default system prompt of `node-llama-cpp` is used.
    const { sysMessage, conversationHistory, lastHumanPrompt } = this._parseInput(input);

    const chatSession = new LlamaChatSession({
        context: this._context,
        conversationHistory,
        systemPrompt: sysMessage,
    });

    let pendingTokens: Token[] = [];
    let generationCompleted = false;
    let onNewUpdate: (() => void) | undefined;

    // Start generating; tokens are buffered as they arrive and drained by the loop below.
    const responsePromise = chatSession.prompt(lastHumanPrompt, {
        temperature: this.temperature,
        topK: this.topK,
        topP: this.topP,
        onToken(tokens: Token[]) {
            pendingTokens = pendingTokens.concat(tokens);
            onNewUpdate?.();
        }
    });
    responsePromise
        .catch(() => {
            // Errors are surfaced by the `await responsePromise` at the end;
            // this catch only prevents an unhandled rejection on the chained promise.
        })
        .finally(() => {
            generationCompleted = true;
            onNewUpdate?.();
        });

    while (!generationCompleted || pendingTokens.length > 0) {
        if (pendingTokens.length === 0) {
            // Wait until new tokens arrive or generation completes.
            await new Promise<void>((accept) => {
                onNewUpdate = () => accept();
            });
            continue;
        }

        const chunkText = this._context.decode(pendingTokens);
        pendingTokens = [];

        yield new ChatGenerationChunk({
            text: chunkText,
            message: new AIMessageChunk({
                content: chunkText
            }),
            generationInfo: {},
        });
        await runManager?.handleLLMNewToken(chunkText);
    }

    await responsePromise; // in case of an error, this will throw
}

Note that the next major version of node-llama-cpp will provide a much more convenient interface that will make most of the _parseInput function redundant, but as it will take some time until it's ready, I suggest proceeding with this workaround for now.
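For completeness, here is a hypothetical sketch of the _parseInput helper described in the code comment above. It is not part of the PR or of node-llama-cpp, and the { prompt, response } shape of conversationHistory entries is an assumption to verify against the version in use.

// Hypothetical helper: split a BaseMessage[]-like input into the system prompt,
// the prior prompt/response pairs, and the trailing human message to send now.
type MessageLike = { _getType(): string; content: string };
type HistoryEntry = { prompt: string; response: string };

function parseInput(input: MessageLike[]): {
  sysMessage: string | undefined;
  conversationHistory: HistoryEntry[];
  lastHumanPrompt: string;
} {
  let sysMessage: string | undefined;
  const conversationHistory: HistoryEntry[] = [];
  let pendingPrompt: string | undefined;

  for (const message of input) {
    const type = message._getType();
    if (type === "system") {
      sysMessage = message.content;
    } else if (type === "human") {
      // Two human messages in a row: pair the earlier one with an empty response.
      if (pendingPrompt !== undefined) {
        conversationHistory.push({ prompt: pendingPrompt, response: "" });
      }
      pendingPrompt = message.content;
    } else if (type === "ai" && pendingPrompt !== undefined) {
      conversationHistory.push({ prompt: pendingPrompt, response: message.content });
      pendingPrompt = undefined;
    }
  }

  // The final human message with no AI response yet is the prompt to send.
  return { sysMessage, conversationHistory, lastHumanPrompt: pendingPrompt ?? "" };
}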

@nigel-daniels
Contributor Author

@giladgd That looks like a good workaround; it will also ensure the wrappers get used, so this module can remain better aligned with node-llama-cpp. Thanks!

@nigel-daniels
Contributor Author

Given the previous comments about the upcoming update in a couple of weeks, and the restructuring happening on the LangChain.js main branch, I'll close this PR and rebase on main once the node-llama-cpp updates are out.
