feat: automatic batching #104
Conversation
…, smart context shifting support, better automatic chat wrapper resolution, improved API, safer `Token` type
LGTM
LGTM
🎉 This PR is included in version 3.0.0-beta.1 🎉 The release is available on:

Your semantic-release bot 📦🚀
is there a code snippet that shows how to correctly use batching?
@Madd0g There will be a better example in the documentation when version 3 leaves beta status soon, but for now, here's a simple example:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "dolphin-2.1-mistral-7b.Q4_K_M.gguf")
});
const context = await model.createContext({
    sequences: 2
});

const sequence1 = context.getSequence();
const sequence2 = context.getSequence();

const session1 = new LlamaChatSession({
    contextSequence: sequence1
});
const session2 = new LlamaChatSession({
    contextSequence: sequence2
});

const q1 = "Hi there, how are you?";
const q2 = "How much is 6+6?";

const [
    a1,
    a2
] = await Promise.all([
    session1.prompt(q1),
    session2.prompt(q2)
]);

console.log("User: " + q1);
console.log("AI: " + a1);
console.log("User: " + q2);
console.log("AI: " + a2);
```

The batching is done automatically across sequences of the same context.
🎉 This PR is included in version 3.0.0 🎉 The release is available on:

Your semantic-release bot 📦🚀
Description of change
BREAKING CHANGE: completely new API (docs will be updated before a stable version is released)
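For context on the breaking change, here is a hedged sketch of how typical usage changes; the "before" shape is recalled from the pre-3.0 README and is an assumption, not something stated in this PR, while the "after" shape matches the example in the discussion above:

```typescript
// Before (v2-style API, assumption): synchronous constructors
// import {LlamaModel, LlamaContext, LlamaChatSession} from "node-llama-cpp";
// const model = new LlamaModel({modelPath});
// const context = new LlamaContext({model});
// const session = new LlamaChatSession({context});

// After (v3 API, as shown in the batching example): async factory functions
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder path
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});
```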
Closes #85
Fixes #102
Fixes #94
Fixes #93
Fixes #76
Things left to do (in other PRs)
- … `LlamaChatSession`
- … `llama.cpp` logs by default

Pull-Request Checklist
- Code is up-to-date with the `master` branch
- `npm run format` to apply eslint formatting
- `npm run test` passes with this change
- This pull request links relevant issues as `Fixes #0000`