
feat: automatic batching #104

Merged 13 commits from gilad/autoBatching into beta on Nov 26, 2023

Conversation

@giladgd (Contributor) commented on Nov 26, 2023

Description of change

  • feat: evaluate multiple sequences in parallel with automatic batching (see the sketch below)
  • feat: improve automatic chat wrapper resolution
  • feat: smart context shifting
  • feat: improve TS types
  • refactor: improve API
  • build: support beta releases
  • build: improve dev configurations

BREAKING CHANGE: completely new API (docs will be updated before a stable version is released)
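
As a quick illustration of the parallel-evaluation item above, here is a minimal sketch of how the new surface looks; the model path is a placeholder, and the calls mirror the example given later in this thread:

```js
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
// placeholder path; any local GGUF model works here
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// a single context can host multiple independent sequences; evaluations
// running at the same time on different sequences of the same context
// are batched together automatically
const context = await model.createContext({sequences: 2});

const sequence1 = context.getSequence();
const sequence2 = context.getSequence();
// prompt on sequence1 and sequence2 concurrently to get batching
```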

Closes #85
Fixes #102
Fixes #94
Fixes #93
Fixes #76

Things left to do (in other PRs)

  • Update documentation
  • Use the smart context shifting support in LlamaChatSession
  • Add contexts manager to automatically create more contexts as needed
  • Improve grammar support
  • Try to disable llama.cpp logs by default
  • Add migration guide from v2 to v3
  • Add more tests

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run format to apply eslint formatting
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)

@giladgd self-assigned this on Nov 26, 2023
@ido-pluto (Contributor) left a comment:

LGTM

@giladgd merged commit 4757af8 into beta on Nov 26, 2023 (14 checks passed) and deleted the gilad/autoBatching branch on Nov 26, 2023 at 19:29.

🎉 This PR is included in version 3.0.0-beta.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@giladgd mentioned this pull request on Dec 6, 2023
@giladgd added this to the v3.0.0 milestone on Dec 16, 2023
@giladgd linked an issue on Dec 16, 2023 that may be closed by this pull request
@giladgd linked an issue on Jan 12, 2024 that may be closed by this pull request
@giladgd mentioned this pull request on Mar 16, 2024
@Madd0g commented on Apr 17, 2024

Is there a code snippet that shows how to correctly use batching? I'm doing repetitive things in a loop and wondering how I might take advantage of this.

@giladgd (Contributor, Author) commented on Apr 19, 2024

@Madd0g There will be a better example in the documentation when version 3 leaves beta soon, but for now, here's a simple example:

```js
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "dolphin-2.1-mistral-7b.Q4_K_M.gguf")
});

// a context with 2 sequences, so two evaluations can be in flight at once
const context = await model.createContext({
    sequences: 2
});

const sequence1 = context.getSequence();
const sequence2 = context.getSequence();

// each chat session gets its own sequence
const session1 = new LlamaChatSession({
    contextSequence: sequence1
});
const session2 = new LlamaChatSession({
    contextSequence: sequence2
});

const q1 = "Hi there, how are you?";
const q2 = "How much is 6+6?";

// prompting both sessions concurrently lets the context batch
// the evaluations of both sequences together
const [
    a1,
    a2
] = await Promise.all([
    session1.prompt(q1),
    session2.prompt(q2)
]);

console.log("User: " + q1);
console.log("AI: " + a1);

console.log("User: " + q2);
console.log("AI: " + a2);
```

The batching is done automatically across sequences of the same context.
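
For the "repetitive things in a loop" case, here is a minimal sketch (not from this PR) that spreads a list of prompts across a fixed pool of sequences. The prompt list and model path are placeholders, and reusing a sequence for a new session after disposing the previous one is an assumption about the v3 API:

```js
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// placeholder inputs: any list of independent prompts
const prompts = [
    "Summarize the first document...",
    "Summarize the second document...",
    "Summarize the third document...",
    "Summarize the fourth document..."
];
const parallelism = 2;

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// a fixed pool of sequences; prompts evaluated at the same time
// on different sequences are batched together automatically
const context = await model.createContext({sequences: parallelism});
const sequences = Array.from({length: parallelism}, () => context.getSequence());

const answers = [];
for (let i = 0; i < prompts.length; i += parallelism) {
    const chunk = prompts.slice(i, i + parallelism);
    const chunkAnswers = await Promise.all(
        chunk.map((prompt, j) => {
            // a fresh session per prompt keeps the chats independent;
            // assumption: a sequence can back a new session once the
            // previous session using it is disposed
            const session = new LlamaChatSession({contextSequence: sequences[j]});
            return session.prompt(prompt)
                .finally(() => session.dispose());
        })
    );
    answers.push(...chunkAnswers);
}

console.log(answers);
```

Within each chunk the evaluations are in flight concurrently, so they are batched together; reusing the pool across chunks keeps memory usage flat regardless of how many prompts go through the loop.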

github-actions bot commented on Sep 24, 2024

🎉 This PR is included in version 3.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Successfully merging this pull request may close the issue "Could not find a KV slot".