Try migrating from server architecture to llama.cpp.swift #42

Open
psugihara opened this issue Dec 19, 2023 · 0 comments

Speed, stability, performance, simplicity! These are paramount concerns for FreeChat.

The current completion architecture using server.cpp works pretty well but has a few problems:

  1. model switching sometimes breaks
  2. model loading errors are neither surfaced to the user nor captured
  3. it's kind of complicated and not portable to iOS

We can fix 1 and 2 with the current architecture, but not 3. As model sizes trend smaller, fixing 3 makes more and more sense.

I did a quick audit of the newish SwiftUI example in llama.cpp, and it's fantastic and fast. Let's try migrating FreeChat to do inference in Swift the same way; a rough sketch of what the call site could look like follows.
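For reference, here's a minimal sketch of driving the example's `LlamaContext` actor from FreeChat. The method names (`create_context(path:)`, `completion_init(text:)`, `completion_loop()`, `clear()`) are taken from the example's LibLlama.swift as it stands today and should be treated as assumptions that may drift as upstream evolves; the stopping condition below is a simple token budget, not necessarily how the example decides to stop.

```swift
import Foundation

// Sketch only: assumes the LlamaContext actor from
// llama.cpp/examples/llama.swiftui/llama.cpp.swift/LibLlama.swift,
// copied into the app unmodified. Names may drift upstream.
func complete(modelPath: String, prompt: String, maxTokens: Int = 256) async throws -> String {
    // Loads the model and creates an inference context.
    let llama = try LlamaContext.create_context(path: modelPath)

    // Tokenizes and evaluates the prompt.
    await llama.completion_init(text: prompt)

    var response = ""
    for _ in 0..<maxTokens {
        // Each call samples one token and returns its detokenized text.
        response += await llama.completion_loop()
    }

    // Resets the context so it can be reused for the next message.
    await llama.clear()
    return response
}
```

Model loading and generation errors would surface right here as thrown Swift errors, which directly addresses problem 2 above.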

We should try not to edit llama.cpp.swift so that it can keep being maintained upstream in llama.cpp. Maybe there's some fancy git or SPM way to link it in (sketched below), but copying the file is the easy way to start.
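On the SPM idea: llama.cpp ships a Package.swift at the repo root that exposes a `llama` library product, so one option is depending on it directly instead of vendoring the C/C++ sources. A minimal sketch, assuming that product name and a hypothetical `FreeChatInference` module; pinning a tagged revision rather than a branch would be better for reproducible builds:

```swift
// swift-tools-version: 5.9
// Sketch only: links llama.cpp via SPM instead of copying sources.
// The "llama" product name and the branch pin are assumptions.
import PackageDescription

let package = Package(
    name: "FreeChatInference",   // hypothetical module name
    platforms: [.macOS(.v13)],
    dependencies: [
        // Pinning a specific revision would make builds reproducible.
        .package(url: "https://github.com/ggerganov/llama.cpp", branch: "master")
    ],
    targets: [
        .target(
            name: "FreeChatInference",
            dependencies: [
                .product(name: "llama", package: "llama.cpp")
            ]
        )
    ]
)
```

Note that this likely wouldn't pull in LibLlama.swift itself (it lives under examples/, not in the package's targets), so we'd probably still copy that one file; the SPM dependency would just replace the C/C++ build.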
