Speed, stability, performance, simplicity! These are paramount concerns for FreeChat.
The current completion architecture using server.cpp works pretty well but has a few problems:
1. model switching sometimes breaks
2. model loading errors are not captured or surfaced to the user
3. it's kind of complicated and is not portable to iOS
We can fix 1 and 2 with the current architecture, but not 3. As model sizes trend smaller, solving 3 makes more and more sense.
I did a quick audit of the newish SwiftUI example in llama.cpp and it's fantastic and fast. Let's try migrating FreeChat to do inference in Swift the same way.
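For context, the SwiftUI example drives the llama.cpp C API directly from Swift rather than talking to a spawned server.cpp process. A rough sketch of that load-and-free flow is below; function names follow the llama.cpp C API of the time and may differ between versions, so treat this as an outline rather than FreeChat code:

```swift
import llama  // llama.cpp's C API, bridged into Swift via the module the project exposes

// Sketch of the load → decode → free lifecycle used by llama.cpp's SwiftUI example.
// Exact llama.cpp signatures vary between versions; verify against the pinned revision.
func complete(modelPath: String, prompt: String) -> String {
    // Load the model and create an inference context.
    guard let model = llama_load_model_from_file(modelPath, llama_model_default_params()),
          let ctx = llama_new_context_with_model(model, llama_context_default_params())
    else {
        // Because this runs in-process, load failures surface here directly —
        // exactly the error visibility problem 2 above is about.
        return ""
    }
    defer {
        llama_free(ctx)
        llama_free_model(model)
    }

    // Tokenize the prompt, feed it through the decode call, then sample one
    // token at a time until an end-of-sequence token appears (elided here;
    // the SwiftUI example's LibLlama.swift shows the full loop).
    var output = ""
    return output
}
```

Running inference in-process like this also removes the server lifecycle entirely, which is what makes the approach portable to iOS.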
We should try not to edit llama.cpp.swift so that it can be maintained in llama.cpp. Maybe there is some fancy git or SPM way to link it in, but copying the file is easy to start.
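On the SPM idea: llama.cpp ships its own Package.swift, so it may be possible to depend on the repo directly instead of copying files. A hypothetical manifest fragment (the `llama` product name and pinning choice are assumptions to verify against llama.cpp's actual manifest):

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "FreeChat",
    platforms: [.macOS(.v13)],
    dependencies: [
        // Pinning to a tag or revision instead of a branch would keep builds reproducible.
        .package(url: "https://github.com/ggerganov/llama.cpp", branch: "master"),
    ],
    targets: [
        .executableTarget(
            name: "FreeChat",
            // "llama" is assumed to be the product llama.cpp's manifest exposes — verify.
            dependencies: [.product(name: "llama", package: "llama.cpp")]
        ),
    ]
)
```

This would keep llama.cpp.swift maintained upstream, with copying the file remaining the easy fallback.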