Speed, stability, performance, simplicity! These are paramount concerns for FreeChat.
The current completion architecture using server.cpp works pretty well but has a few problems:
1. model switching sometimes breaks
2. model loading errors are not captured or surfaced to the user
3. it's kind of complicated and is not portable to iOS
We can fix 1 and 2 with the current architecture, but not 3. As model sizes trend smaller, solving 3 makes more and more sense.
I did a quick audit of the newish SwiftUI example in llama.cpp and it's fantastic and fast. Let's try migrating FreeChat to do inference in Swift the same way.
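For context, the SwiftUI example drives the llama.cpp C API directly from Swift rather than talking to a spawned server.cpp process. A rough sketch of that load-and-free flow is below; function names follow the llama.cpp C API of the time and may differ between versions, so treat this as an outline rather than FreeChat code:

```swift
import llama  // llama.cpp's C API, bridged into Swift via the module the project exposes

// Sketch of the load → decode → free lifecycle used by llama.cpp's SwiftUI example.
// Exact llama.cpp signatures vary between versions; verify against the pinned revision.
func complete(modelPath: String, prompt: String) -> String {
    // Load the model and create an inference context.
    guard let model = llama_load_model_from_file(modelPath, llama_model_default_params()),
          let ctx = llama_new_context_with_model(model, llama_context_default_params())
    else {
        // Because this runs in-process, load failures surface here directly —
        // exactly the error visibility problem 2 above is about.
        return ""
    }
    defer {
        llama_free(ctx)
        llama_free_model(model)
    }

    // Tokenize the prompt, feed it through the decode call, then sample one
    // token at a time until an end-of-sequence token appears (elided here;
    // the SwiftUI example's LibLlama.swift shows the full loop).
    var output = ""
    return output
}
```

Running inference in-process like this also removes the server lifecycle entirely, which is what makes the approach portable to iOS.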
We should try not to edit llama.cpp.swift so that it can be maintained in llama.cpp. Maybe there is some fancy git or SPM way to link it in, but copying the file is easy to start.
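On the SPM idea: llama.cpp ships its own Package.swift, so it may be possible to depend on the repo directly instead of copying files. A hypothetical manifest fragment (the `llama` product name and pinning choice are assumptions to verify against llama.cpp's actual manifest):

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "FreeChat",
    platforms: [.macOS(.v13)],
    dependencies: [
        // Pinning to a tag or revision instead of a branch would keep builds reproducible.
        .package(url: "https://github.com/ggerganov/llama.cpp", branch: "master"),
    ],
    targets: [
        .executableTarget(
            name: "FreeChat",
            // "llama" is assumed to be the product llama.cpp's manifest exposes — verify.
            dependencies: [.product(name: "llama", package: "llama.cpp")]
        ),
    ]
)
```

This would keep llama.cpp.swift maintained upstream, with copying the file remaining the easy fallback.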