forked from Mozilla-Ocho/llamafile
Commit
You can now build and run `o//llamafile/server/main`, which launches an HTTP server that currently supports a single endpoint at /tokenize. When wrk sends it requests to tokenize a string of 51 tokens, it serves two million requests per second on my workstation, with a 99th-percentile latency of 179 µs. This server is designed to be crash-proof, reliable, and preemptive: workers can be asynchronously canceled so the supervisor thread can respawn them. Cosmo's new memory allocator also helps this server stay fast for llama.cpp's STL-heavy use case.
Showing 39 changed files with 2,902 additions and 121 deletions.