The built-in inference engines are currently node-llama-cpp, gpt4all, and transformers-js (highly experimental). Install the corresponding peer dependency before using an engine.
The node-llama-cpp engine can be used for text-completion and embedding tasks. See the node-llama-cpp docs for more information.
Find available GGUF models on huggingface.co.
The gpt4all engine can be used for text-completion and embedding tasks. You can find parameter docs here.
You can find available models here.
The transformers-js engine currently supports speech-to-text and image-to-text tasks. See tests.
WIP. See tests.
You can also write your own engine implementation. See ./src/engines for how the built-in engines are implemented, and here for examples of how to use custom engines to combine models and add multimodality to your chat completion endpoint (or to any other consumer of the ModelServer class). Multiple ModelServers are allowed and can be nested to create more complex pipelines.
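To illustrate the idea of combining models through a custom engine, here is a minimal, self-contained sketch. It does not use this project's actual engine interface: `SketchEngine`, `chain`, and both stub engines are hypothetical names invented for this example, and the real contract in ./src/engines will differ. The sketch only shows the composition pattern, where the output of one engine (for example image-to-text) becomes the input of another (for example text-completion).

```typescript
// Hypothetical engine contract, for illustration only.
// The project's real engine interface lives in ./src/engines and will differ.
interface SketchEngine<In, Out> {
  name: string;
  processTask(input: In): Promise<Out>;
}

// Stand-in for an image-to-text engine; a real one would run a model.
const captionEngine: SketchEngine<Uint8Array, string> = {
  name: "caption-stub",
  async processTask(image) {
    return `an image of ${image.length} bytes`;
  },
};

// Stand-in for a text-completion engine.
const completionEngine: SketchEngine<string, string> = {
  name: "completion-stub",
  async processTask(prompt) {
    return `Completion for: ${prompt}`;
  },
};

// Compose two engines into one: the output of `first` feeds `second`.
// The result is itself an engine, so chains can be nested.
function chain<A, B, C>(
  first: SketchEngine<A, B>,
  second: SketchEngine<B, C>,
): SketchEngine<A, C> {
  return {
    name: `${first.name}+${second.name}`,
    async processTask(input) {
      const intermediate = await first.processTask(input);
      return second.processTask(intermediate);
    },
  };
}

// A "multimodal" engine that accepts image bytes and returns completed text.
const multimodal = chain(captionEngine, completionEngine);
```

Because `chain` returns a value with the same shape as its inputs, a chained engine can be passed to `chain` again, which mirrors the nesting described above in a simplified form.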