Template based on llama.cpp for running Llama 2 and Llama 3 inference inside Codesphere.
The CI pipeline is configured to fetch a pre-converted, quantized Llama 3 model from Hugging Face and run the HTTP server example. A README with the available configuration options can be found in the /examples/server directory.
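
For orientation, here is a minimal sketch of what such a Codesphere CI pipeline (`ci.yml`) could look like. The model URL, file names, port, and exact schema fields are placeholders and assumptions, not the template's actual configuration; adapt them to your Codesphere and llama.cpp versions (the server binary is called `llama-server` in recent llama.cpp releases):

```yaml
# ci.yml -- hypothetical sketch; adjust schema, URLs, and flags to your setup
schemaVersion: v0.2
prepare:
  steps:
    # Build the llama.cpp HTTP server binary
    - name: build
      command: make llama-server
    # Fetch a pre-converted, quantized GGUF model from Hugging Face
    # (placeholder URL -- replace <org>/<repo>/<model> with a real repository)
    - name: fetch-model
      command: >
        mkdir -p models &&
        curl -L -o models/llama-3.gguf
        https://huggingface.co/<org>/<repo>/resolve/main/<model>.gguf
test:
  steps: []
run:
  steps:
    # Serve on port 3000 (assumed to be the port Codesphere exposes)
    - name: serve
      command: ./llama-server -m models/llama-3.gguf --host 0.0.0.0 --port 3000
```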