A course covering practical aspects of deploying, optimizing, and monitoring Generative AI models. The course is divided into three modules: Deployment, Model Optimization, and Monitoring and Maintaining Deployments.
Covers various strategies for deploying Generative AI models, starting with local deployment on a laptop or workstation, followed by on-premise server-based deployments, then edge deployments, and finishing with cloud-based deployments. Also covers the pros and cons of each strategy and the factors to consider when choosing one.
- llama.cpp: Enables LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, locally and in the cloud. See also llama-cpp-python for Python bindings.
- llamafile: Makes open-source LLMs more accessible to both developers and end users. Combines llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
- Ollama (GitHub): Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. Uses llama.cpp as the backend.
- Open WebUI (GitHub): Extensible, self-hosted interface for AI that adapts to your workflow, all while operating entirely offline.
- Jupyter AI: A generative AI extension for JupyterLab.
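Once a local tool like Ollama is running, it exposes an HTTP API on `localhost:11434`. The sketch below builds a request for Ollama's `/api/generate` endpoint; the model name and prompt are illustrative, and it assumes the server is already serving a pulled model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

if __name__ == "__main__":
    req = build_generate_request("llama3", "Why is the sky blue?")
    # Requires `ollama serve` (and `ollama pull llama3`) to be running locally:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read())["response"])
```

Because the endpoint speaks plain JSON over HTTP, the same request shape works from any language, which is part of what makes local-first deployment attractive for prototyping.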
Additional relevant material:
- DeepLearning AI: Open Source Models with HuggingFace
- DeepLearning AI: Building Generative AI Apps
- Blog Post: Emerging UI/UX patterns for AI applications
- Latent Space Podcast: Tiny Model Revolution
- Awesome LLMs on Device
- LitServe
- DeepLearning AI: Introduction to Device AI
- Machine Learning Compilation (GitHub)
- Deploying LLMs in your Web Browser
- NVIDIA Orin SDK
- NVIDIA Holoscan SDK (GitHub)
- NVIDIA Holohub
- DeepLearning AI: Serverless LLM Apps using Amazon Bedrock
- DeepLearning AI: Developing Generative AI Apps using Microsoft Semantic Kernel
- DeepLearning AI: Understanding and Applying Text Embeddings with Vertex AI
- DeepLearning AI: Pair Programming with LLMs
Covers techniques for optimizing Generative AI models for deployment, such as model pruning, quantization, and distillation. Also covers the trade-offs between model size, speed, and performance.
- DeepLearning AI: Quantization Fundamentals
- DeepLearning AI: Quantization in Depth
- GGUF My Repo
- Blog Post: Running Hugging Face GGUF models with Ollama (https://www.markhneedham.com/blog/2023/10/18/ollama-hugging-face-gguf-models/)
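To make the size/accuracy trade-off concrete, here is a minimal, pure-Python sketch of symmetric 8-bit quantization: each float weight is mapped to an integer in [-127, 127] via a single scale factor, shrinking storage 4x (fp32 to int8) at the cost of a bounded rounding error. Real toolchains (GGUF, bitsandbytes) use more sophisticated per-block schemes; this only illustrates the core idea.

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: map floats to ints in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from quantized ints."""
    return [x * scale for x in q]

weights = [0.4, -1.0, 0.25]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Round-trip error per weight is at most scale / 2 (about 0.004 here)
```

The error bound of half a quantization step per weight is what model optimization trades against: coarser scales mean smaller models and faster inference, but larger deviations from the original weights.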
Covers the importance of monitoring the performance of deployed models and updating them as needed. Discusses potential issues that might arise during deployment and how to troubleshoot them.
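As a sketch of what such monitoring can look like in practice, the snippet below tracks rolling latency and error rate for an inference endpoint and flags threshold breaches. The window size, p95 budget, and error-rate threshold are illustrative assumptions, not recommendations.

```python
from collections import deque

class InferenceMonitor:
    """Track rolling latency and error rate for a deployed model endpoint.

    Window size and alert thresholds below are illustrative assumptions.
    """

    def __init__(self, window=100, p95_budget_s=2.0, max_error_rate=0.05):
        self.latencies = deque(maxlen=window)
        self.outcomes = deque(maxlen=window)  # True = request succeeded
        self.p95_budget_s = p95_budget_s
        self.max_error_rate = max_error_rate

    def record(self, latency_s, ok):
        self.latencies.append(latency_s)
        self.outcomes.append(ok)

    def p95_latency(self):
        ordered = sorted(self.latencies)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def error_rate(self):
        return 1 - sum(self.outcomes) / len(self.outcomes)

    def alerts(self):
        out = []
        if self.p95_latency() > self.p95_budget_s:
            out.append("p95 latency over budget")
        if self.error_rate() > self.max_error_rate:
            out.append("error rate over threshold")
        return out
```

In a real deployment these numbers would feed a metrics backend (e.g. Prometheus) rather than in-process deques, but the same two signals, tail latency and error rate, are usually the first things to watch and the first clues when troubleshooting a misbehaving model.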