
# Practical deployment of Generative AI models

Course covering practical aspects of deploying, optimizing, and monitoring Generative AI models. The course is divided into three modules: Deployment, Model Optimization, and Monitoring and Maintenance of Deployments.

## Module 1: Deployment

Covers various strategies for deploying Generative AI models, starting with local deployment on a laptop or workstation, followed by on-premise server-based deployments, then edge deployments, and finishing with cloud-based deployments. Covers the pros and cons of each strategy and the factors to consider when choosing one.

### Module 1.1: Local deployments

  1. LLaMA C++: Enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, locally and in the cloud. See also LLaMA C++ Python for Python bindings.
  2. LlamaFile: Make open-source LLMs more accessible to both developers and end users. Combines LLaMA C++ with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
  3. Ollama (GitHub): Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. Uses LLaMA C++ as the backend.
  4. Open WebUI (GitHub): Extensible, self-hosted interface for AI that adapts to your workflow, all while operating entirely offline.
  5. Jupyter AI: A generative AI extension for JupyterLab.
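Once one of these local servers is running, it can be queried programmatically. As an illustrative sketch (not part of the course materials), the following uses only the Python standard library to call Ollama's documented local REST endpoint; it assumes Ollama is serving on its default port 11434 and that a model named `llama3` has been pulled:

```python
import json
import urllib.error
import urllib.request

# Ollama's default local endpoint for single-turn generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete JSON response instead of a stream
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

if __name__ == "__main__":
    req = build_request("llama3", "Why is the sky blue?")
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            # A non-streaming reply carries the generated text in "response".
            print(json.loads(resp.read())["response"])
    except urllib.error.URLError:
        print("Ollama server not reachable -- is it running locally?")
```

The same pattern works against any of the servers above that expose an HTTP API; only the URL and payload schema change.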

Additional relevant material:

### Module 1.2: On-premise, server-based deployments

### Module 1.3: Edge deployments

### Module 1.4: Cloud-based deployments

## Module 2: Model Optimization

Covers techniques for optimizing Generative AI models for deployment, such as model pruning, quantization, and distillation. Covers the trade-offs between model size, speed, and performance.
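To make the size/accuracy trade-off concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. The weights and helper names are illustrative, not from the course; production systems use optimized library kernels, but the arithmetic is the same idea:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0  # one step of the int8 grid
    q = [round(w / scale) for w in weights]       # 8-bit codes, 4x smaller than fp32
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [x * scale for x in q]

weights = [0.82, -1.27, 0.03, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step (scale / 2)
# of the original: storage shrinks, at the cost of bounded rounding error.
```

Pruning and distillation trade accuracy for size in analogous ways: pruning removes low-magnitude weights outright, while distillation trains a smaller model to imitate a larger one.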

## Module 3: Monitoring and Maintenance

Covers the importance of monitoring the performance of deployed models and updating them as needed. Discusses potential issues that might arise during deployment and how to troubleshoot them.
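As a sketch of the kind of operational monitoring this module discusses, the snippet below keeps a rolling window of request latencies for a deployed endpoint and flags when the tail latency degrades. The class name, window size, and alert threshold are illustrative assumptions, not course-prescribed values:

```python
from collections import deque

class LatencyMonitor:
    """Rolling window of per-request latencies for a deployed model endpoint."""

    def __init__(self, window: int = 1000):
        # deque with maxlen keeps only the most recent `window` samples
        self.samples = deque(maxlen=window)

    def record(self, seconds: float) -> None:
        self.samples.append(seconds)

    def p95(self) -> float:
        """Approximate 95th-percentile latency over the current window."""
        ordered = sorted(self.samples)
        idx = min(len(ordered) - 1, int(0.95 * len(ordered)))
        return ordered[idx]

monitor = LatencyMonitor()
for latency in [0.12, 0.15, 0.11, 0.90, 0.14]:  # made-up measurements, in seconds
    monitor.record(latency)

ALERT_THRESHOLD = 0.5  # illustrative SLO, in seconds
if monitor.p95() > ALERT_THRESHOLD:
    print(f"p95 latency {monitor.p95():.2f}s exceeds {ALERT_THRESHOLD}s")
```

Tail percentiles (p95/p99) are usually more informative than averages here, because a small fraction of slow generations can dominate user experience without moving the mean much.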
