From 24419c9044fe227ea806db370c1a30272d026f8a Mon Sep 17 00:00:00 2001
From: "Sun, Xuehao" <xuehao.sun@intel.com>
Date: Tue, 26 Mar 2024 16:50:15 +0800
Subject: [PATCH] update LLM recipes (#1692)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
---
 docs/source/llm_recipes.md | 55 +++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/docs/source/llm_recipes.md b/docs/source/llm_recipes.md
index 356b145873c..55fa385fe19 100644
--- a/docs/source/llm_recipes.md
+++ b/docs/source/llm_recipes.md
@@ -1,27 +1,40 @@
-LLMs Quantization Recipes
----
+## LLMs Quantization Recipes
 
-Intel® Neural Compressor supported advanced large language models (LLMs) quantization technologies including SmoothQuant (SQ) and Weight-Only Quant (WOQ), 
-and verified a list of LLMs on 4th Gen Intel® Xeon® Scalable Processor (codenamed Sapphire Rapids) with [PyTorch](https://pytorch.org/), 
-[Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) and [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).   
+Intel® Neural Compressor supports advanced large language model (LLM) quantization technologies, including SmoothQuant (SQ) and Weight-Only Quantization (WOQ),
+and has verified a list of LLMs on 4th Gen Intel® Xeon® Scalable Processors (codenamed Sapphire Rapids) with [PyTorch](https://pytorch.org/),
+[Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) and [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).  
 This document aims to publish the specific recipes we achieved for the popular LLMs and help users to quickly get an optimized LLM with limited 1% accuracy loss.
 
-> Notes:  
-> - The quantization algorithms provide by [Intel® Neural Compressor](https://github.com/intel/neural-compressor) and the evaluate functions provide by [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).    
-> - The model list are continuing update, please expect to find more LLMs in the future. 
+> Notes:
+>
+> - The quantization algorithms are provided by [Intel® Neural Compressor](https://github.com/intel/neural-compressor), and the evaluation functions are provided by [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
+> - The model list is updated continuously; expect more LLMs to be added in the future.
 
 ## IPEX key models
-|          Models           | SQ INT8 | WOQ INT8 | WOQ INT4 |
-|:-------------------------:|:-------:|:--------:|:--------:|
-|    EleutherAI/gpt-j-6b    |    ✔    |    ✔     |    ✔    |
-|   facebook/opt-1.3b       |    ✔    |    ✔     |    ✔    |
-|     facebook/opt-30b      |    ✔    |    ✔     |    ✔    |
-| meta-llama/Llama-2-7b-hf  |    ✔    |    ✔     |    ✔    |
-| meta-llama/Llama-2-13b-hf |    ✔    |    ✔     |    ✔    |
-| meta-llama/Llama-2-70b-hf |    ✔    |    ✔     |    ✔    |
-|     tiiuae/falcon-40b     |    ✔    |    ✔     |    ✔    |
- 
+
+|             Models              | SQ INT8 | WOQ INT8 | WOQ INT4 |
+| :-----------------------------: | :-----: | :------: | :------: |
+|       EleutherAI/gpt-j-6b       |    ✔    |    ✔     |    ✔     |
+|        facebook/opt-1.3b        |    ✔    |    ✔     |    ✔     |
+|        facebook/opt-30b         |    ✔    |    ✔     |    ✔     |
+|    meta-llama/Llama-2-7b-hf     |   WIP   |    ✔     |    ✔     |
+|    meta-llama/Llama-2-13b-hf    |    ✔    |    ✔     |    ✔     |
+|    meta-llama/Llama-2-70b-hf    |    ✔    |    ✔     |    ✔     |
+|        tiiuae/falcon-7b         |    ✔    |    ✔     |    ✔     |
+|        tiiuae/falcon-40b        |    ✔    |    ✔     |    ✔     |
+| baichuan-inc/Baichuan-13B-Chat  |    ✔    |    ✔     |    ✔     |
+| baichuan-inc/Baichuan2-13B-Chat |    ✔    |    ✔     |    ✔     |
+| baichuan-inc/Baichuan2-7B-Chat  |    ✔    |    ✔     |    ✔     |
+|      bigscience/bloom-1b7       |    ✔    |    ✔     |    ✔     |
+|     databricks/dolly-v2-12b     |    ✖    |    ✔     |    ✖     |
+|     EleutherAI/gpt-neox-20b     |    ✖    |    ✔     |    ✖     |
+|    mistralai/Mistral-7B-v0.1    |    ✖    |    ✔     |    ✔     |
+|        THUDM/chatglm2-6b        |   WIP   |    ✔     |   WIP    |
+|        THUDM/chatglm3-6b        |   WIP   |    ✔     |   WIP    |
+
 **Detail recipes can be found [HERE](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/llm_quantization_recipes.md).**
-> Notes: 
-> - This model list comes from [IPEX](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html).  
-> - WOQ INT4 recipes will be published soon.    
+
+> Notes:
+>
+> - This model list comes from [IPEX](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html).
+> - WIP (work in progress) recipes will be published soon.