From 5d0dafbf9c4c835ae5ce6557e29848c5c9ccf5c0 Mon Sep 17 00:00:00 2001
From: Emmanuel Ferdman
Date: Wed, 20 Nov 2024 19:29:19 +0200
Subject: [PATCH] Update dynamic loading report reference (#7321)

Signed-off-by: Emmanuel Ferdman
---
 docs/gen-ai/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/gen-ai/README.md b/docs/gen-ai/README.md
index e828546d45..9f9bd027f1 100644
--- a/docs/gen-ai/README.md
+++ b/docs/gen-ai/README.md
@@ -11,6 +11,6 @@ This folder contains the design doc for GenAI Model package
 - [Tokenizer](./Tokenizer.md)
 
 ### Need further investigation
-- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](../DynamicLoadingReport.md)
+- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](./DynamicLoadingReport.md)
 - Improve loading speed: I notice that the model loading speed from disk to memory is slower in torchsharp than what it is in huggingface. Need to investigate the reason and improve the loading speed
-- Quantization: quantize the model to reduce the model size and improve the inference speed
\ No newline at end of file
+- Quantization: quantize the model to reduce the model size and improve the inference speed