Update dynamic loading report reference (#7321)
Signed-off-by: Emmanuel Ferdman <[email protected]>
emmanuel-ferdman authored Nov 20, 2024
1 parent cfc1fb5 commit 5d0dafb
Showing 1 changed file with 2 additions and 2 deletions.
docs/gen-ai/README.md (2 additions, 2 deletions)
@@ -11,6 +11,6 @@ This folder contains the design doc for GenAI Model package
 - [Tokenizer](./Tokenizer.md)
 
 ### Need further investigation
-- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](../DynamicLoadingReport.md)
+- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](./DynamicLoadingReport.md)
 - Improve loading speed: I notice that the model loading speed from disk to memory is slower in torchsharp than what it is in huggingface. Need to investigate the reason and improve the loading speed
-- Quantization: quantize the model to reduce the model size and improve the inference speed
+- Quantization: quantize the model to reduce the model size and improve the inference speed
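The substantive change swaps `../DynamicLoadingReport.md` for `./DynamicLoadingReport.md`. Relative Markdown links resolve against the directory of the file that contains them, here `docs/gen-ai`. The minimal Python sketch below shows where each link points under that assumption. It only computes paths; it does not check which file actually exists in the repository, and the helper name `resolve` is hypothetical.

```python
import posixpath

# Directory of the README that contains the link (docs/gen-ai/README.md).
README_DIR = "docs/gen-ai"


def resolve(link: str, base_dir: str = README_DIR) -> str:
    """Hypothetical helper: resolve a relative Markdown link against the
    linking file's directory, collapsing '.' and '..' segments."""
    return posixpath.normpath(posixpath.join(base_dir, link))


old_target = resolve("../DynamicLoadingReport.md")  # link before the commit
new_target = resolve("./DynamicLoadingReport.md")   # link after the commit

print(old_target)  # docs/DynamicLoadingReport.md
print(new_target)  # docs/gen-ai/DynamicLoadingReport.md
```

The old link therefore pointed one directory up, at `docs/`. The new link points at a report sitting next to the README in `docs/gen-ai/`.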