diff --git a/README.md b/README.md
index 2cb8df6..8960560 100644
--- a/README.md
+++ b/README.md
@@ -124,7 +124,7 @@ For this we can choose as chunk size the window size. For each chunk, we thus ne
 
 # Sparse Mixture of Experts (SMoE)
 
-Sparse Mixture of Experts allows one to decouple throughput from memory costs by only activating subsets of the overall model for each token. In this approach, each token is assigned to one or more "experts" -- a separate set of weights -- and only processed by sunch experts. This division happens at feedforward layers of the model. The expert models specialize in different aspects of the data, allowing them to capture complex patterns and make more accurate predictions.
+Sparse Mixture of Experts allows one to decouple throughput from memory costs by only activating subsets of the overall model for each token. In this approach, each token is assigned to one or more "experts" -- a separate set of weights -- and only processed by such experts. This division happens at feedforward layers of the model. The expert models specialize in different aspects of the data, allowing them to capture complex patterns and make more accurate predictions.
 
 ![SMoE](assets/smoe.png)
 