Add OLMoE #32406

Merged
33 commits merged into main from olmoe on Sep 3, 2024

Changes from all commits (33 commits)
082973c
Add OLMoE
Muennighoff Jul 21, 2024
8588576
Add OLMoE
Muennighoff Jul 21, 2024
f1c569e
Updates
Muennighoff Jul 22, 2024
f6ea7c5
Make norm optional; add keys
Muennighoff Jul 23, 2024
4d56722
Add output
Muennighoff Jul 24, 2024
8b176d9
Add
Muennighoff Jul 24, 2024
452da8d
Fix dtype
Muennighoff Jul 24, 2024
140bafb
Fix eos config
Muennighoff Jul 31, 2024
91f95fd
Update
Muennighoff Jul 31, 2024
6c20b73
Add OLMoE
Muennighoff Aug 3, 2024
171602e
git pushMerge branch 'olmoe' of https://github.com/Muennighoff/transf…
Muennighoff Aug 3, 2024
30a4feb
Fix OLMoE path
Muennighoff Aug 3, 2024
698f156
Merge branch 'huggingface:main' into olmoe
Muennighoff Aug 3, 2024
474f8e8
Format
Muennighoff Aug 4, 2024
e7e2ce3
git stah popMerge branch 'olmoe' of https://github.com/Muennighoff/tr…
Muennighoff Aug 4, 2024
d3eeef0
Format
Muennighoff Aug 4, 2024
28cdfd8
Rmv copy statement
Muennighoff Aug 4, 2024
58aed4a
Rmv copy statement
Muennighoff Aug 4, 2024
f9fbd12
Format
Muennighoff Aug 4, 2024
16ed9e1
Add copies
Muennighoff Aug 4, 2024
b9a045a
Cp rotary
Muennighoff Aug 4, 2024
4c598be
Fix aming
Muennighoff Aug 4, 2024
50507ea
Fix naming
Muennighoff Aug 4, 2024
1d9b006
Merge branch 'huggingface:main' into olmoe
Muennighoff Aug 27, 2024
b9948cc
Update RoPE integration; num_logits_to_keep; Add copy statements
Muennighoff Aug 27, 2024
e97ae0e
Add eps to config
Muennighoff Aug 27, 2024
fd0baf5
Format
Muennighoff Aug 27, 2024
79e0ecc
Add aux loss
Muennighoff Aug 28, 2024
758a808
Adapt router_aux_loss_coef
Muennighoff Aug 28, 2024
efdcda6
Update md
Muennighoff Sep 3, 2024
42145af
Merge branch 'huggingface:main' into olmoe
Muennighoff Sep 3, 2024
34ef8f5
Adapt
Muennighoff Sep 3, 2024
30aace4
adapt tests
Muennighoff Sep 3, 2024
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -490,6 +490,8 @@
title: Nyströmformer
- local: model_doc/olmo
title: OLMo
- local: model_doc/olmoe
title: OLMoE
- local: model_doc/open-llama
title: Open-Llama
- local: model_doc/opt
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -233,6 +233,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Nougat](model_doc/nougat) | ✅ | ✅ | ✅ |
| [Nyströmformer](model_doc/nystromformer) | ✅ | ❌ | ❌ |
| [OLMo](model_doc/olmo) | ✅ | ❌ | ❌ |
| [OLMoE](model_doc/olmoe) | ✅ | ❌ | ❌ |
| [OneFormer](model_doc/oneformer) | ✅ | ❌ | ❌ |
| [OpenAI GPT](model_doc/openai-gpt) | ✅ | ✅ | ❌ |
| [OpenAI GPT-2](model_doc/gpt2) | ✅ | ✅ | ✅ |
45 changes: 45 additions & 0 deletions docs/source/en/model_doc/olmoe.md
@@ -0,0 +1,45 @@
<!--

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# OLMoE

## Overview

The OLMoE model was proposed in [OLMoE: Open Mixture-of-Experts Language Models](TODO) by Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi.

OLMoE is a series of **O**pen **L**anguage **Mo**dels using sparse **M**ixture-**o**f-**E**xperts designed to enable the science of language models. We release all code, checkpoints, logs, and details involved in training these models.

The abstract from the paper is the following:

*We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.*

This model was contributed by [Muennighoff](https://hf.co/Muennighoff).
The original code can be found [here](https://github.com/allenai/OLMoE).
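
A minimal usage sketch, not part of this PR's diff: it assumes `allenai/OLMoE-1B-7B-0924` as the released checkpoint name, so substitute whichever OLMoE checkpoint is actually published.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "allenai/OLMoE-1B-7B-0924" is an assumed checkpoint name used for illustration only.
checkpoint = "allenai/OLMoE-1B-7B-0924"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Bitcoin is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```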


## OlmoeConfig

[[autodoc]] OlmoeConfig

## OlmoeModel

[[autodoc]] OlmoeModel
- forward

## OlmoeForCausalLM

[[autodoc]] OlmoeForCausalLM
- forward
2 changes: 2 additions & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -71,6 +71,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [Nemotron](https://huggingface.co/docs/transformers/model_doc/nemotron)
* [NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
* [OPT](https://huggingface.co/docs/transformers/model_doc/opt#transformers.OPTModel)
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
* [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)
@@ -229,6 +230,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
* [PaliGemma](https://huggingface.co/docs/transformers/model_doc/paligemma#transformers.PaliGemmaForConditionalGeneration)
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
* [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)
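For context, a hedged sketch (not part of this PR) of how the backends listed above are selected at load time; the checkpoint name is an assumption for illustration.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed checkpoint name for illustration. "flash_attention_2" requires the
# flash-attn package; use "sdpa" otherwise.
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0924",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```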
14 changes: 14 additions & 0 deletions src/transformers/__init__.py
@@ -603,6 +603,7 @@
"models.nougat": ["NougatProcessor"],
"models.nystromformer": ["NystromformerConfig"],
"models.olmo": ["OlmoConfig"],
"models.olmoe": ["OlmoeConfig"],
"models.oneformer": [
"OneFormerConfig",
"OneFormerProcessor",
@@ -2828,6 +2829,13 @@
"OlmoPreTrainedModel",
]
)
_import_structure["models.olmoe"].extend(
[
"OlmoeForCausalLM",
"OlmoeModel",
"OlmoePreTrainedModel",
]
)
_import_structure["models.oneformer"].extend(
[
"OneFormerForUniversalSegmentation",
@@ -5380,6 +5388,7 @@
NystromformerConfig,
)
from .models.olmo import OlmoConfig
from .models.olmoe import OlmoeConfig
from .models.oneformer import (
OneFormerConfig,
OneFormerProcessor,
@@ -7339,6 +7348,11 @@
OlmoModel,
OlmoPreTrainedModel,
)
from .models.olmoe import (
OlmoeForCausalLM,
OlmoeModel,
OlmoePreTrainedModel,
)
from .models.oneformer import (
OneFormerForUniversalSegmentation,
OneFormerModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -169,6 +169,7 @@
nougat,
nystromformer,
olmo,
olmoe,
oneformer,
openai,
opt,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -187,6 +187,7 @@
("nougat", "VisionEncoderDecoderConfig"),
("nystromformer", "NystromformerConfig"),
("olmo", "OlmoConfig"),
("olmoe", "OlmoeConfig"),
("oneformer", "OneFormerConfig"),
("open-llama", "OpenLlamaConfig"),
("openai-gpt", "OpenAIGPTConfig"),
@@ -488,6 +489,7 @@
("nougat", "Nougat"),
("nystromformer", "Nyströmformer"),
("olmo", "OLMo"),
("olmoe", "OLMoE"),
("oneformer", "OneFormer"),
("open-llama", "OpenLlama"),
("openai-gpt", "OpenAI GPT"),
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -178,6 +178,7 @@
("nllb-moe", "NllbMoeModel"),
("nystromformer", "NystromformerModel"),
("olmo", "OlmoModel"),
("olmoe", "OlmoeModel"),
("oneformer", "OneFormerModel"),
("open-llama", "OpenLlamaModel"),
("openai-gpt", "OpenAIGPTModel"),
@@ -498,6 +499,7 @@
("mvp", "MvpForCausalLM"),
("nemotron", "NemotronForCausalLM"),
("olmo", "OlmoForCausalLM"),
("olmoe", "OlmoeForCausalLM"),
("open-llama", "OpenLlamaForCausalLM"),
("openai-gpt", "OpenAIGPTLMHeadModel"),
("opt", "OPTForCausalLM"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -342,6 +342,7 @@
),
),
("olmo", (None, "GPTNeoXTokenizerFast" if is_tokenizers_available() else None)),
("olmoe", (None, "GPTNeoXTokenizerFast" if is_tokenizers_available() else None)),
("oneformer", ("CLIPTokenizer", "CLIPTokenizerFast" if is_tokenizers_available() else None)),
(
"openai-gpt",
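As an aside (not part of the diff), this mapping means `AutoTokenizer` falls back to `GPTNeoXTokenizerFast` for OLMoE checkpoints, since the model defines no tokenizer class of its own; the checkpoint name below is an assumption.

```python
from transformers import AutoTokenizer

# Assumed checkpoint name for illustration.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924")
print(type(tokenizer).__name__)  # expected: GPTNeoXTokenizerFast
```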
55 changes: 55 additions & 0 deletions src/transformers/models/olmoe/__init__.py
@@ -0,0 +1,55 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_torch_available,
)


_import_structure = {
"configuration_olmoe": ["OlmoeConfig"],
}

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_olmoe"] = [
"OlmoeForCausalLM",
"OlmoeModel",
"OlmoePreTrainedModel",
]

if TYPE_CHECKING:
from .configuration_olmoe import OlmoeConfig

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_olmoe import (
OlmoeForCausalLM,
OlmoeModel,
OlmoePreTrainedModel,
)

else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)