Add OLMoE #32406

Merged: 33 commits from Muennighoff's `olmoe` branch, merged into `huggingface:main` on Sep 3, 2024. The diff below reflects changes from 23 of the 33 commits.

Commits (33):
- `082973c` Add OLMoE (Muennighoff, Jul 21, 2024)
- `8588576` Add OLMoE (Muennighoff, Jul 21, 2024)
- `f1c569e` Updates (Muennighoff, Jul 22, 2024)
- `f6ea7c5` Make norm optional; add keys (Muennighoff, Jul 23, 2024)
- `4d56722` Add output (Muennighoff, Jul 24, 2024)
- `8b176d9` Add (Muennighoff, Jul 24, 2024)
- `452da8d` Fix dtype (Muennighoff, Jul 24, 2024)
- `140bafb` Fix eos config (Muennighoff, Jul 31, 2024)
- `91f95fd` Update (Muennighoff, Jul 31, 2024)
- `6c20b73` Add OLMoE (Muennighoff, Aug 3, 2024)
- `171602e` git pushMerge branch 'olmoe' of https://github.com/Muennighoff/transf… (Muennighoff, Aug 3, 2024)
- `30a4feb` Fix OLMoE path (Muennighoff, Aug 3, 2024)
- `698f156` Merge branch 'huggingface:main' into olmoe (Muennighoff, Aug 3, 2024)
- `474f8e8` Format (Muennighoff, Aug 4, 2024)
- `e7e2ce3` git stah popMerge branch 'olmoe' of https://github.com/Muennighoff/tr… (Muennighoff, Aug 4, 2024)
- `d3eeef0` Format (Muennighoff, Aug 4, 2024)
- `28cdfd8` Rmv copy statement (Muennighoff, Aug 4, 2024)
- `58aed4a` Rmv copy statement (Muennighoff, Aug 4, 2024)
- `f9fbd12` Format (Muennighoff, Aug 4, 2024)
- `16ed9e1` Add copies (Muennighoff, Aug 4, 2024)
- `b9a045a` Cp rotary (Muennighoff, Aug 4, 2024)
- `4c598be` Fix aming (Muennighoff, Aug 4, 2024)
- `50507ea` Fix naming (Muennighoff, Aug 4, 2024)
- `1d9b006` Merge branch 'huggingface:main' into olmoe (Muennighoff, Aug 27, 2024)
- `b9948cc` Update RoPE integration; num_logits_to_keep; Add copy statements (Muennighoff, Aug 27, 2024)
- `e97ae0e` Add eps to config (Muennighoff, Aug 27, 2024)
- `fd0baf5` Format (Muennighoff, Aug 27, 2024)
- `79e0ecc` Add aux loss (Muennighoff, Aug 28, 2024)
- `758a808` Adapt router_aux_loss_coef (Muennighoff, Aug 28, 2024)
- `efdcda6` Update md (Muennighoff, Sep 3, 2024)
- `42145af` Merge branch 'huggingface:main' into olmoe (Muennighoff, Sep 3, 2024)
- `34ef8f5` Adapt (Muennighoff, Sep 3, 2024)
- `30aace4` adapt tests (Muennighoff, Sep 3, 2024)
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -476,6 +476,8 @@
title: Nyströmformer
- local: model_doc/olmo
title: OLMo
- local: model_doc/olmoe
title: OLMoE
- local: model_doc/open-llama
title: Open-Llama
- local: model_doc/opt
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -228,6 +228,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Nougat](model_doc/nougat) | ✅ | ✅ | ✅ |
| [Nyströmformer](model_doc/nystromformer) | ✅ | ❌ | ❌ |
| [OLMo](model_doc/olmo) | ✅ | ❌ | ❌ |
| [OLMoE](model_doc/olmoe) | ✅ | ❌ | ❌ |
| [OneFormer](model_doc/oneformer) | ✅ | ❌ | ❌ |
| [OpenAI GPT](model_doc/openai-gpt) | ✅ | ✅ | ❌ |
| [OpenAI GPT-2](model_doc/gpt2) | ✅ | ✅ | ✅ |
45 changes: 45 additions & 0 deletions docs/source/en/model_doc/olmoe.md
@@ -0,0 +1,45 @@
<!--

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# OLMoE

## Overview

The OLMoE model was proposed in [TODO](TODO) by TODO.

OLMoE is a series of **O**pen **L**anguage **Mo**dels which are **M**ixture-**o**f-**E**xperts designed to enable the science of language models. We release all code, checkpoints, logs, and details involved in training these models.

The abstract from the paper is the following:

*TODO*

This model was contributed by [Muennighoff](https://hf.co/Muennighoff).
The original code can be found [here](https://github.com/allenai/OLMoE).


## OlmoeConfig

[[autodoc]] OlmoeConfig

## OlmoeModel

[[autodoc]] OlmoeModel
- forward

## OlmoeForCausalLM

[[autodoc]] OlmoeForCausalLM
- forward
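
The model card in this revision is still sparse (paper link and abstract are TODO), so here is a minimal generation sketch for reviewers trying the branch locally. The checkpoint id `allenai/OLMoE-1B-7B-0924` is an assumption, not something this PR confirms; substitute whatever repo the final release uses.

```python
# Minimal sketch for exercising the new OLMoE classes.
# The checkpoint id is assumed, not confirmed by this PR.
import torch
from transformers import AutoTokenizer, OlmoeForCausalLM

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924")
model = OlmoeForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0924", torch_dtype=torch.bfloat16
)

inputs = tokenizer("Mixture-of-experts language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```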
2 changes: 2 additions & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -69,6 +69,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
* [NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
* [OPT](https://huggingface.co/docs/transformers/model_doc/opt#transformers.OPTModel)
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
* [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)
@@ -216,6 +217,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [Jamba](https://huggingface.co/docs/transformers/model_doc/jamba#transformers.JambaModel)
* [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel)
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
* [PaliGemma](https://huggingface.co/docs/transformers/model_doc/paligemma#transformers.PaliGemmaForConditionalGeneration)
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
* [Idefics](https://huggingface.co/docs/transformers/model_doc/idefics#transformers.IdeficsModel)
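
Since OLMoE lands on both support lists, either attention backend can be requested at load time through the `attn_implementation` argument. A sketch under the same assumed checkpoint id as above; FlashAttention-2 additionally requires the `flash-attn` package and a supported GPU:

```python
import torch
from transformers import OlmoeForCausalLM

# FlashAttention-2 backend (needs flash-attn installed and fp16/bf16 weights).
model_fa2 = OlmoeForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0924",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# SDPA backend (no extra dependency beyond a recent PyTorch).
model_sdpa = OlmoeForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0924",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
```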
14 changes: 14 additions & 0 deletions src/transformers/__init__.py
@@ -597,6 +597,7 @@
"models.nougat": ["NougatProcessor"],
"models.nystromformer": ["NystromformerConfig"],
"models.olmo": ["OlmoConfig"],
"models.olmoe": ["OlmoeConfig"],
"models.oneformer": [
"OneFormerConfig",
"OneFormerProcessor",
@@ -2766,6 +2767,13 @@
"OlmoPreTrainedModel",
]
)
_import_structure["models.olmoe"].extend(
[
"OlmoeForCausalLM",
"OlmoeModel",
"OlmoePreTrainedModel",
]
)
_import_structure["models.oneformer"].extend(
[
"OneFormerForUniversalSegmentation",
@@ -5288,6 +5296,7 @@
NystromformerConfig,
)
from .models.olmo import OlmoConfig
from .models.olmoe import OlmoeConfig
from .models.oneformer import (
OneFormerConfig,
OneFormerProcessor,
@@ -7201,6 +7210,11 @@
OlmoModel,
OlmoPreTrainedModel,
)
from .models.olmoe import (
OlmoeForCausalLM,
OlmoeModel,
OlmoePreTrainedModel,
)
from .models.oneformer import (
OneFormerForUniversalSegmentation,
OneFormerModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -164,6 +164,7 @@
nougat,
nystromformer,
olmo,
olmoe,
oneformer,
openai,
opt,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -182,6 +182,7 @@
("nougat", "VisionEncoderDecoderConfig"),
("nystromformer", "NystromformerConfig"),
("olmo", "OlmoConfig"),
("olmoe", "OlmoeConfig"),
("oneformer", "OneFormerConfig"),
("open-llama", "OpenLlamaConfig"),
("openai-gpt", "OpenAIGPTConfig"),
@@ -475,6 +476,7 @@
("nougat", "Nougat"),
("nystromformer", "Nyströmformer"),
("olmo", "OLMo"),
("olmoe", "OLMoE"),
("oneformer", "OneFormer"),
("open-llama", "OpenLlama"),
("openai-gpt", "OpenAI GPT"),
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -173,6 +173,7 @@
("nllb-moe", "NllbMoeModel"),
("nystromformer", "NystromformerModel"),
("olmo", "OlmoModel"),
("olmoe", "OlmoeModel"),
("oneformer", "OneFormerModel"),
("open-llama", "OpenLlamaModel"),
("openai-gpt", "OpenAIGPTModel"),
@@ -482,6 +483,7 @@
("musicgen_melody", "MusicgenMelodyForCausalLM"),
("mvp", "MvpForCausalLM"),
("olmo", "OlmoForCausalLM"),
("olmoe", "OlmoeForCausalLM"),
("open-llama", "OpenLlamaForCausalLM"),
("openai-gpt", "OpenAIGPTLMHeadModel"),
("opt", "OPTForCausalLM"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -340,6 +340,7 @@
),
),
("olmo", (None, "GPTNeoXTokenizerFast" if is_tokenizers_available() else None)),
("olmoe", (None, "GPTNeoXTokenizerFast" if is_tokenizers_available() else None)),
("oneformer", ("CLIPTokenizer", "CLIPTokenizerFast" if is_tokenizers_available() else None)),
(
"openai-gpt",
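
Taken together, the auto-mapping entries above mean user code never has to name the OLMoE classes directly: the `model_type` field in `config.json` routes to `OlmoeConfig` and `OlmoeForCausalLM`, and the tokenizer mapping falls back to `GPTNeoXTokenizerFast`, matching OLMo. A sketch, reusing the assumed checkpoint id:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMoE-1B-7B-0924"  # assumed checkpoint id

config = AutoConfig.from_pretrained(repo)         # model_type "olmoe" -> OlmoeConfig
model = AutoModelForCausalLM.from_config(config)  # -> OlmoeForCausalLM (random weights)
tokenizer = AutoTokenizer.from_pretrained(repo)   # -> GPTNeoXTokenizerFast

print(type(config).__name__, type(model).__name__, type(tokenizer).__name__)
```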
55 changes: 55 additions & 0 deletions src/transformers/models/olmoe/__init__.py
@@ -0,0 +1,55 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_torch_available,
)


_import_structure = {
"configuration_olmoe": ["OlmoeConfig"],
}

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_olmoe"] = [
"OlmoeForCausalLM",
"OlmoeModel",
"OlmoePreTrainedModel",
]

if TYPE_CHECKING:
from .configuration_olmoe import OlmoeConfig

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_olmoe import (
OlmoeForCausalLM,
OlmoeModel,
OlmoePreTrainedModel,
)

else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
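
This `__init__.py` follows the library's standard `_LazyModule` pattern: only `_import_structure` is built at import time, the torch-gated symbols are skipped when torch is missing, and `modeling_olmoe` is imported the first time one of its names is accessed. A small sketch of the observable behavior, assuming torch is installed and nothing has imported the modeling file yet:

```python
import sys

from transformers.models import olmoe  # the package registers a _LazyModule proxy

# The heavy modeling module has not been imported yet.
assert "transformers.models.olmoe.modeling_olmoe" not in sys.modules

_ = olmoe.OlmoeForCausalLM  # first attribute access triggers the deferred import
assert "transformers.models.olmoe.modeling_olmoe" in sys.modules
```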