forked from NVIDIA/Megatron-LM
Merge with upstream #48 (Open)

Quentin-Anthony wants to merge 2,585 commits into Zyphra:main from NVIDIA:main.
Conversation
- Fix async_grad_allreduce deprecation warning (ADLR/megatron-lm!2247)
- openai completions endpoint (ADLR/megatron-lm!2212)
- Add unit tests for Mamba hybrid model sub-units (ADLR/megatron-lm!2233)
- tests: Fix backoff (ADLR/megatron-lm!2287)
- revert: Try/catch (ADLR/megatron-lm!2288)
- More multimodal evals (ADLR/megatron-lm!2174)
- tunable schedule with overlapping (ADLR/megatron-lm!2117)
  Co-authored-by: Dingqing Yang <[email protected]>, Sangkug Lym <[email protected]>
- [Test] Fix Config for RoPE Fusion (ADLR/megatron-lm!2298)
- Add dist-ckpt support to encoder_pipeline_parallel (ADLR/megatron-lm!2210)
- Add TestTransformerLayerInterface test (ADLR/megatron-lm!2297)
- ci: Fix nightly tests (ADLR/megatron-lm!2300)
- tests: Disable flaky test (ADLR/megatron-lm!2302)
- tests: Disable modelopt test on dev (ADLR/megatron-lm!2303)
- Remove `is_onnx_export_mode` import from TE (ADLR/megatron-lm!2296)
- Enhance MoE Architecture: Support MoE Layer Frequency Patterns and Configurable MoE FFN Hidden Size (Closes #225; ADLR/megatron-lm!2230)
  Co-authored-by: Zijie Yan <[email protected]>, xuwenc <[email protected]>
- Resolve "Attention as a config option in mcore" (Closes #326; ADLR/megatron-lm!2168)
  Co-authored-by: Shanmugam Ramasamy <[email protected]>, Oliver Koenig <[email protected]>
- sample index helper function, no unnecessary memory allocation, no unnecessary casting/copying (ADLR/megatron-lm!2381)
  Co-authored-by: Mcore Bot <[email protected]>
- Fix peak memory consumption for NeMo (ADLR/megatron-lm!2388)
- [dist ckpt] Use gather object instead of all gather object when running consistency check (ADLR/megatron-lm!2413)
- Add functionality to re-run iterations (ADLR/megatron-lm!2282)
  Co-authored-by: Cyril Meurillon <[email protected]>, Deepak Narayanan <[email protected]>
- Bugfix in multimodal dataloader_provider (ADLR/megatron-lm!2418)
- Refactor MoE specs: move all submodules of MoELayer into the spec (Closes #314; ADLR/megatron-lm!2101)
  Co-authored-by: Zijie Yan <[email protected]>
- Remove all-gather before first iteration to not spread corrupted values (ADLR/megatron-lm!2414)
- move get_batch_on_this_cp_rank to mcore utils (ADLR/megatron-lm!2404)
- Small VLM example (ADLR/megatron-lm!2432)
- Fix assert warning in !2282 (ADLR/megatron-lm!2443)
  Co-authored-by: Cyril Meurillon <[email protected]>
- Fix wrapping of external dataloaders (ADLR/megatron-lm!2453)
- Fix moe dist-ckpt compatibility for !2230 (ADLR/megatron-lm!2449)
- Llava pp > 1 fix (ADLR/megatron-lm!2441)
No description provided.