forked from NVIDIA/Megatron-LM
Merge with upstream #48 (Open)

Quentin-Anthony wants to merge 2,585 commits into Zyphra:main from NVIDIA:main.
Conversation
- Fix async_grad_allreduce deprecation warning (ADLR/megatron-lm!2247)
- openai completions endpoint (ADLR/megatron-lm!2212)
- Add unit tests for Mamba hybrid model sub-units (ADLR/megatron-lm!2233)
- tests: Fix backoff (ADLR/megatron-lm!2287)
- revert: Try/catch (ADLR/megatron-lm!2288)
- More multimodal evals (ADLR/megatron-lm!2174)
- tunable schedule with overlapping (ADLR/megatron-lm!2117)
  Co-authored-by: Dingqing Yang <[email protected]>, Sangkug Lym <[email protected]>
- [Test] Fix Config for RoPE Fusion (ADLR/megatron-lm!2298)
- Add dist-ckpt support to encoder_pipeline_parallel (ADLR/megatron-lm!2210)
- Add TestTransformerLayerInterface test (ADLR/megatron-lm!2297)
- ci: Fix nightly tests (ADLR/megatron-lm!2300)
- tests: Disable flaky test (ADLR/megatron-lm!2302)
- tests: Disable modelopt test on dev (ADLR/megatron-lm!2303)
- Remove `is_onnx_export_mode` import from TE (ADLR/megatron-lm!2296)
- Enhance MoE Architecture: Support MoE Layer Frequency Patterns and Configurable MoE FFN Hidden Size (Closes #225; ADLR/megatron-lm!2230)
  Co-authored-by: Zijie Yan <[email protected]>, xuwenc <[email protected]>
- Resolve "Attention as a config option in mcore" (Closes #326; ADLR/megatron-lm!2168)
  Co-authored-by: Shanmugam Ramasamy <[email protected]>, Oliver Koenig <[email protected]>
- sample index helper function, no unnecessary memory allocation, no unnecessary casting/copying (ADLR/megatron-lm!2381)
  Co-authored-by: Mcore Bot <[email protected]>
- Fix peak memory consumption for NeMo (ADLR/megatron-lm!2388)
- [dist ckpt] Use gather object instead of all gather object when running consistency check (ADLR/megatron-lm!2413)
- Add functionality to re-run iterations (ADLR/megatron-lm!2282)
  Co-authored-by: Cyril Meurillon <[email protected]>, Deepak Narayanan <[email protected]>
- Bugfix in multimodal dataloader_provider (ADLR/megatron-lm!2418)
- Refactor MoE specs: move all submodules of MoELayer into the spec (Closes #314; ADLR/megatron-lm!2101)
  Co-authored-by: Zijie Yan <[email protected]>
- Remove all-gather before first iteration to not spread corrupted values (ADLR/megatron-lm!2414)
- move get_batch_on_this_cp_rank to mcore utils (ADLR/megatron-lm!2404)
- Small VLM example (ADLR/megatron-lm!2432)
- Fix assert warning in !2282 (ADLR/megatron-lm!2443)
  Co-authored-by: Cyril Meurillon <[email protected]>
- Fix wrapping of external dataloaders (ADLR/megatron-lm!2453)
- Fix moe dist-ckpt compatibility for !2230 (ADLR/megatron-lm!2449)
- Llava pp > 1 fix (ADLR/megatron-lm!2441)
No description provided.