[MTF] Add `weighted-split-paths` support #299

thomasw21 · 2022-07-04T12:22:45Z

This makes it possible to set the validation dataset manually.

Muennighoff

Trained successfully with the setup below:

SLURM script:

```bash
TRAIN_DATA_PATH=$MEGATRON_DEEPSPEED_REPO/p3_train.txt
VALID_DATA_PATH=$MEGATRON_DEEPSPEED_REPO/p3_validation.txt

    --train-weighted-split-paths-path $TRAIN_DATA_PATH \
    --valid-weighted-split-paths-path $VALID_DATA_PATH \
```
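The two `*-path` flags appear to take a text file of dataset-group specs rather than the specs themselves. Below is a minimal sketch of how such a flag could be handled with Python's `argparse`, assuming one (optionally double-quoted) spec per non-empty line. `ReadSpecsFromFile` and `build_parser` are hypothetical names for illustration; this is not the Megatron-DeepSpeed implementation.

```python
import argparse
import shlex


class ReadSpecsFromFile(argparse.Action):
    """Hypothetical action: read one dataset-group spec per line from a file.

    Assumes each spec may be wrapped in double quotes, as in the .txt files
    of this PR; shlex.split removes the quotes and keeps the spec as one token.
    """

    def __call__(self, parser, namespace, values, option_string=None):
        specs = []
        with open(values, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    specs.extend(shlex.split(line))
        setattr(namespace, self.dest, specs)


def build_parser():
    parser = argparse.ArgumentParser()
    # Flag names mirror the SLURM snippet; the file-reading behaviour is assumed.
    parser.add_argument("--train-weighted-split-paths-path",
                        action=ReadSpecsFromFile, default=None)
    parser.add_argument("--valid-weighted-split-paths-path",
                        action=ReadSpecsFromFile, default=None)
    return parser
```

Reading the specs from a file keeps long lists of dataset groups out of the launch command, which is convenient when train and validation sets are configured separately.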

`p3_train.txt`:

```
"train: 1 0:1 /gpfswork/rech/six/commun/bigscience-training/p3t0/p3_t0_train"
```

`p3_validation.txt`:

```
"valid: 1 0:1 /gpfswork/rech/six/commun/bigscience-training/p3t0/p3_t0_validation"
```
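Each line above appears to follow a `NAME: WEIGHT START:END PATH` layout: `START:END` selects a fractional slice of the dataset (here `0:1`, the whole dataset) and `WEIGHT` sets how strongly that dataset is sampled within its group. The sketch below parses that layout under this assumption, also accepting several comma-separated datasets per group and normalising their weights. `DatasetSplit` and `parse_group` are hypothetical names, and paths containing spaces or commas are not handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DatasetSplit:
    weight: float  # relative sampling weight within the group (normalised)
    start: float   # fraction of the dataset where the slice begins
    end: float     # fraction of the dataset where the slice ends
    path: str      # dataset path prefix


def parse_group(spec: str) -> Tuple[str, List[DatasetSplit]]:
    """Parse 'NAME: W START:END PATH[, W START:END PATH ...]' (assumed layout)."""
    spec = spec.strip().strip('"')
    # Split on the first colon only, so the START:END colons stay intact.
    name, _, rest = spec.partition(":")
    splits = []
    for entry in rest.split(","):
        weight, span, path = entry.split()
        start, end = (float(x) for x in span.split(":"))
        if not 0.0 <= start < end <= 1.0:
            raise ValueError(f"invalid split range {span!r} in {spec!r}")
        splits.append(DatasetSplit(float(weight), start, end, path))
    # Normalise weights so they sum to 1 within the group.
    total = sum(s.weight for s in splits)
    for s in splits:
        s.weight /= total
    return name.strip(), splits
```

With this sketch, the first line of `p3_train.txt` parses to the group name `train` with a single dataset of weight 1 covering the range 0 to 1.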

Output:

```text
[default0]: iteration        1/       2 | consumed samples:            4 | consumed tokens:         1024 | elapsed time per iteration (s): 2.34 | learning rate: 2.275E-04 | global batch size:     4 | lm loss: 1.237034E+01 | loss scale: 4096.0 | grad norm: 4.454 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 | samples per second: 1.712 | TFLOPs: 0.58 |
[default0]:[Rank 0] (after 1 iterations) memory (MB) | allocated: 4256.57470703125 | max allocated: 6819.63232421875 | reserved: 9644.0 | max reserved: 9644.0
[default0]: iteration        2/       2 | consumed samples:            8 | consumed tokens:         2048 | elapsed time per iteration (s): 0.36 | learning rate: 4.392E-05 | global batch size:     4 | lm loss: 1.195433E+01 | loss scale: 4096.0 | grad norm: 8.481 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 | samples per second: 11.266 | TFLOPs: 3.79 |
[default0]:[after training is done] datetime: 2022-07-06 12:09:41
[default0]:------------------------------------------------------------------------------------------------------------
[default0]:valid loss at the end of training for val data | lm loss value: 1.240334E+01 | lm loss PPL: 2.436142E+05 |
[default0]:------------------------------------------------------------------------------------------------------------
```
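The numbers in this log are linked by simple arithmetic, which makes for a quick sanity check when reading such runs: the reported perplexity is the exponential of the LM loss, and consumed tokens divided by consumed samples gives the sequence length. The snippet below recomputes both from the printed values; it is an illustrative check, not code from the training script.

```python
import math

# Values copied from the log above.
valid_lm_loss = 1.240334e01
reported_ppl = 2.436142e05
consumed_samples, consumed_tokens = 8, 2048

# Perplexity is exp(mean cross-entropy loss in nats).
ppl = math.exp(valid_lm_loss)
print(f"recomputed PPL: {ppl:.6e} (reported {reported_ppl:.6e})")

# Tokens per sample, i.e. the sequence length used in this run.
seq_len = consumed_tokens // consumed_samples
print(f"sequence length: {seq_len}")
```

The recomputed perplexity agrees with the reported `2.436142E+05` to within the rounding of the printed loss, and both iterations imply 256-token sequences (1024 / 4 and 2048 / 8).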

thomasw21 force-pushed the thomas/add_train_test_validation_split_paths_feature branch from 6ba72df to 6513f6e on July 4, 2022 12:24

Base automatically changed from thomas/mtf_train_script to main July 5, 2022 14:03

Add support for weighted train

66ce0cf

thomasw21 force-pushed the thomas/add_train_test_validation_split_paths_feature branch from 6513f6e to 66ce0cf on July 6, 2022 09:41

Muennighoff approved these changes Jul 6, 2022


thomasw21 merged commit 43ab0e0 into main Jul 6, 2022

thomasw21 deleted the thomas/add_train_test_validation_split_paths_feature branch July 6, 2022 10:20

younesbelkada pushed a commit to younesbelkada/Megatron-DeepSpeed that referenced this pull request Sep 28, 2022

Add support for weighted train (bigscience-workshop#299)

6885a39

adammoody pushed a commit to adammoody/Megatron-DeepSpeed that referenced this pull request Dec 18, 2023

fix typo error (bigscience-workshop#299)

7ca477d
