[MTF] Add weighted-split-paths support #299

Merged

thomasw21
Member

This allows the validation dataset to be set manually.

@thomasw21 thomasw21 force-pushed the thomas/add_train_test_validation_split_paths_feature branch from 6ba72df to 6513f6e Compare July 4, 2022 12:24
Base automatically changed from thomas/mtf_train_script to main July 5, 2022 14:03
@thomasw21 thomasw21 force-pushed the thomas/add_train_test_validation_split_paths_feature branch from 6513f6e to 66ce0cf Compare July 6, 2022 09:41
Collaborator

@Muennighoff Muennighoff left a comment

Trained successfully with the setup below.

.SLURM:
TRAIN_DATA_PATH=$MEGATRON_DEEPSPEED_REPO/p3_train.txt
VALID_DATA_PATH=$MEGATRON_DEEPSPEED_REPO/p3_validation.txt

    --train-weighted-split-paths-path $TRAIN_DATA_PATH \
    --valid-weighted-split-paths-path $VALID_DATA_PATH \

.TXT:
"train: 1 0:1 /gpfswork/rech/six/commun/bigscience-training/p3t0/p3_t0_train"
"valid: 1 0:1 /gpfswork/rech/six/commun/bigscience-training/p3t0/p3_t0_validation"
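For readers unfamiliar with the line format in the two .txt files, here is a minimal sketch of how such a line could be read. It is not the parser from this PR. It assumes each line has the shape `"NAME: WEIGHT START:END PATH"`, that `START:END` is a fractional range of the dataset, and that several datasets could be listed separated by commas. The names `SplitEntry` and `parse_weighted_split_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SplitEntry:
    """One dataset entry: sampling weight, fractional range, and path prefix."""
    weight: float
    start: float
    end: float
    path: str


def parse_weighted_split_line(line: str) -> Tuple[str, List[SplitEntry]]:
    """Parse a line such as '"train: 1 0:1 /some/path"' (assumed format)."""
    # Drop surrounding whitespace and the optional double quotes.
    line = line.strip().strip('"')
    # Everything before the first colon is the split name.
    name, sep, rest = line.partition(":")
    if not sep or not rest.strip():
        raise ValueError(f"expected 'NAME: WEIGHT START:END PATH', got {line!r}")

    entries = []
    # Assumption: multiple datasets are comma-separated.
    for chunk in rest.split(","):
        parts = chunk.split()
        if len(parts) != 3:
            raise ValueError(f"expected 'WEIGHT START:END PATH', got {chunk!r}")
        weight, span, path = parts
        start, end = span.split(":")
        entries.append(SplitEntry(float(weight), float(start), float(end), path))
    return name.strip(), entries


name, entries = parse_weighted_split_line(
    '"train: 1 0:1 /gpfswork/rech/six/commun/bigscience-training/p3t0/p3_t0_train"'
)
print(name, entries)
```

Under these assumptions, the line above yields a split named `train` with a single entry of weight 1 covering the range 0 to 1 of the dataset.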

OUTPUT:
[default0]: iteration        1/       2 | consumed samples:            4 | consumed tokens:         1024 | elapsed time per iteration (s): 2.34 | learning rate: 2.275E-04 | global batch size:     4 | lm loss: 1.237034E+01 | loss scale: 4096.0 | grad norm: 4.454 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 | samples per second: 1.712 | TFLOPs: 0.58 |
[default0]:[Rank 0] (after 1 iterations) memory (MB) | allocated: 4256.57470703125 | max allocated: 6819.63232421875 | reserved: 9644.0 | max reserved: 9644.0
[default0]: iteration        2/       2 | consumed samples:            8 | consumed tokens:         2048 | elapsed time per iteration (s): 0.36 | learning rate: 4.392E-05 | global batch size:     4 | lm loss: 1.195433E+01 | loss scale: 4096.0 | grad norm: 8.481 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 | samples per second: 11.266 | TFLOPs: 3.79 |
[default0]:[after training is done] datetime: 2022-07-06 12:09:41
[default0]:------------------------------------------------------------------------------------------------------------
[default0]:valid loss at the end of training for val data | lm loss value: 1.240334E+01 | lm loss PPL: 2.436142E+05 |
[default0]:------------------------------------------------------------------------------------------------------------
++ date
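As a quick sanity check on the validation line in the log, the reported perplexity can be recomputed from the reported loss. This sketch assumes the logged PPL is simply the exponential of the LM loss (natural log); the helper name `perplexity` is hypothetical.

```python
import math


def perplexity(lm_loss: float) -> float:
    """Perplexity as exp(loss), assuming the loss is a mean cross-entropy in nats."""
    return math.exp(lm_loss)


logged_loss = 1.240334e01  # "lm loss value" from the log above
logged_ppl = 2.436142e05   # "lm loss PPL" from the log above

recomputed = perplexity(logged_loss)
relative_error = abs(recomputed - logged_ppl) / logged_ppl
print(f"recomputed PPL: {recomputed:.6E}, relative error: {relative_error:.1E}")
```

The recomputed value should be close to the logged one. Small differences in the last digits are expected because the logged loss is rounded to seven significant digits.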

@thomasw21 thomasw21 merged commit 43ab0e0 into main Jul 6, 2022
@thomasw21 thomasw21 deleted the thomas/add_train_test_validation_split_paths_feature branch July 6, 2022 10:20
younesbelkada pushed a commit to younesbelkada/Megatron-DeepSpeed that referenced this pull request Sep 28, 2022
adammoody pushed a commit to adammoody/Megatron-DeepSpeed that referenced this pull request Dec 18, 2023