Merge pull request microsoft#1 from rraminen/pipeclean_seq512_shell_script

Added ds_train_bert_bsz32k_seq512_pipeclean.sh
jithunnair-amd authored Apr 21, 2021
2 parents fb62e6e + 2dfe9fd commit 53b28ad
Showing 1 changed file with 30 additions and 0 deletions.
30 changes: 30 additions & 0 deletions bing_bert/ds_train_bert_bsz32k_seq512_pipeclean.sh
@@ -0,0 +1,30 @@
#!/bin/bash

base_dir=`pwd`

# Where should we save checkpoints and tensorboard events?
JOB_NAME=lamb_32k_seq512_output
OUTPUT_DIR=${base_dir}/bert_model_outputs

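# Resumes training from the epoch 150 checkpoint of the previous seq128 run.
# CHECKPOINT_BASE_PATH and CHECKPOINT_EPOCH150_NAME are not defined in this
# script, so they must already be set in the environment.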

mkdir -p $OUTPUT_DIR

deepspeed ${base_dir}/deepspeed_train.py \
--cf ${base_dir}/bert_large_lamb_pipeclean.json \
--max_seq_length 512 \
--output_dir $OUTPUT_DIR \
--print_steps 100 \
--deepspeed \
--deepspeed_transformer_kernel \
--job_name $JOB_NAME \
--deepspeed_config ${base_dir}/deepspeed_bsz32k_lamb_config_seq512_pipeclean.json \
--validation_data_path_prefix /data/bert \
--data_path_prefix /data/bert \
--rewarmup \
--lr_schedule "EE" \
--attention_dropout_checkpoint \
--lr_offset 0.0 \
--load_training_checkpoint ${CHECKPOINT_BASE_PATH} \
--load_checkpoint_id ${CHECKPOINT_EPOCH150_NAME} \
&> ${JOB_NAME}.log
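
The script reads `CHECKPOINT_BASE_PATH` and `CHECKPOINT_EPOCH150_NAME` but never sets them, so a caller has to export both beforehand. Below is a minimal sketch of one way to do that. It is not part of this commit: the helper name `find_epoch150_checkpoint` is hypothetical, and the `epoch150_*` naming of checkpoint entries under the base path is an assumption about how the earlier seq128 run laid out its checkpoints.

```shell
#!/bin/bash
# Hypothetical pre-flight helper; not part of the commit.

# Print the basename of the single entry under $1 whose name matches
# epoch150_*. The epoch150_* naming is an assumed checkpoint layout.
find_epoch150_checkpoint() {
    local base_path="$1"
    local matches=()
    local entry
    for entry in "${base_path}"/epoch150_*; do
        # An unmatched glob stays as a literal string, so confirm it exists.
        if [ -e "$entry" ]; then
            matches+=("$entry")
        fi
    done
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "expected exactly one epoch150_* entry under ${base_path}, found ${#matches[@]}" >&2
        return 1
    fi
    basename "${matches[0]}"
}

# Example usage (the path is a placeholder):
#   export CHECKPOINT_BASE_PATH=/path/to/seq128/checkpoints
#   export CHECKPOINT_EPOCH150_NAME=$(find_epoch150_checkpoint "$CHECKPOINT_BASE_PATH")
#   bash ds_train_bert_bsz32k_seq512_pipeclean.sh
```

Failing early when the checkpoint cannot be resolved avoids launching `deepspeed` with empty values for `--load_training_checkpoint` and `--load_checkpoint_id`.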
