updated with 2023 material
sraskar committed Aug 11, 2023
1 parent a6ec584 commit baff867
Showing 3 changed files with 2 additions and 55 deletions.
Binary file added 05_aiTestbed/AI Testbeds Hands-on ATPESC2023.pdf
Binary file not shown.
Binary file not shown.
57 changes: 2 additions & 55 deletions 05_aiTestbed/README.md
@@ -1,56 +1,3 @@
# BERT (language model) is selected for the hands-on section

Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google.


# SambaNova

1. Log in to SN:
```
ssh [email protected]
ssh sm-01   # or sm-02
```

2. SDK setup:
```
source /software/sambanova/envs/sn_env.sh
```

3. Copy scripts:
```
cp /var/tmp/Additional/slurm/Models/ANL_Acceptance_RC1_11_5/bert_train-inf.sh ~/
```

4. Run scripts:
```
cd ~
./bert_train-inf.sh
```


# Cerebras

1. Log in to CS-2:
```
ssh [email protected]
ssh cs2-01-med1
```

2. Copy scripts:
```
cp -r /software/cerebras/model_zoo ~/
cd model_zoo/modelzoo/transformers/tf/bert
```
Ignore any permissions errors during the copy of the subdirectory `modelzoo-R1.3.0_2/`.


3. Modify configs: set `data_dir` to `'/software/cerebras/dataset/bert_large/msl128/'` in `configs/params_bert_large_msl128.yaml`, **in two places**.
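
The `data_dir` change can also be scripted rather than made by hand. The following is a minimal sketch, not part of the original instructions: `set_data_dir` is a hypothetical helper, and it assumes each `data_dir` entry sits on its own line in the form `data_dir: <value>`. Check the printed lines before running anything.

```shell
# Hypothetical helper: point every `data_dir:` entry in a YAML file at a new path.
# Assumes each entry is on its own line in the form `data_dir: <value>`.
set_data_dir() {
    file="$1"
    new_dir="$2"
    # Keep the original indentation and key; replace only the value.
    # -i.bak edits in place and leaves a backup copy next to the file.
    sed -i.bak "s|^\([[:space:]]*data_dir:\).*|\1 '${new_dir}'|" "$file"
    # Print the resulting entries with line numbers so both places can be checked.
    grep -n "data_dir:" "$file"
}
```

Run from the `bert` directory, a call would look like `set_data_dir configs/params_bert_large_msl128.yaml /software/cerebras/dataset/bert_large/msl128/`. The original file is kept alongside it with a `.bak` suffix in case the edit needs to be undone.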

4. Run scripts:
```
MODELDIR=model_dir_bert_large_msl128_$(hostname)
# Remove any model directory left over from a previous run (-f: no error if it is absent)
rm -rf "$MODELDIR"
# Compile only, on a CPU node
time -p csrun_cpu python run.py --mode=train --compile_only --params configs/params_bert_large_msl128.yaml --model_dir $MODELDIR --cs_ip $CS_IP
# Train on the CS-2 wafer-scale engine
time -p csrun_wse python run.py --mode=train --params configs/params_bert_large_msl128.yaml --model_dir $MODELDIR --cs_ip $CS_IP
```
# Introduction to AI Testbeds at ALCF and hands

Please refer to the [slides](./AI%20Testbeds%20Hands-on%20ATPESC2023.pdf) and the [online documentation](https://docs.alcf.anl.gov/ai-testbed/getting-started/).
