Lightweight workflow to check for broken markdown links (#1271)
rasbt authored Apr 11, 2024
1 parent cbbea05 commit 88f6574
Showing 4 changed files with 42 additions and 8 deletions.
38 changes: 38 additions & 0 deletions .github/workflows/check-links.yml
@@ -0,0 +1,38 @@
name: Check Markdown Links

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  check-links:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Install Markdown Link Checker
        run: npm install -g markdown-link-check

      - name: Create config for markdown link checker
        run: |
          echo '{
            "projectBaseUrl":"${{ github.workspace }}",
            "ignorePatterns": [
              {
                "pattern": "^#"
              },
              {
                "pattern": "^https://falconllm.tii.ae"
              }
            ]
          }' > $GITHUB_WORKSPACE/md_checker_config.json
      - name: Find Markdown Files and Check Links
        run: |
          find . -name \*.md -print0 | xargs -0 -n1 markdown-link-check -c $GITHUB_WORKSPACE/md_checker_config.json
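For contributors who want to run the same check locally before pushing, here is a minimal sketch that mirrors the workflow's three steps. It assumes Node.js/npm are available and is run from the repository root; using `$PWD` as `projectBaseUrl` stands in for `${{ github.workspace }}`.

```bash
# Install the same link checker the workflow uses
npm install -g markdown-link-check

# Write the same ignore-pattern config the workflow generates
cat > md_checker_config.json <<EOF
{
  "projectBaseUrl": "$PWD",
  "ignorePatterns": [
    { "pattern": "^#" },
    { "pattern": "^https://falconllm.tii.ae" }
  ]
}
EOF

# Check every Markdown file in the repository, as the workflow does
find . -name \*.md -print0 | xargs -0 -n1 markdown-link-check -c md_checker_config.json
```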
2 changes: 1 addition & 1 deletion extensions/thunder/README.md
@@ -532,7 +532,7 @@ def backward_fn(saved_for_backward, cotangents):
t763 = unsloth_apply_rope_backward(t757, t21, t22, 1, 8, 4) # t763: "cuda:0 f32[2, 4, 3, 16]"
```

-We provide a specific [pre-training script copy](unsloth/pretrain.py) that uses this executor.
+We provide a specific [pre-training script copy](pretrain.py) that uses this executor.
Given the Unsloth results below, these hand-written kernels do not seem to be worth it, showcasing the power of automated fusion compilers like [NvFuser](https://github.com/NVIDIA/Fuser).

## Examples and benchmarks
2 changes: 1 addition & 1 deletion tutorials/prepare_dataset.md
@@ -6,7 +6,7 @@ Below is a table of all datasets that are currently supported in LitGPT:
|--------------|-------------|---------------------|--------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Alpaca | Finetuning | 51,759 samples | [URL](https://github.com/tatsu-lab/stanford_alpaca) | [URL](https://crfm.stanford.edu/2023/03/13/alpaca.html) | Attribution-NonCommercial 4.0 International, [URL](https://crfm.stanford.edu/2023/03/13/alpaca.html) |
| Alpaca-2k | Finetuning | 2000 samples | [URL](https://huggingface.co/datasets/mhenrichsen/alpaca_2k_test) | See Alpaca above | See Alpaca Above |
-| Alpaca-GPT4 | Finetuning | 52,002 samples | [URL](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) | [URL](https://arxiv.org/abs/2304.03277) | Attribution-NonCommercial 4.0 International, [URL](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/DATA_LICENSEl) |
+| Alpaca-GPT4 | Finetuning | 52,002 samples | [URL](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) | [URL](https://arxiv.org/abs/2304.03277) | Attribution-NonCommercial 4.0 International, [URL](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/DATA_LICENSE) |
| Alpaca Libre | Finetuning | 55,370 samples | [URL](https://github.com/mobarski/alpaca-libre) | - | CC0/MIT, [URL](https://github.com/mobarski/alpaca-libre) |
| Deita | Finetuning | 9,500 samples | [URL](https://huggingface.co/datasets/HuggingFaceH4/deita-10k-v0-sft/tree/main/data) | [URL](https://arxiv.org/abs/2312.15685) | MIT [URL](https://huggingface.co/datasets/hkust-nlp/deita-10k-v0/blob/main/README.md) |
| Dolly | Finetuning | 15,011 samples | [URL](https://github.com/databrickslabs/dolly/tree/master/data) | [URL](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) | CC-BY-SA, [URL](https://github.com/databrickslabs/dolly#model-overview) |
8 changes: 2 additions & 6 deletions tutorials/pretrain_tinyllama.md
@@ -118,11 +118,7 @@ or change the model type and size by passing a different string to the model nam
litgpt pretrain --model_name Gemma-2b
```

-The currently supported model names are contained in the [config.py](https://github.com/Lightning-AI/litgpt/litgpt/config.py) file.
-You can
-
-1) either search this file for lines containing "name =",
-2) or run `litgpt download` without additional command line arguments
+The currently supported model names can be listed by executing `litgpt pretrain` without any additional arguments.

Keep in mind that training with a single machine will take weeks. To speed up the process, you'll need access to a cluster.
Once you're in a cluster, you can follow [these instructions](https://lightning.ai/docs/fabric/stable/fundamentals/launch.html#launch-on-a-cluster)
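As a quick usage sketch of the updated instructions above (both commands appear in this tutorial; the exact listing output depends on the installed LitGPT version):

```bash
# List the currently supported model names
litgpt pretrain

# Then pretrain a specific architecture, for example Gemma-2b
litgpt pretrain --model_name Gemma-2b
```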
@@ -190,4 +186,4 @@ The following [Lightning Studio](https://lightning.ai/lightning-ai/studios) temp
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <p align="left">[Prepare the TinyLlama 1T token dataset](https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset) <br> [<img src="https://pl-public-data.s3.amazonaws.com/assets_litgpt/readme/3.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset) | [Pretrain LLMs - TinyLlama 1.1B](https://lightning.ai/lightning-ai/studios/pretrain-llms-tinyllama-1-1b) <br> <p align="left">[<img src="https://pl-public-data.s3.amazonaws.com/assets_litgpt/readme/4.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/pretrain-llms-tinyllama-1-1b) |
| [Continued Pretraining with TinyLlama 1.1B](https://lightning.ai/lightning-ai/studios/continued-pretraining-with-tinyllama-1-1b) <br> <p align="left">[<img src="https://pl-public-data.s3.amazonaws.com/assets_litgpt/readme/1.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/continued-pretraining-with-tinyllama-1-1b) | |
-| |
+| |
