From 67e52579da032ab919a8324596143c217f531c3a Mon Sep 17 00:00:00 2001
From: Andrei-Aksionov <58434077+Andrei-Aksionov@users.noreply.github.com>
Date: Sat, 22 Jul 2023 02:46:05 +0300
Subject: [PATCH] Tutorial markdowns small fixes (#295)

---
 tutorials/download_llama_2.md |  4 ++--
 tutorials/finetune_adapter.md | 20 +++++++++++---------
 tutorials/finetune_full.md    | 15 ++++++++-------
 tutorials/finetune_lora.md    | 15 ++++++++-------
 tutorials/inference.md        |  4 +++-
 tutorials/oom.md              |  4 ++--
 tutorials/quantize.md         |  1 +
 tutorials/tpus.md             |  3 ++-
 8 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/tutorials/download_llama_2.md b/tutorials/download_llama_2.md
index edcbfe713f..d835771ae7 100644
--- a/tutorials/download_llama_2.md
+++ b/tutorials/download_llama_2.md
@@ -29,8 +29,8 @@ meta-llama/Llama-2-70b-chat-hf
 
 In order to use a specific checkpoint, for instance [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), download the weights and convert the checkpoint to the lit-gpt format.
 
-This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at https://huggingface.co/meta-llama/Llama-2-7b.
-After access is granted, you can find your HF hub token in https://huggingface.co/settings/tokens.
+This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at <https://huggingface.co/meta-llama/Llama-2-7b>.
+After access is granted, you can find your HF hub token in <https://huggingface.co/settings/tokens>.
 
 ```bash
 pip install huggingface_hub
diff --git a/tutorials/finetune_adapter.md b/tutorials/finetune_adapter.md
index 90acb8185a..62b2b529c6 100644
--- a/tutorials/finetune_adapter.md
+++ b/tutorials/finetune_adapter.md
@@ -30,7 +30,7 @@ python finetune/adapter.py --checkpoint_dir checkpoints/stabilityai/stablelm-bas
 
 or for Adapter V2
 
-```bash 
+```bash
 python finetune/adapter_v2.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
 ```
 
@@ -40,6 +40,7 @@ Depending on the available GPU memory, you can also tune the `micro_batch_size`
 To fit Adapter V2 to 12GB memory set micro_batch_size = 2.
 
 For example, the following settings will let you finetune the model in under 1 hour:
+
 ```python
 devices = 4
 micro_batch_size = 4
@@ -78,27 +79,29 @@ python generate/adapter.py \
 
 or for Adapter V2
 
-```bash 
+```bash
 python generate/adapter_v2.py \
     --prompt "Recommend a movie to watch on the weekend." \
     --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
 ```
 
 Output:
-```
+
+```text
 A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
 ```
+
 If your GPU supports `bfloat16`, the script will automatically use it.
 
 ## Tune on your dataset
 
 With only a few modifications, you can prepare and train on your own instruction dataset.
 
-1. Create a json file in which each row holds one instruction-response pair. 
-   A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be 
+1. Create a json file in which each row holds one instruction-response pair.
+   A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be
    the empty string if the instruction doesn't require a context. Below is an example json file:
 
-    ```
+    ```text
     [
         {
            "instruction": "Arrange the given numbers in ascending order.",
            "input": "2, 4, 0, 8, 3",
            "output": "0, 2, 3, 4, 8"
        },
@@ -123,7 +126,7 @@ With only a few modifications, you can prepare and train on your own instruction
     ```
 
 5. Run `finetune/adapter.py` by passing in the location of your data (and optionally other parameters):
-    
+
     ```bash
     python finetune/adapter.py \
         --data_dir data/mydata/ \
         --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b \
         --out_dir data/mydata-finetuned
     ```
 
-
 ## Troubleshooting
 
 If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
-`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101).
+`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
diff --git a/tutorials/finetune_full.md b/tutorials/finetune_full.md
index 400b22491c..ad44d6d658 100644
--- a/tutorials/finetune_full.md
+++ b/tutorials/finetune_full.md
@@ -53,20 +53,22 @@ python generate/full.py \
 ```
 
 Output:
-```
+
+```text
 A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
 ```
+
 If your GPU supports `bfloat16`, the script will automatically use it.
 
 ## Tune on your dataset
 
 With only a few modifications, you can prepare and train on your own instruction dataset.
 
-1. Create a json file in which each row holds one instruction-response pair. 
-   A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be 
+1. Create a json file in which each row holds one instruction-response pair.
+   A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be
    the empty string if the instruction doesn't require a context. Below is an example json file:
 
-    ```
+    ```text
     [
         {
            "instruction": "Arrange the given numbers in ascending order.",
            "input": "2, 4, 0, 8, 3",
            "output": "0, 2, 3, 4, 8"
        },
@@ -91,7 +93,7 @@ With only a few modifications, you can prepare and train on your own instruction
     ```
 
 5. Run `finetune/full.py` by passing in the location of your data (and optionally other parameters):
-    
+
     ```bash
     python finetune/full.py \
         --data_dir data/mydata/ \
@@ -99,8 +101,7 @@ With only a few modifications, you can prepare and train on your own instruction
         --out_dir data/mydata-finetuned
     ```
 
-
 ## Troubleshooting
 
 If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
-`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101).
+`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
diff --git a/tutorials/finetune_lora.md b/tutorials/finetune_lora.md
index d244383d1f..f448605f6c 100644
--- a/tutorials/finetune_lora.md
+++ b/tutorials/finetune_lora.md
@@ -45,8 +45,10 @@ You can test the finetuned model with your own instructions by running:
 ```bash
 python generate/lora.py --prompt "Recommend a movie to watch on the weekend."
 ```
+
 Output:
-```
+
+```text
 I would recommend the movie The Martian (2015). It is a sci-fi movie starring Matt Damon that follows the story of...
 ```
 
@@ -56,11 +58,11 @@ If your GPU supports `bfloat16`, you can additionally pass `--precision bf16-tru
 
 With only a few modifications, you can prepare and train on your own instruction dataset.
 
-1. Create a json file in which each row holds one instruction-response pair. 
-   A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be 
+1. Create a json file in which each row holds one instruction-response pair.
+   A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be
    the empty string if the instruction doesn't require a context. Below is an example json file:
 
-    ```
+    ```text
     [
         {
            "instruction": "Arrange the given numbers in ascending order.",
            "input": "2, 4, 0, 8, 3",
            "output": "0, 2, 3, 4, 8"
        },
@@ -85,13 +87,12 @@ With only a few modifications, you can prepare and train on your own instruction
     ```
 
 5. Run `finetune/lora.py` by passing in the location of your data (and optionally other parameters):
-    
+
     ```bash
     python finetune/lora.py --data_dir data/mydata/ --out_dir out/myexperiment
     ```
 
-
 ## Troubleshooting
 
 If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
-`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see https://github.com/Lightning-AI/lit-llama/issues/101).
+`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
diff --git a/tutorials/inference.md b/tutorials/inference.md
index 7f62387278..bb21cef4db 100644
--- a/tutorials/inference.md
+++ b/tutorials/inference.md
@@ -5,8 +5,10 @@ We demonstrate how to run inference (next token prediction) with the GPT base mo
 ```bash
 python generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
 ```
+
 Output:
-```
+
+```text
 Hello, my name is Levi Durrer, I'm an Austrian journalist - Chairman of the Press Blair Party, with 37 years in the Press Blair International, and two years in the Spectre of Austerity for the other. I'm crossing my fingers that you will feel
 ```
 
diff --git a/tutorials/oom.md b/tutorials/oom.md
index 383dc3be93..b472067a54 100644
--- a/tutorials/oom.md
+++ b/tutorials/oom.md
@@ -1,10 +1,10 @@
-## Dealing with out-of-memory (OOM) errors:
+## Dealing with out-of-memory (OOM) errors
 
 If you got this error while running a script
 
 ```bash
 OutOfMemoryError: CUDA out of memory. Tried to allocate 2.22 GiB. GPU 0 has a total capacty of 79.15 GiB of which 228.38 MiB is free. Including non-PyTorch memory, this process
-has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory 
+has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory
 is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
 ```
 
diff --git a/tutorials/quantize.md b/tutorials/quantize.md
index a9dcf9be88..46fe9cd527 100644
--- a/tutorials/quantize.md
+++ b/tutorials/quantize.md
@@ -24,6 +24,7 @@ To reduce the memory requirements further, Lit-GPT supports several quantization
 Enabled with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes). Check out the [paper](https://arxiv.org/abs/2305.14314v1) to learn more about how it works.
 
 > **Note**: `bitsandbytes` only supports `CUDA` devices and the `Linux` operating system.
+Windows users should use [WSL2](https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl).
 
 Uses the normalized float 4 (nf4) data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
 
diff --git a/tutorials/tpus.md b/tutorials/tpus.md
index c547cee4cf..86092fd570 100644
--- a/tutorials/tpus.md
+++ b/tutorials/tpus.md
@@ -59,7 +59,8 @@ You'll notice that afterwards, generation times drop to ~2s.
 Coming soon.
 
 > **Warning**
-> When you are done, remember to delete your instance
+> When you are done, remember to delete your instance
+>
 > ```shell
 > gcloud compute tpus tpu-vm delete lit-gpt --zone=us-central2-b
 > ```
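To try these tutorial fixes in a local checkout, a patch in this `git format-patch` form can usually be applied with `git am` (or previewed with `git apply`). The sketch below is only an illustration: it assumes the patch body above has been saved under the hypothetical filename `0001-tutorial-markdowns-small-fixes.patch` and that the commands are run from the root of a Lit-GPT working tree.

```bash
# Preview which files the patch touches and how many lines change (nothing is modified yet).
git apply --stat 0001-tutorial-markdowns-small-fixes.patch

# Verify that the patch would apply cleanly against the current working tree.
git apply --check 0001-tutorial-markdowns-small-fixes.patch

# Apply it as a commit, preserving the original author, date, and commit message.
git am 0001-tutorial-markdowns-small-fixes.patch
```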