Decomposed LLama

There is a LLama model code adopted for better debuging in Jupyter Notebook.

Llama code is taken from https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py and splitted into several files by classes. And after that an adoption is applied to LlamaAttention class to demonstration the technique.

Register Llama model

To use Llama model class from this repo instead fo transofrmers' one register it.

First of all download Llama model, for example https://huggingface.co/meta-llama/Llama-2-7b-chat-hf

from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="config.json", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="generation_config.json", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="model-00001-of-00002.safetensors", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="model-00002-of-00002.safetensors", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="model.safetensors.index.json", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="special_tokens_map.json", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="tokenizer.json", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="tokenizer.model", local_dir="./", local_dir_use_symlinks=False)
hf_hub_download(repo_id="meta-llama/Llama-2-7b-chat-hf", filename="tokenizer_config.json", local_dir="./", local_dir_use_symlinks=False)

Adjust config.json from the model directory by changing "model_type" field from "llama" to "llama_decomposed".

    "model_type": "llama_decomposed",

Register model with next code

from llm.model.LlamaForCasualLM import LlamaForCasualLM

LlamaForCasualLM.register_model()

Load model with repo class

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    # other args
)

model_path here should point to the model directory with adjusted config.json file

Use better code for debug and development.

Model reload code

Example for LlamaAttention class

import llm.model.llama_attention_functions as llama_attention_functions

import importlib
importlib.reload(llama_attention_functions)

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
llm		llm
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Decomposed LLama

Register Llama model

Load model with repo class

Model reload code

About

Releases

Packages

Languages

kapulkin/llama_decomposed

Folders and files

Latest commit

History

Repository files navigation

Decomposed LLama

Register Llama model

Load model with repo class

Model reload code

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages