
Llama3 and Llama2 are ExecuTorch compatible #34101

Merged
merged 1 commit into huggingface:main on Oct 17, 2024

Conversation

@guangy10 (Contributor) commented Oct 11, 2024

What does this PR do?

Llama 2 and Llama 3 are compatible with ExecuTorch.

Note that Llama 2 & 3 in the ExecuTorch repo have already been fully optimized for SOTA performance using their own model definition and optimizations; you can read the details at https://github.com/pytorch/executorch/tree/main/examples/models/llama2. The work here instead makes the Llama models compatible with ExecuTorch using Hugging Face's model definition.
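
For readers who want to try this path, here is a minimal sketch of the export flow. This is an illustration, not the exact code in this PR: it assumes the `transformers.integrations.executorch` module with `convert_and_export_with_cache`, torch>=2.5, and an `executorch` install; exact names and arguments may differ across versions.

```python
# Minimal sketch: export a Hugging Face Llama checkpoint with a static
# KV cache via torch.export, then lower it to an ExecuTorch .pte file.
# The model id, dtype, and max_length below are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, GenerationConfig
from transformers.integrations.executorch import convert_and_export_with_cache
from executorch.exir import to_edge

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,
    generation_config=GenerationConfig(
        use_cache=True,
        cache_implementation="static",  # export requires a static cache
        max_length=128,
    ),
)

# Trace the model (with the static cache wired in) into an ExportedProgram.
exported_program = convert_and_export_with_cache(model)

# Lower to an ExecuTorch program and serialize it for the runtime.
et_program = to_edge(exported_program).to_executorch()
with open("llama3_1b.pte", "wb") as f:
    f.write(et_program.buffer)
```

The resulting llama3_1b.pte is what the llama_main run below consumes; the tokenizer still has to be provided separately (see the discussion further down).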

Additional Test in ExecuTorch

Running Llama-3.2-1B E2E:
cmake-out/examples/models/llama2/llama_main --tokenizer_path=tokenizer_llama3_1b.model --model_path=llama3_1b.pte --prompt="My name is"

I 00:00:00.000599 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.000638 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.000643 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 4
I 00:00:00.000645 executorch:main.cpp:69] Resetting threadpool with num threads = 6
I 00:00:00.002350 executorch:runner.cpp:59] Creating LLaMa runner: model_path=llama3_1b.pte, tokenizer_path=tokenizer_llama3_1b.model
I 00:00:01.442476 executorch:runner.cpp:88] Reading metadata from model
I 00:00:01.442494 executorch:runner.cpp:111] Methond use_sdpa_with_kv_cache not found, using the default value 0
I 00:00:01.442496 executorch:runner.cpp:113] Metadata: use_sdpa_with_kv_cache = 0
I 00:00:01.442501 executorch:runner.cpp:113] Metadata: use_kv_cache = 1
I 00:00:01.442503 executorch:runner.cpp:113] Metadata: get_vocab_size = 128256
I 00:00:01.442505 executorch:runner.cpp:113] Metadata: get_bos_id = 128000
I 00:00:01.442507 executorch:runner.cpp:113] Metadata: get_max_seq_len = 123
I 00:00:01.442508 executorch:runner.cpp:111] Methond enable_dynamic_shape not found, using the default value 0
I 00:00:01.442509 executorch:runner.cpp:113] Metadata: enable_dynamic_shape = 0
I 00:00:01.442512 executorch:runner.cpp:174] RSS after loading model: 0.000000 MiB (0 if unsupported)
I 00:00:01.691377 executorch:runner.cpp:243] RSS after prompt prefill: 0.000000 MiB (0 if unsupported)
My name is Alex.
I am a retired Army Captain, former air traffic controller and current writer.
I have been a reader for many years, and I have always enjoyed stories that had the ability to take me away from my daily routine and into a world of my imagination. It was my life that I chose, not the other way around. When I saw the cover for The Devil’s Triangle and the idea that a man was trying to save a child, I knew I had to read the book. I was not disappointed.
I thought this book was beautifully written, and I enjoyed reading about a man who was
I 00:00:11.182142 executorch:runner.cpp:257] RSS after finishing text generation: 0.000000 MiB (0 if unsupported)
PyTorchObserver {"prompt_tokens":3,"generated_tokens":119,"model_load_start_ms":1728683099291,"model_load_end_ms":1728683100732,"inference_start_ms":1728683100732,"inference_end_ms":1728683110471,"prompt_eval_end_ms":1728683100980,"first_token_ms":1728683100980,"aggregate_sampling_time_ms":90,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:11.182185 executorch:stats.h:104] 	Prompt Tokens: 3    Generated Tokens: 119
I 00:00:11.182186 executorch:stats.h:110] 	Model Load Time:		1.441000 (seconds)
I 00:00:11.182188 executorch:stats.h:120] 	Total inference time:		9.739000 (seconds)		 Rate: 	12.218914 (tokens/second)
I 00:00:11.182190 executorch:stats.h:128] 		Prompt evaluation:	0.248000 (seconds)		 Rate: 	12.096774 (tokens/second)
I 00:00:11.182192 executorch:stats.h:139] 		Generated 119 tokens:	9.491000 (seconds)		 Rate: 	12.538194 (tokens/second)
I 00:00:11.182193 executorch:stats.h:147] 	Time to first generated token:	0.248000 (seconds)
I 00:00:11.182194 executorch:stats.h:154] 	Sampling time over 122 tokens:	0.090000 (seconds)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker

@guangy10 mentioned this pull request Oct 11, 2024
@guangy10 force-pushed the llama3_executorch branch 2 times, most recently from 7b7fcec to ecfc5a9 on October 11, 2024 at 21:47
@guangy10 changed the title from "Llama3_1b and Llama2_7b are ExecuTorch compatible" to "Llama3 and Llama2 are ExecuTorch compatible" on Oct 11, 2024
@guangy10 (Contributor Author) commented:

Verified on torch>=2.4.0, including the upcoming torch==2.5.0.

@guangy10 (Contributor Author) commented:

@ArthurZucker I see the original tokenizer for Llama 3 has been moved to original/tokenizer.model. Do you know how tokenizer.model is converted to tokenizer.json (and possibly other files)? Does transformers have a utility script that converts between these formats? I'm asking because, for models that only ship a tokenizer.json, I need to generate a tokenizer.model or tokenizer.bin file that the ExecuTorch runtime can recognize.

@guangy10 (Contributor Author) commented:

@ArthurZucker do you mind reviewing this PR?

@ArthurZucker (Collaborator) commented:

Hey sorry for being late!

@ArthurZucker (Collaborator) commented:

We convert them using convert_slow's TikToken converter!
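
For reference, a hedged sketch of what that looks like: `TikTokenConverter` lives in `transformers.convert_slow_tokenizer`, but the constructor argument shown here is an assumption and may differ across versions.

```python
# Sketch: convert Llama 3's tiktoken-style original/tokenizer.model into
# a tokenizer.json for the fast tokenizers library. The vocab_file kwarg
# is an assumption; check convert_slow_tokenizer.py in your install.
from transformers.convert_slow_tokenizer import TikTokenConverter

converter = TikTokenConverter(vocab_file="original/tokenizer.model")
fast_tokenizer = converter.converted()  # returns a tokenizers.Tokenizer
fast_tokenizer.save("tokenizer.json")
```

Going the other way (producing a tokenizer.model or tokenizer.bin that the ExecuTorch runtime accepts, starting from only a tokenizer.json) is the part this thread leaves open.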

@ArthurZucker (Collaborator) left a comment:


🤗

@ArthurZucker merged commit 9470c00 into huggingface:main Oct 17, 2024
13 checks passed
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
Llama3_1b and Llama2_7b are ExecuTorch compatible

Co-authored-by: Guang Yang <[email protected]>