
Llama3 and Llama2 are ExecuTorch compatible #34101

Merged
merged 1 commit into huggingface:main on Oct 17, 2024

Conversation

@guangy10 (Contributor) commented Oct 11, 2024

What does this PR do?

Llama 2 and Llama 3 are compatible with ExecuTorch.

Note that Llama 2 & 3 in the ExecuTorch repo have already been fully optimized for SOTA performance using their own model definition and optimizations; you can read the details at https://github.com/pytorch/executorch/tree/main/examples/models/llama2. The work here instead makes the Llama models compatible with ExecuTorch using Hugging Face's model definition.
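
For readers who want to try this path, here is a minimal sketch of the export flow. This is an illustration, not the exact code in this PR: it assumes the `transformers.integrations.executorch` module with `convert_and_export_with_cache`, torch>=2.5, and an `executorch` install; exact names and arguments may differ across versions.

```python
# Minimal sketch: export a Hugging Face Llama checkpoint with a static
# KV cache via torch.export, then lower it to an ExecuTorch .pte file.
# The model id, dtype, and max_length below are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, GenerationConfig
from transformers.integrations.executorch import convert_and_export_with_cache
from executorch.exir import to_edge

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,
    generation_config=GenerationConfig(
        use_cache=True,
        cache_implementation="static",  # export requires a static cache
        max_length=128,
    ),
)

# Trace the model (with the static cache wired in) into an ExportedProgram.
exported_program = convert_and_export_with_cache(model)

# Lower to an ExecuTorch program and serialize it for the runtime.
et_program = to_edge(exported_program).to_executorch()
with open("llama3_1b.pte", "wb") as f:
    f.write(et_program.buffer)
```

The resulting llama3_1b.pte is what the llama_main run below consumes; the tokenizer still has to be provided separately (see the discussion further down).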

Additional Test in ExecuTorch

Running Llama-3.2-1B E2E:
cmake-out/examples/models/llama2/llama_main --tokenizer_path=tokenizer_llama3_1b.model --model_path=llama3_1b.pte --prompt="My name is"

I 00:00:00.000599 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.000638 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.000643 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 4
I 00:00:00.000645 executorch:main.cpp:69] Resetting threadpool with num threads = 6
I 00:00:00.002350 executorch:runner.cpp:59] Creating LLaMa runner: model_path=llama3_1b.pte, tokenizer_path=tokenizer_llama3_1b.model
I 00:00:01.442476 executorch:runner.cpp:88] Reading metadata from model
I 00:00:01.442494 executorch:runner.cpp:111] Methond use_sdpa_with_kv_cache not found, using the default value 0
I 00:00:01.442496 executorch:runner.cpp:113] Metadata: use_sdpa_with_kv_cache = 0
I 00:00:01.442501 executorch:runner.cpp:113] Metadata: use_kv_cache = 1
I 00:00:01.442503 executorch:runner.cpp:113] Metadata: get_vocab_size = 128256
I 00:00:01.442505 executorch:runner.cpp:113] Metadata: get_bos_id = 128000
I 00:00:01.442507 executorch:runner.cpp:113] Metadata: get_max_seq_len = 123
I 00:00:01.442508 executorch:runner.cpp:111] Methond enable_dynamic_shape not found, using the default value 0
I 00:00:01.442509 executorch:runner.cpp:113] Metadata: enable_dynamic_shape = 0
I 00:00:01.442512 executorch:runner.cpp:174] RSS after loading model: 0.000000 MiB (0 if unsupported)
I 00:00:01.691377 executorch:runner.cpp:243] RSS after prompt prefill: 0.000000 MiB (0 if unsupported)
My name is Alex.
I am a retired Army Captain, former air traffic controller and current writer.
I have been a reader for many years, and I have always enjoyed stories that had the ability to take me away from my daily routine and into a world of my imagination. It was my life that I chose, not the other way around. When I saw the cover for The Devil’s Triangle and the idea that a man was trying to save a child, I knew I had to read the book. I was not disappointed.
I thought this book was beautifully written, and I enjoyed reading about a man who was
I 00:00:11.182142 executorch:runner.cpp:257] RSS after finishing text generation: 0.000000 MiB (0 if unsupported)
PyTorchObserver {"prompt_tokens":3,"generated_tokens":119,"model_load_start_ms":1728683099291,"model_load_end_ms":1728683100732,"inference_start_ms":1728683100732,"inference_end_ms":1728683110471,"prompt_eval_end_ms":1728683100980,"first_token_ms":1728683100980,"aggregate_sampling_time_ms":90,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:11.182185 executorch:stats.h:104] 	Prompt Tokens: 3    Generated Tokens: 119
I 00:00:11.182186 executorch:stats.h:110] 	Model Load Time:		1.441000 (seconds)
I 00:00:11.182188 executorch:stats.h:120] 	Total inference time:		9.739000 (seconds)		 Rate: 	12.218914 (tokens/second)
I 00:00:11.182190 executorch:stats.h:128] 		Prompt evaluation:	0.248000 (seconds)		 Rate: 	12.096774 (tokens/second)
I 00:00:11.182192 executorch:stats.h:139] 		Generated 119 tokens:	9.491000 (seconds)		 Rate: 	12.538194 (tokens/second)
I 00:00:11.182193 executorch:stats.h:147] 	Time to first generated token:	0.248000 (seconds)
I 00:00:11.182194 executorch:stats.h:154] 	Sampling time over 122 tokens:	0.090000 (seconds)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker

@guangy10 mentioned this pull request Oct 11, 2024
@guangy10 force-pushed the llama3_executorch branch 2 times, most recently from 7b7fcec to ecfc5a9 on October 11, 2024 at 21:47
@guangy10 changed the title from "Llama3_1b and Llama2_7b are ExecuTorch compatible" to "Llama3 and Llama2 are ExecuTorch compatible" on Oct 11, 2024
@guangy10 (Contributor Author) commented:

Verified on torch>=2.4.0, including the upcoming torch==2.5.0.

@guangy10 (Contributor Author) commented:

@ArthurZucker I see the original tokenizer for Llama 3 has been moved to original/tokenizer.model. Do you know how tokenizer.model is converted to tokenizer.json (and possibly other files)? Does transformers have a utility script that converts between these formats? I'm asking because, for models that only ship a tokenizer.json, I need to generate a tokenizer.model or tokenizer.bin file that the ExecuTorch runtime can recognize.

@guangy10 (Contributor Author) commented:

@ArthurZucker do you mind reviewing this PR?

@ArthurZucker (Collaborator) commented:

Hey sorry for being late!

@ArthurZucker (Collaborator) commented:

We convert them using convert_slow's TikToken converter!
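
For reference, a hedged sketch of what that looks like: `TikTokenConverter` lives in `transformers.convert_slow_tokenizer`, but the constructor argument shown here is an assumption and may differ across versions.

```python
# Sketch: convert Llama 3's tiktoken-style original/tokenizer.model into
# a tokenizer.json for the fast tokenizers library. The vocab_file kwarg
# is an assumption; check convert_slow_tokenizer.py in your install.
from transformers.convert_slow_tokenizer import TikTokenConverter

converter = TikTokenConverter(vocab_file="original/tokenizer.model")
fast_tokenizer = converter.converted()  # returns a tokenizers.Tokenizer
fast_tokenizer.save("tokenizer.json")
```

Going the other way (producing a tokenizer.model or tokenizer.bin that the ExecuTorch runtime accepts, starting from only a tokenizer.json) is the part this thread leaves open.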

@ArthurZucker (Collaborator) left a comment:


🤗

@ArthurZucker merged commit 9470c00 into huggingface:main Oct 17, 2024
13 checks passed
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
Llama3_1b and Llama2_7b are ExecuTorch compatible

Co-authored-by: Guang Yang <[email protected]>