[DO NOT MERGE] Upstream codebase diff #470
base: main
Conversation
To repro, start the server:
```
VLLM_SKIP_WARMUP=true python -m vllm.entrypoints.openai.api_server
```
Send a request (this works fine):
```
curl -v http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "facebook/opt-125m","prompt": "The future of AI is ","max_tokens": 100,"temperature": 0}'
```
If the request has a seed, it fails:
```
curl -v http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "facebook/opt-125m","prompt": "The future of AI is ","max_tokens": 100,"temperature": 0, "seed" : 37}'
```
The failure happens here: [vllm-fork/vllm/model_executor/sampling_metadata.py at habana_main · HabanaAI/vllm-fork](https://github.com/HabanaAI/vllm-fork/blob/habana_main/vllm/model_executor/sampling_metadata.py#L220)
```
if sampling_params.seed is not None:
    seq_group_metadata.state.generator = torch.Generator(
        device=device).manual_seed(sampling_params.seed)
```
`RuntimeError: Device type HPU is not supported for torch.Generator() api.`
This PR fixes the above issue by using htrandom: [Intel Gaudi PyTorch Python API (habana_frameworks.torch) — Gaudi Documentation 1.17.1](https://docs.habana.ai/en/latest/PyTorch/Reference/Python_Packages.html?highlight=htrandom#random-number-generator-apis)
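A minimal sketch of the workaround, assuming the htrandom module from the linked Gaudi docs (`habana_frameworks.torch.hpu.random`); the exact call shape used in the PR may differ:
```
import torch

def seeded_generator(device: str, seed: int):
    """Return a seeded generator, working around the lack of HPU support
    in torch.Generator(). The htrandom usage is an assumption based on the
    Gaudi random-number-generator API docs linked above."""
    if device == "hpu":
        import habana_frameworks.torch.hpu.random as htrandom
        # Seed and reuse the default HPU generator instead of constructing
        # a torch.Generator, which raises RuntimeError for device='hpu'.
        return htrandom.default_generators[0].manual_seed(seed)
    return torch.Generator(device=device).manual_seed(seed)
```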
Fix one_hot bug in torch compile mode:
```
> block_mapping = torch.nn.functional.one_hot(metadata.block_mapping, num_classes=batch_size)
E RuntimeError: Class values must be non-negative.

../../vllm/worker/hpu_model_runner.py:311: RuntimeError
```
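A hedged sketch of one way to avoid the error, assuming the negative class values come from padding entries in `block_mapping` (the actual fix in this PR may differ):
```
import torch

def safe_one_hot_block_mapping(block_mapping: torch.Tensor, batch_size: int):
    # Padding slots are assumed to be negative; one_hot rejects them.
    valid = block_mapping >= 0
    # Clamp padding to class 0 so one_hot accepts the tensor...
    indices = torch.where(valid, block_mapping, torch.zeros_like(block_mapping))
    one_hot = torch.nn.functional.one_hot(indices, num_classes=batch_size)
    # ...then zero out the rows that were padding.
    return one_hot * valid.unsqueeze(-1).to(one_hot.dtype)
```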
Due to the high dynamicity of logits processing, it is better to offload it entirely to the CPU instead of computing it on HPU.
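A minimal sketch of the offload pattern (all names here are generic placeholders, not the PR's actual code):
```
import torch

def apply_logits_processors_on_cpu(logits: torch.Tensor, processors, token_ids):
    # One device-to-host copy, then run the shape-dynamic processor loop on
    # CPU, where dynamicity does not trigger recompilation on the accelerator.
    logits_cpu = logits.to("cpu")
    for proc in processors:
        logits_cpu = proc(token_ids, logits_cpu)
    # One host-to-device copy back once the dynamic work is done.
    return logits_cpu.to(logits.device)
```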
This PR adds support for the test_layers unit test with a LoraMask-based approach.
This PR enables automatic prefix caching on Intel Gaudi HPUs. Please refer to this [RFC](vllm-project#2614) for detailed information about prefix caching.
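For reference, a usage sketch with the standard vLLM engine flag (assuming the upstream `LLM` API; the HPU-side support is what this PR adds):
```
from vllm import LLM, SamplingParams

# enable_prefix_caching turns on automatic prefix caching; repeated prompt
# prefixes can then reuse previously computed KV-cache blocks.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
outputs = llm.generate(["The future of AI is "],
                       SamplingParams(temperature=0, max_tokens=32))
```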
Implementation of multi-step scheduling. To use the feature, pass --num_scheduler_steps=[n] as a server parameter. In my tests, the best results were achieved with n==64, but this will vary depending on the model. --------- Co-authored-by: Karol Damaszke <[email protected]> Co-authored-by: jmaksymczuk <[email protected]>
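For example, a server launch with the flag described above (the model name is just an illustration):
```
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m \
    --num_scheduler_steps 64
```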
This removes the need to pass the VLLM_PROMPT_USE_FUSEDSDPA environment variable in order to enable FusedSDPA attention. Fallback attention can still be used by setting VLLM_PROMPT_USE_FUSEDSDPA=0.
Contiguous cache fetching to avoid the costly gather operation on Gaudi3. Requires changes in vllm-hpu-extension (HabanaAI/vllm-hpu-extension#17) to work. Introduces redundant calculations in the decoding phase. The feature improves the performance of all tested workloads over the entire benchmark (5-12%) on Gaudi3; PR #426 improves it further (9-22%). Only compatible with the v2 block manager. The feature negatively impacts performance on Gaudi2. Use the VLLM_CONTIGUOUS_PA=true environment variable to enable it.
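For example (the --use-v2-block-manager flag is an assumption based on the v2-block-manager requirement stated above):
```
VLLM_CONTIGUOUS_PA=true python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m \
    --use-v2-block-manager
```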
This change fixes the performance issue I introduced in PR #414: because of the use of `torch.where`, both functions were being called. Now only the selected one runs.
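A self-contained sketch of the pitfall and the fix (stand-in functions, not the PR's actual ones):
```
import torch

def f(x): return x.exp()     # stand-in for the first expensive branch
def g(x): return x.log1p()   # stand-in for the second expensive branch

x = torch.randn(8)
use_f = True

# Pitfall: both operands of torch.where are evaluated before selection,
# so f and g each run in full even though only one result is kept.
slow = torch.where(torch.tensor(use_f), f(x), g(x))

# Fix: select at Python level so only the chosen function executes.
fast = f(x) if use_f else g(x)
```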
Change `NaiveBlockAllocator` to use a priority queue so that we always allocate the lowest block id first, as sketched below. This further increases the performance of contiguous paged attention. - [ ] Add an option or env variable to enable/disable this behavior (not sure this is necessary). --------- Co-authored-by: Yang Wang <[email protected]>
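A minimal sketch of the allocation policy (not the fork's actual `NaiveBlockAllocator` code): a min-heap over free block ids makes every allocation return the lowest id first.
```
import heapq

class LowestIdFreePool:
    """Free-block pool that always hands out the smallest free block id."""

    def __init__(self, num_blocks: int):
        self._free = list(range(num_blocks))
        heapq.heapify(self._free)  # min-heap: smallest id at the root

    def allocate(self) -> int:
        return heapq.heappop(self._free)  # lowest free id first

    def release(self, block_id: int) -> None:
        heapq.heappush(self._free, block_id)
```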
Adding calculation of the OpenSSF Scorecard. Note: the badge (visible on the repo main page) will be disabled for now.
The max_num_prefill_seqs parameter is used only when use_padding_aware_scheduling is True. Since use_padding_aware_scheduling defaults to False, max_num_prefill_seqs shouldn't be required every time SchedulerConfig is initialized. Dozens of tests in tests/core are failing because of this parameter issue.
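A hypothetical sketch of the intended default handling (field names follow the description above; the real SchedulerConfig has many more fields):
```
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchedulerConfig:
    use_padding_aware_scheduling: bool = False
    # Optional by default, so the dozens of tests that never enable
    # padding-aware scheduling do not have to pass it.
    max_num_prefill_seqs: Optional[int] = None

    def __post_init__(self):
        if self.use_padding_aware_scheduling and self.max_num_prefill_seqs is None:
            raise ValueError("max_num_prefill_seqs is required when "
                             "use_padding_aware_scheduling=True")
```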
This PR implements tensor parallelism for multi-step scheduling.
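For example, combining the two features on the server command line (flag spellings assumed from the PRs above):
```
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m \
    --num_scheduler_steps 64 \
    --tensor-parallel-size 2
```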
Version 0.20.2 introduced changes that break the lm_eval API.
…#10071) Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: B-201 <[email protected]> Co-authored-by: B-201 <[email protected]>
Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: Joe Runde <[email protected]> Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Thomas Parnell <[email protected]> Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Varad Ahirwadkar <[email protected]> Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: Travis Johnson <[email protected]> Signed-off-by: Rafael Vasquez <[email protected]> Signed-off-by: Yuan Zhou <[email protected]> Signed-off-by: luka <[email protected]> Signed-off-by: Alex-Brooks <[email protected]> Signed-off-by: youkaichao <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]> Signed-off-by: mgoin <[email protected]> Signed-off-by: Vinay Damodaran <[email protected]> Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Harry Mellor <[email protected]> Signed-off-by: charlifu <[email protected]> Signed-off-by: Sam Stoelinga <[email protected]> Signed-off-by: Vasily Alexeev <[email protected]> Signed-off-by: Kevin-Yang <[email protected]> Signed-off-by: Abatom <[email protected]> Signed-off-by: Bill Nell <[email protected]> Signed-off-by: wangshuai09 <[email protected]> Signed-off-by: Qishuai [email protected] Signed-off-by: yuze.zyz <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> Signed-off-by: Kunjan Patel <[email protected]> Signed-off-by: simon-mo <[email protected]> Signed-off-by: kevin <[email protected]> Signed-off-by: YiSheng5 <[email protected]> Signed-off-by: yan ma <[email protected]> Signed-off-by: Went-Liang <[email protected]> Signed-off-by: Roger Wang <[email protected]> Signed-off-by: sasha0552 <[email protected]> Signed-off-by: mzusman <[email protected]> Signed-off-by: Prashant Gupta <[email protected]> Signed-off-by: André Jonasson <[email protected]> Signed-off-by: Gene Su <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Peter Salas <[email protected]> Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Michael Green <[email protected]> Signed-off-by: Shanshan Wang <[email protected]> Signed-off-by: Gregory Shtrasberg <[email protected]> Signed-off-by: daitran2k1 <[email protected]> Signed-off-by: MengqingCao <[email protected]> Signed-off-by: chaunceyjiang <[email protected]> Signed-off-by: Robert Shaw <[email protected]> Signed-off-by: Hissu Hyvarinen <[email protected]> Signed-off-by: [email protected] <[email protected]> Signed-off-by: Linkun Chen <[email protected]> Signed-off-by: Tomer Asida <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: sasha0552 <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]> Co-authored-by: Li, Jiang <[email protected]> Co-authored-by: Kuntai Du <[email protected]> Co-authored-by: Daniele <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Luka Govedič <[email protected]> Co-authored-by: bnellnm <[email protected]> Co-authored-by: Kai Wu <[email protected]> Co-authored-by: Isotr0py <[email protected]> Co-authored-by: Shashwat Srijan <[email protected]> Co-authored-by: Robert Shaw <[email protected]> Co-authored-by: Andrew Feldman <[email protected]> Co-authored-by: afeldman-nm <[email protected]> Co-authored-by: laishzh <[email protected]> Co-authored-by: Max de Bayser <[email protected]> Co-authored-by: Max de Bayser <[email protected]> 
Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Joe Runde <[email protected]> Co-authored-by: Haoyu Wang <[email protected]> Co-authored-by: Russell Bryant <[email protected]> Co-authored-by: Nick Hill <[email protected]> Co-authored-by: tomeras91 <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: Kunjan <[email protected]> Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal> Co-authored-by: Cody Yu <[email protected]> Co-authored-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Yue Zhang <[email protected]> Co-authored-by: Chen Zhang <[email protected]> Co-authored-by: Andy Dai <[email protected]> Co-authored-by: Dhia Eddine Rhaiem <[email protected]> Co-authored-by: yudian0504 <[email protected]> Co-authored-by: Varad Ahirwadkar <[email protected]> Co-authored-by: youkaichao <[email protected]> Co-authored-by: Baoyuan Qi <[email protected]> Co-authored-by: Wallas Henrique <[email protected]> Co-authored-by: Travis Johnson <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: ngrozae <[email protected]> Co-authored-by: Falko1 <[email protected]> Co-authored-by: Rafael Vasquez <[email protected]> Co-authored-by: chenqianfzh <[email protected]> Co-authored-by: wangshuai09 <[email protected]> Co-authored-by: Jee Jee Li <[email protected]> Co-authored-by: xendo <[email protected]> Co-authored-by: Jerzy Zagorski <[email protected]> Co-authored-by: gopalsarda <[email protected]> Co-authored-by: Yuan <[email protected]> Co-authored-by: Gubrud, Aaron D <[email protected]> Co-authored-by: adgubrud <[email protected]> Co-authored-by: Yuhong Guo <[email protected]> Co-authored-by: Yuhong Guo <[email protected]> Co-authored-by: Ronen Schaffer <[email protected]> Co-authored-by: Aurick Qiao <[email protected]> Co-authored-by: Jeremy Arnold <[email protected]> Co-authored-by: Lucas Wilkinson <[email protected]> Co-authored-by: yulei <[email protected]> Co-authored-by: Seth Kimmel <[email protected]> Co-authored-by: Kaunil Dhruv <[email protected]> Co-authored-by: Flex Wang <[email protected]> Co-authored-by: Mengqing Cao <[email protected]> Co-authored-by: Alex Brooks <[email protected]> Co-authored-by: Yongzao <[email protected]> Co-authored-by: Yunfei Chu <[email protected]> Co-authored-by: Vinay R Damodaran <[email protected]> Co-authored-by: Yan Ma <[email protected]> Co-authored-by: Zhuohan Li <[email protected]> Co-authored-by: litianjian <[email protected]> Co-authored-by: Harry Mellor <[email protected]> Co-authored-by: Charlie Fu <[email protected]> Co-authored-by: Kevin H. 
Luu <[email protected]> Co-authored-by: Will Johnson <[email protected]> Co-authored-by: pavlo-ruban <[email protected]> Co-authored-by: Sam Stoelinga <[email protected]> Co-authored-by: ErkinSagiroglu <[email protected]> Co-authored-by: Vasiliy Alekseev <[email protected]> Co-authored-by: kakao-kevin-us <[email protected]> Co-authored-by: Kevin-Yang <[email protected]> Co-authored-by: 科英 <[email protected]> Co-authored-by: madt2709 <[email protected]> Co-authored-by: litianjian <[email protected]> Co-authored-by: Zhong Qishuai <[email protected]> Co-authored-by: tastelikefeet <[email protected]> Co-authored-by: Sven Seeberg <[email protected]> Co-authored-by: yannicks1 <[email protected]> Co-authored-by: Junichi Sato <[email protected]> Co-authored-by: Kunjan <[email protected]> Co-authored-by: Will Eaton <[email protected]> Co-authored-by: Simon Mo <[email protected]> Co-authored-by: Lily Liu <[email protected]> Co-authored-by: YiSheng5 <[email protected]> Co-authored-by: Went-Liang <[email protected]> Co-authored-by: Elfie Guo <[email protected]> Co-authored-by: Harsha vardhan manoj Bikki <[email protected]> Co-authored-by: Guillaume Calmettes <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]> Co-authored-by: Mor Zusman <[email protected]> Co-authored-by: Prashant Gupta <[email protected]> Co-authored-by: Patrick von Platen <[email protected]> Co-authored-by: André Jonasson <[email protected]> Co-authored-by: Pavani Majety <[email protected]> Co-authored-by: Gene Der Su <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Peter Salas <[email protected]> Co-authored-by: sroy745 <[email protected]> Co-authored-by: Michael Green <[email protected]> Co-authored-by: Nick Hill <[email protected]> Co-authored-by: Nikita Furin <[email protected]> Co-authored-by: shanshan wang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Gregory Shtrasberg <[email protected]> Co-authored-by: Yang Zheng <[email protected]> Co-authored-by: Yang Zheng(SW)(Alex) <[email protected]> Co-authored-by: Tran Quang Dai <[email protected]> Co-authored-by: Chauncey <[email protected]> Co-authored-by: hissu-hyvarinen <[email protected]> Co-authored-by: lkchen <[email protected]> Co-authored-by: Linkun Chen <[email protected]> Co-authored-by: Linkun Chen <[email protected]> Co-authored-by: Gene Der Su <[email protected]>
…led (vllm-project#10388) Signed-off-by: imkero <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
…ect#10383) Signed-off-by: youkaichao <[email protected]>
…odels (vllm-project#10374) Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Chendi Xue <[email protected]>
…ject#10394) Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
…ject#10403) Signed-off-by: imkero <[email protected]>
vllm-project#10392) Signed-off-by: wchen61 <[email protected]>
…m-project#10327) Signed-off-by: Isotr0py <[email protected]>
…vllm-project#10375) Signed-off-by: Hollow Man <[email protected]>
Add valid_seq_lengths to fusedsdpa - port from 1.18.0: https://github.com/HabanaAI/vllm-fork/blob/v1.18.0/vllm/attention/backends/habana_attn.py#L209
Set vllm-hpu-extension to 2542c18
…#10401) Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Linkun Chen <[email protected]>
This fixes a bug introduced by the last spec_decode PR's formatting commit.
This PR introduces async copying into _prepare_prompt and _prepare_decode, which makes copying faster. It also moves the precompute_indices_and_offsets function into forward to avoid unnecessary H2D copying.
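A sketch of the async host-to-device copy pattern (generic names, not the fork's exact code):
```
import torch

def to_device_async(values, device="hpu"):
    # Stage in pinned host memory so the copy can be truly asynchronous,
    # then issue a non-blocking H2D copy that overlaps with later CPU work.
    host = torch.tensor(values, dtype=torch.long, pin_memory=True)
    return host.to(device, non_blocking=True)
```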
```
@@ -0,0 +1,45 @@
name: codespell
```
Check failure — Code scanning / Scorecard: Token-Permissions (High).
Remediation tip: visit https://app.stepsecurity.io/secureworkflow, tick 'Restrict permissions for GITHUB_TOKEN', and untick the other options. NOTE: if you want to resolve multiple issues at once, you can visit https://app.stepsecurity.io/securerepo instead. See the Remediation section below for further help, and the sketch after this alert for the usual fix.
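A minimal sketch of the usual remediation, assuming standard GitHub Actions workflow syntax (not the repo's actual file):
```
# Declare least-privilege defaults for GITHUB_TOKEN at the top of the
# workflow file (e.g. .github/workflows/codespell.yml).
name: codespell
permissions:
  contents: read
```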
```
def test_stateless_process_group(worker):
    port1 = get_open_port()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", port1))
```
Check warning — Code scanning / CodeQL: Binding a socket to all network interfaces (Medium, test).

Copilot autofix suggestion: to fix the problem, bind the socket to a specific interface instead of all interfaces. This can be done by replacing the empty string ("") with a specific IP address, such as 127.0.0.1, which binds the socket to the localhost interface. This change ensures that the socket only accepts connections from the local machine, mitigating the security risk.
```
@@ -126,3 +126,3 @@
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", port1))
+        s.bind(("127.0.0.1", port1))
     port2 = get_open_port()
```
…rror] (#502) Fix an argument-incompatibility issue for FP8:
```
ERROR 11-11 04:29:13 engine.py:143]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
ERROR 11-11 04:29:13 engine.py:143]     return self._call_impl(*args, **kwargs)
ERROR 11-11 04:29:13 engine.py:143]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1606, in _call_impl
ERROR 11-11 04:29:13 engine.py:143]     result = forward_call(*args, **kwargs)
ERROR 11-11 04:29:13 engine.py:143] TypeError: PatchedVLLMKVCache.forward() missing 2 required positional arguments: 'block_indices' and 'block_offset'
```
FIX #453 — https://github.com/HabanaAI/vllm-fork/blob/habana_main/README_GAUDI.md#troubleshooting-tweaking-hpu-graphs
Scope of changes: mark_step(s)