Support per-request seed #2514
Conversation
Haven't looked into the PR yet, but we can guarantee the request ID is unique.

One issue I have encountered when investigating something similar is that, due to limited precision in the forward pass, different batch sizes can actually lead to different tokens being sampled despite the seed being set. Something to keep in mind here!

Thanks @Yard1! Yes, I'm aware of that, and it's even the case for greedy sampling. But this should hopefully allow for "mostly stable" results: float16 is much better than bfloat16, and the quantized case is probably worse. Even the OpenAI docs for the param say something along the lines of "best effort" :) I am just getting back to this PR now, so will do some tests.
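For reference, a minimal usage sketch of the feature being added (assuming the `seed` field on `SamplingParams` as proposed in this PR; the model name is only an example, and per the discussion above the determinism is best-effort):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model, not from this PR

# Two runs with the same seed should (best-effort) sample the same tokens;
# determinism can still break across different batch compositions / dtypes.
params = SamplingParams(temperature=0.8, top_p=0.95, seed=42)

out_1 = llm.generate(["The capital of France is"], params)
out_2 = llm.generate(["The capital of France is"], params)

assert out_1[0].outputs[0].text == out_2[0].outputs[0].text
```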
Can we add a test to ensure that the sampling is deterministic for the same seed?
Yes!

Personally, I really like this feature. Looking forward to the merge!

@WoosukKwon I saw you flagged this for the imminent release ... I am working on the extra tests right now and should push them in the next hour or so.

@njhill Oh, I just dropped it from the release tracker. Sorry if I pushed it too tight. For the next release, I think we will just focus on bug fixes. Let's ship this in v0.3.2.
vllm/sequence.py (outdated)

```diff
@@ -359,6 +362,7 @@ class SequenceGroupMetadata:
     sampling_params: The sampling parameters used to generate the outputs.
     block_tables: The block tables. (Seq id -> list of physical block
         numbers)
+    state: A dict for holding internal state tied to this sequence group.
```
we should make this a dataclass with defined fields
@Yard1 I'd intentionally kept this opaque to maintain separation of concerns, since the torch generator is lower level and managed only by the model runner / sampler. If we define a dataclass here then the member type would be torch.Generator, and as far as I can see torch is decoupled from the engine layer. In this case the model runner just needs access to somewhere tied to the lifecycle of the sequence group that it can stash the corresponding Generator. But happy to change it if you're sure about this!
It's mainly about ensuring we avoid errors caused by e.g. typos in key names. Type hinting/checking is secondary.
@Yard1 have now changed it to use a dataclass
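For context, a rough sketch of what the dataclass could look like (the name and field are illustrative, not necessarily the final code in this PR):

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class SequenceGroupState:
    """Mutable state tied to a specific sequence group.

    The generator is created lazily by the model runner / sampler when the
    request specifies a seed; the engine layer never touches it directly.
    """
    generator: Optional[torch.Generator] = None
```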
If the SamplingParams object passed to LLMEngine.add_request() is mutated after it returns, it could affect the async sampling process for that request. Suggested by @Yard1 #2514 (comment)
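A sketch of the defensive copy that commit describes (the helper below is purely illustrative; in the PR the copy would live inside LLMEngine.add_request()):

```python
import copy

from vllm import SamplingParams


def copy_params_for_engine(sampling_params: SamplingParams) -> SamplingParams:
    # Take a copy so that later mutation by the caller cannot affect the
    # in-flight (possibly asynchronous) sampling for this request.
    return copy.deepcopy(sampling_params)


params = SamplingParams(temperature=0.7, seed=42)
engine_params = copy_params_for_engine(params)
params.temperature = 0.0                 # caller mutates its object later...
assert engine_params.temperature == 0.7  # ...the engine's copy is unaffected
```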
```python
vllm_outputs_seed_1_2 = vllm_model.generate(example_prompts,
                                            sampling_params_seed_1)
vllm_outputs_seed_2_2 = vllm_model.generate(example_prompts,
                                            sampling_params_seed_2)
```
can we also shuffle the prompts here? it would also be great if we could test multiple different seeds in one batch
@Yard1 I've updated the test to do a mix of different/same seeds in a single batch, also updated the other mixed batch sampler test to shuffle and compare the seeded requests.
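Roughly the kind of check being discussed, sketched against the engine API (the exact add_request/step usage, prompts, and model here are assumptions for illustration, not the final test code):

```python
import random

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model
engine = llm.llm_engine

prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
seeds = [None, 100, 200]  # mix unseeded and seeded requests in one batch


def run_batch(order):
    # Submit every request up front so they are scheduled together, in the
    # given order, then drain the engine.
    for idx in order:
        engine.add_request(str(idx), prompts[idx],
                           SamplingParams(temperature=1.0, seed=seeds[idx],
                                          max_tokens=16))
    finished = {}
    while engine.has_unfinished_requests():
        for out in engine.step():
            if out.finished:
                finished[out.request_id] = out.outputs[0].token_ids
    return finished


order = list(range(len(prompts)))
first = run_batch(order)
random.shuffle(order)   # different batch positions on the second run
second = run_batch(order)

# Seeded requests should reproduce the same tokens regardless of position.
for idx, seed in enumerate(seeds):
    if seed is not None:
        assert first[str(idx)] == second[str(idx)]
```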
Revert enforcement of best_of == 1 when using seed
Rather than SamplingParams object
Per @Yard1's review comment
I'm going to merge this considering the newest commit just removes a comment. Here's the passing CI for commit 1a774fd: https://buildkite.com/vllm/ci/builds/1465
I encountered an issue where, with the same prompt and parameters top_k=-1, top_p=1, temperature=0, and using continuous batching, there is a certain probability that the responses will differ when the number of concurrent requests is greater than 1. However, when testing with offline inference and a batch size of 2, the responses are always the same. It seems that continuous batching may affect the results of greedy sampling.

@tdeng521 Yes, this is expected due to precision-related differences when floating point ops are accumulated differently, including different matmul implementations being used for different batch sizes, etc. You should see it to a lesser degree if you try with float32.
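For anyone who wants to reduce (not eliminate) that effect, the float32 suggestion just means loading the model in full precision; a small sketch with an example model:

```python
from vllm import LLM, SamplingParams

# float32 narrows (but does not remove) the batch-composition-dependent
# rounding differences described above.
llm = LLM(model="facebook/opt-125m", dtype="float32")  # example model
out = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.0))
```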
@WoosukKwon @zhuohan123 @simon-mo please let me know if this looks reasonable!
Question (now n/a): Can we rely on request_id to be unique? If not, this may also require assigning a guaranteed-unique internal id.

Resolves #1211, #1595