[Frontend] Chat-based Embeddings API #9759

DarkLight1337 · 2024-10-28T13:12:13Z

This PR extends the existing Embeddings API to accept chat conversations similar to Chat Completions API. This enables multi-modal conversations to be passed to the embedding model.

To reduce code duplication, I've also factored out the common code for handling completion and chat-based inputs into the base OpenAIServing class.

FIX #8967
FIX #9303 (comment)

github-actions · 2024-10-28T13:12:25Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

vllm/entrypoints/openai/serving_embedding.py

mergify · 2024-10-29T12:34:10Z

This pull request has merge conflicts that must be resolved before it can be
merged. @DarkLight1337 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

DarkLight1337 · 2024-10-31T02:16:21Z

@simon-mo @njhill do you have time to take a look at this? @ywang96 is busy today.

maxdebayser

I left a few comments but overall this looks good to me.

maxdebayser · 2024-10-31T19:00:10Z

docs/source/models/vlm.rst

+    Since VLM2Vec has the same model architecture as Phi-3.5-Vision, we have to explicitly pass ``--task embedding``
+    to run this model in embedding mode instead of text generation mode.
+
+Since this schema is not defined by OpenAI client, we post a request to the server using the lower-level ``requests`` library:


Just leaving this as a thought here: should we perhaps have a fork of the openai client that support our extensions explicitly?

This sounds good, but not sure whether we have bandwidth to maintain it 😅

I suggest opening an issue for this.

tests/entrypoints/openai/test_embedding.py

maxdebayser · 2024-10-31T19:55:49Z

vllm/pooling_params.py

@@ -7,7 +7,7 @@ class PoolingParams(
        msgspec.Struct,
        omit_defaults=True,  # type: ignore[call-arg]
        array_like=True):  # type: ignore[call-arg]
-    """Pooling parameters for pooling.
+    """Pooling parameters for embeddings API.


I might be missing something, but the additional_data attribute doesn't seem to be used anywhere. Which is good, because it can by anything and is passed without validation from the request to the Pooler.forward() method as part of the PoolingMetadata object. If there is no use case for this, can we remove it in this PR?

@robertgshaw2-neuralmagic originally added this (#4800 (comment)). I am not sure whether this is still relevant since we can now set the pooling strategy via CLI (#9697).

@robertgshaw2-neuralmagic can you comment on this?

Meanwhile let's merge this PR first.

ywang96

Left a few comments - PTAL!

docs/source/models/vlm.rst

tests/entrypoints/openai/test_embedding.py

vllm/entrypoints/openai/protocol.py

vllm/entrypoints/openai/serving_embedding.py

ywang96

LGTM!

njhill · 2024-11-04T18:55:49Z

vllm/entrypoints/openai/serving_completion.py

+        except asyncio.CancelledError:
+            return self.create_error_response("Client disconnected")
+        except ValueError as e:
+            # TODO: Use a vllm-specific Validation Error
+            return self.create_error_response(str(e))

+        try:


@DarkLight1337 apologies I didn't get a chance to review this last week .. ran into some things while trying to resolve conflicts with another pending PR :)

It looks like there are a few changes not directly related to chat embeddings.

Wondering the reason for splitting into two try/except blocks here in particular (I guess similar in serving_embedding.py)

It is just to make clear that asyncio.CancelledError can only happen while iterating through the result generator.

This isn't necessary and makes the code more convoluted imo... I may open a PR to change and we can discuss there :)

Signed-off-by: Linkun Chen <[email protected]>

Signed-off-by: Richard Liu <[email protected]>

These were changed to separate blocks in vllm-project#9759 but I feel it's cleaner/clearer as a single block. It actually doesn't matter which parts of the block raise the specific exceptions in the except clauses, we still want to handle them in the same way.

These were changed to separate blocks in vllm-project#9759 but I feel it's cleaner/clearer as a single block. It actually doesn't matter which parts of the block raise the specific exceptions in the except clauses, we still want to handle them in the same way. Signed-off-by: Nick Hill <[email protected]>

Signed-off-by: Loc Huynh <[email protected]>

Signed-off-by: Sumit Dubey <[email protected]>

DarkLight1337 added 2 commits October 28, 2024 13:10

Initial implementation

1b91750

Update docs

61e0fcf

DarkLight1337 changed the title ~~Chat embeddings api~~ [Frontend] Chat-based Embeddings API Oct 28, 2024

DarkLight1337 mentioned this pull request Oct 28, 2024

[RFC]: Multi-modality Support Refactoring #4194

Open

DarkLight1337 added 5 commits October 28, 2024 14:04

Cleanup

c62be47

Consolidate and make code consistent

cc999b1

Remove useless statement

9ed87c1

Rename back

efa7c6f

Factor out common code

ab9297e

mergify bot added documentation Improvements or additions to documentation frontend labels Oct 28, 2024

maxdebayser reviewed Oct 28, 2024

View reviewed changes

vllm/entrypoints/openai/serving_embedding.py Outdated Show resolved Hide resolved

DarkLight1337 added 9 commits October 29, 2024 02:23

Reinstate truncate_prompt_tokens check

5a4f271

Rename

4a969b4

Fix

279b9ce

Remove unused code

7de803f

Migrate tokenization API

c1ef363

Some fixes

a10fa85

format

89e0710

remoev unused imports

81b94de

Migrate chat and completion APIs

a79d3b2

mergify bot added the needs-rebase label Oct 29, 2024

DarkLight1337 added 2 commits October 29, 2024 13:54

Factor out trace headers code

8b950dd

Merge branch 'main' into chat-embeddings-api

2c91855

mergify bot removed the needs-rebase label Oct 29, 2024

DarkLight1337 added 2 commits October 29, 2024 13:59

Clean

f5e72ff

More precise error handling

9cd1ac3

DarkLight1337 force-pushed the chat-embeddings-api branch from 3cc07d5 to 9cd1ac3 Compare October 29, 2024 14:04

DarkLight1337 added 2 commits October 31, 2024 01:25

Merge branch 'main' into chat-embeddings-api

8c8ee96

Merge branch 'main' into chat-embeddings-api

c3ba030

DarkLight1337 requested a review from njhill October 31, 2024 02:16

maxdebayser approved these changes Oct 31, 2024

View reviewed changes

ywang96 reviewed Oct 31, 2024

View reviewed changes

docs/source/models/vlm.rst Outdated Show resolved Hide resolved

tests/entrypoints/openai/test_embedding.py Show resolved Hide resolved

vllm/entrypoints/openai/protocol.py Show resolved Hide resolved

vllm/entrypoints/openai/serving_embedding.py Outdated Show resolved Hide resolved

DarkLight1337 added 2 commits November 1, 2024 04:14

Optionally initialize request handlers

46f316f

Update tip

1179f66

ywang96 approved these changes Nov 1, 2024

View reviewed changes

DarkLight1337 added 3 commits November 1, 2024 05:54

Update tests

eb4b235

format

bf46a16

Rename

7f188f9

DarkLight1337 enabled auto-merge (squash) November 1, 2024 05:58

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 1, 2024

DarkLight1337 merged commit 06386a6 into main Nov 1, 2024
69 checks passed

DarkLight1337 deleted the chat-embeddings-api branch November 1, 2024 08:13

DarkLight1337 mentioned this pull request Nov 1, 2024

[Frontend] Use a proper chat template for VLM2Vec #9912

Merged

FurtherAI mentioned this pull request Nov 2, 2024

[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 #9944

Merged

2 tasks

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024

[Frontend] Chat-based Embeddings API (vllm-project#9759)

35cd4cc

njhill reviewed Nov 4, 2024

View reviewed changes

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024

[Frontend] Chat-based Embeddings API (vllm-project#9759)

847443b

Signed-off-by: Linkun Chen <[email protected]>

richardsliu pushed a commit to richardsliu/vllm that referenced this pull request Nov 4, 2024

[Frontend] Chat-based Embeddings API (vllm-project#9759)

6432426

Signed-off-by: Richard Liu <[email protected]>

bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Nov 5, 2024

[Frontend] Chat-based Embeddings API (vllm-project#9759)

51fbf75

njhill mentioned this pull request Nov 5, 2024

[Frontend] Adjust try/except blocks in API impl #10056

Merged

hissu-hyvarinen pushed a commit to ROCm/vllm that referenced this pull request Nov 6, 2024

[Frontend] Chat-based Embeddings API (vllm-project#9759)

7f8e48e

JC1DA pushed a commit to JC1DA/vllm that referenced this pull request Nov 11, 2024

[Frontend] Chat-based Embeddings API (vllm-project#9759)

4ec378b

Signed-off-by: Loc Huynh <[email protected]>

sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024

[Frontend] Chat-based Embeddings API (vllm-project#9759)

e4a429d

Signed-off-by: Sumit Dubey <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Frontend] Chat-based Embeddings API #9759

[Frontend] Chat-based Embeddings API #9759

DarkLight1337 commented Oct 28, 2024 •

edited

Loading

github-actions bot commented Oct 28, 2024

mergify bot commented Oct 29, 2024

DarkLight1337 commented Oct 31, 2024 •

edited

Loading

maxdebayser left a comment

maxdebayser Oct 31, 2024

DarkLight1337 Nov 1, 2024

DarkLight1337 Nov 1, 2024

maxdebayser Oct 31, 2024

DarkLight1337 Nov 1, 2024 •

edited

Loading

DarkLight1337 Nov 1, 2024

DarkLight1337 Nov 1, 2024

ywang96 left a comment

ywang96 left a comment

njhill Nov 4, 2024

DarkLight1337 Nov 5, 2024

njhill Nov 5, 2024

[Frontend] Chat-based Embeddings API #9759

[Frontend] Chat-based Embeddings API #9759

Conversation

DarkLight1337 commented Oct 28, 2024 • edited Loading

github-actions bot commented Oct 28, 2024

mergify bot commented Oct 29, 2024

DarkLight1337 commented Oct 31, 2024 • edited Loading

maxdebayser left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DarkLight1337 Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ywang96 left a comment

Choose a reason for hiding this comment

ywang96 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DarkLight1337 commented Oct 28, 2024 •

edited

Loading

DarkLight1337 commented Oct 31, 2024 •

edited

Loading

DarkLight1337 Nov 1, 2024 •

edited

Loading