[Frontend] add add_request_id middleware (vllm-project#9594)
Signed-off-by: cjackal <[email protected]>
Signed-off-by: Loc Huynh <[email protected]>
cjackal authored and JC1DA committed Nov 11, 2024
1 parent acf9207 commit c85f5a5
Showing 2 changed files with 34 additions and 0 deletions.
26 changes: 26 additions & 0 deletions docs/source/serving/openai_compatible_server.md
@@ -62,6 +62,32 @@ completion = client.chat.completions.create(
)
```

### Extra HTTP Headers

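Only the `X-Request-Id` HTTP request header is supported for now. The server echoes its value back in the `X-Request-Id` response header, or generates an ID if the request does not include one.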

```python
completion = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_headers={
        "x-request-id": "sentiment-classification-00001",
    }
)
print(completion._request_id)

completion = client.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    prompt="A robot may not injure a human being",
    extra_headers={
        "x-request-id": "completion-test",
    }
)
print(completion._request_id)
```
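
The header is plain HTTP, so a client that does not use the OpenAI SDK can set it too. The following is a minimal sketch using only the Python standard library; the helper name `build_completion_request` and the server address `http://localhost:8000` are assumptions made for illustration, not part of vLLM. It only builds the request, because sending it requires a running server.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, request_id):
    """Build (but do not send) a POST to /v1/completions carrying X-Request-Id."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Request-Id": request_id,
        },
        method="POST",
    )


req = build_completion_request(
    "http://localhost:8000",  # assumed server address
    "NousResearch/Meta-Llama-3-8B-Instruct",
    "A robot may not injure a human being",
    "completion-test",
)

# Sending needs a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("X-Request-Id"))
```

If the request is sent, the `X-Request-Id` response header should carry the same value that was supplied, which is what `completion._request_id` surfaces in the SDK examples.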

### Extra Parameters for Completions API

The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
8 changes: 8 additions & 0 deletions vllm/entrypoints/openai/api_server.py
@@ -7,6 +7,7 @@
import signal
import socket
import tempfile
import uuid
from argparse import Namespace
from contextlib import asynccontextmanager
from functools import partial
@@ -475,6 +476,13 @@ async def authentication(request: Request, call_next):
status_code=401)
return await call_next(request)

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request_id = request.headers.get("X-Request-Id") or uuid.uuid4().hex
    response = await call_next(request)
    response.headers["X-Request-Id"] = request_id
    return response

for middleware in args.middleware:
    module_path, object_name = middleware.rsplit(".", 1)
    imported = getattr(importlib.import_module(module_path), object_name)
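
The ID selection in `add_request_id` can be exercised without a web framework. The sketch below is a standalone approximation, not vLLM code: `resolve_request_id` is a hypothetical helper, and a plain `dict` with lower-cased keys stands in for the request's case-insensitive header mapping. It shows the two branches of the `or` expression: a client-supplied value is kept, while a missing or empty header falls back to a random UUID hex string.

```python
import uuid


def resolve_request_id(headers):
    """Return the client's X-Request-Id, or a fresh UUID hex if absent or empty."""
    # HTTP header names are case-insensitive, so normalise keys before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # `or` also replaces an empty string, mirroring the middleware's expression.
    return lowered.get("x-request-id") or uuid.uuid4().hex


client_supplied = resolve_request_id({"x-request-id": "sentiment-classification-00001"})
generated = resolve_request_id({})
empty_header = resolve_request_id({"X-Request-Id": ""})

print(client_supplied)
print(len(generated), len(empty_header))
```

As far as this diff shows, the middleware only writes the chosen value to the response header; nothing in the hunk passes it on to the request handlers.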