Fix vLLM CPU api_server params (intel-analytics#11384)
xiangyuT authored and MeouSker77 committed Jul 19, 2024
1 parent 7adb00c commit 390efd2
Showing 1 changed file with 3 additions and 1 deletion.
@@ -175,7 +175,9 @@ async def authentication(request: Request, call_next):
     served_model_names = [args.model]
     engine_args = AsyncEngineArgs.from_cli_args(args)
     engine = IPEXLLMAsyncLLMEngine.from_engine_args(
-        engine_args, usage_context=UsageContext.OPENAI_API_SERVER)
+        engine_args, usage_context=UsageContext.OPENAI_API_SERVER,
+        load_in_low_bit=args.load_in_low_bit,
+    )
     openai_serving_chat = OpenAIServingChat(engine, served_model_names,
                                             args.response_role,
                                             args.lora_modules,
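The fix forwards the existing `args.load_in_low_bit` CLI parameter into `IPEXLLMAsyncLLMEngine.from_engine_args`, where previously the flag was parsed but never reached the engine. Below is a minimal, hypothetical sketch (standard-library `argparse` only, not the actual ipex-llm source) of that pattern: a flag is defined on the server's parser and then threaded through to the engine-construction keyword arguments. The names `build_parser` and `make_engine_kwargs` are invented for illustration.

```python
# Hedged sketch: how a --load-in-low-bit flag might be defined and forwarded
# to an engine constructor. This mirrors the commit above, which forwards
# args.load_in_low_bit into from_engine_args; it is NOT the real api_server.
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the server's CLI surface.
    parser = argparse.ArgumentParser(description="toy api_server argument parsing")
    parser.add_argument("--model", default="facebook/opt-125m")
    # Low-bit precision for weight loading; "sym_int4" is a placeholder default.
    parser.add_argument("--load-in-low-bit", default="sym_int4")
    return parser


def make_engine_kwargs(args: argparse.Namespace) -> dict:
    # The bug pattern: if load_in_low_bit is omitted from this dict, the parsed
    # flag is silently dropped. The fix is to forward it explicitly.
    return {
        "usage_context": "OPENAI_API_SERVER",  # stand-in for the UsageContext enum
        "load_in_low_bit": args.load_in_low_bit,
    }


if __name__ == "__main__":
    args = build_parser().parse_args(["--load-in-low-bit", "fp8"])
    print(make_engine_kwargs(args)["load_in_low_bit"])
```

Note that `argparse` maps the dashed flag `--load-in-low-bit` to the attribute `args.load_in_low_bit`, matching the attribute name used in the diff.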
