[6/N] torch.compile rollout to users #10437
Signed-off-by: youkaichao <[email protected]>
@@ -868,6 +869,16 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
            help="Override or set the pooling method in the embedding model. "
            "e.g. {\"pooling_type\": \"mean\", \"normalize\": false}.'")

        parser.add_argument('--compilation-config',
                            '-O',
I want to mimic the convention of traditional compilers, where -O 3 is the most optimized level.
            'interpreted as the optimization level.\n'
            'NOTE: level 0 is the default level without '
            'any optimization. level 1 and 2 are for internal '
            'testing only. level 3 is the recommended level '
Maybe add that this (levels 1 and 2) could change in the future - if we actually have multiple optimization levels, we can use O1 and O2 and move the internal levels somewhere else
Our optimizations are actually not well graded as levels; they appear as complicated configs. I would keep only levels 0/1/2/3, and recommend that users use 3.
+1 for -O :)
The user interface:
- command line:
  - short: -O 0/1/2/3
  - long: --compilation-config json_string
- vllm.LLM: users can construct a CompilationConfig object directly, and pass compilation_config=obj.

The first rollout will only roll out 3 levels; more fine-grained control will come later as we stabilize them.
Documentation on how to use it, and the design doc, will come later.
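
To make the two command-line forms above concrete, here is a minimal, self-contained sketch of one way a single argument could accept either a bare level (0/1/2/3) or a JSON string. It uses only the standard-library argparse, not vLLM's FlexibleArgumentParser, and SketchCompilationConfig, parse_compilation_config and build_parser are hypothetical names invented for illustration; the real CompilationConfig and the parsing logic in this PR may well differ.

```python
import argparse
import json
from dataclasses import dataclass, field


@dataclass
class SketchCompilationConfig:
    """Hypothetical stand-in for CompilationConfig; only `level` is modelled."""
    level: int = 0
    extra: dict = field(default_factory=dict)


def parse_compilation_config(value: str) -> SketchCompilationConfig:
    """Accept a bare level ("0".."3") or a JSON object string (assumed behaviour)."""
    value = value.strip()
    # Short form: a bare digit is interpreted as the optimization level.
    if value in {"0", "1", "2", "3"}:
        return SketchCompilationConfig(level=int(value))
    # Long form: anything else must be a JSON object.
    try:
        data = json.loads(value)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"not a level or JSON: {value!r}") from exc
    if not isinstance(data, dict):
        raise argparse.ArgumentTypeError("expected a level 0-3 or a JSON object")
    level = data.pop("level", 0)
    if type(level) is not int or level not in range(4):
        raise argparse.ArgumentTypeError(f"invalid level: {level!r}")
    # Remaining keys are kept untouched; this sketch does not interpret them.
    return SketchCompilationConfig(level=level, extra=data)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--compilation-config", "-O",
                        type=parse_compilation_config,
                        default=SketchCompilationConfig(),
                        help="Optimization level (0-3) or a JSON config string.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-O", "3"])
    print(args.compilation_config)
```

In this sketch both spellings land in the same attribute, so `-O 3` and `--compilation-config '{"level": 3}'` yield equal config objects, and level 0 remains the default when the flag is omitted.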