[RFC]: vLLM plugin system #7131

Open
youkaichao opened this issue Aug 4, 2024 · 9 comments

@youkaichao
Member

Motivation.

There is an increasing need to customize vLLM, including:

Usually, the request is to swap out some functions or classes in vLLM, or to call some functions before vLLM runs the model. While implementing these in vLLM is not difficult, the maintenance burden grows.

To satisfy the growing need for customization, I propose introducing a vLLM plugin system.

It is inspired by the pytest community, where a plugin is a standalone pypi package, e.g. https://pypi.org/project/pytest-forked/ .

#7130 is a draft implementation, where I added a new env var, VLLM_PLUGINS. It works similarly to the operating system's LD_PRELOAD: it holds a colon-separated list of Python modules to import.

One of the most important concerns is the risk of arbitrary code execution. When a user serves a model using vLLM, the endpoint user cannot activate a plugin, so serving does not introduce a code injection risk. However, there is a risk if the user runs vLLM in an untrusted environment. In this case:

  • we require that the plugin package name start with vllm_ , so that a vLLM user does not accidentally add irrelevant modules to execute.
  • we explicitly log the plugin modules vLLM is using, so that a vLLM user can easily see if any unexpected code is executed.

With these measures, the security level should be the same as that of LD_PRELOAD. And since LD_PRELOAD has existed for so many years, I think VLLM_PLUGINS should be acceptable in terms of security risk.
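The loading rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in #7130: the function names `parse_plugin_list` and `load_plugins`, the logger name, and the choice to skip (rather than fail on) a badly named module are all assumptions made for the example.

```python
import importlib
import logging

logger = logging.getLogger("plugin_loader_sketch")  # hypothetical logger name

REQUIRED_PREFIX = "vllm_"  # naming rule proposed in this RFC


def parse_plugin_list(value):
    """Split a colon-separated module list, dropping empty entries."""
    return [name.strip() for name in value.split(":") if name.strip()]


def load_plugins(value, importer=importlib.import_module):
    """Import each listed module whose name starts with REQUIRED_PREFIX.

    Returns (loaded, rejected) lists of module names. Skipping a rejected
    name instead of raising is an assumption of this sketch.
    """
    loaded, rejected = [], []
    for name in parse_plugin_list(value):
        if not name.startswith(REQUIRED_PREFIX):
            logger.warning("Refusing plugin %r: name must start with %r",
                           name, REQUIRED_PREFIX)
            rejected.append(name)
            continue
        # Log before importing so the user sees what is about to execute.
        logger.info("Loading plugin module %s", name)
        importer(name)
        loaded.append(name)
    return loaded, rejected
```

A caller would pass in the value of the environment variable, e.g. `load_plugins(os.environ.get("VLLM_PLUGINS", ""))`. The `importer` parameter exists only so the sketch can be exercised without installing real plugin packages.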

Proposed Change.

see #7130 for the draft implementation

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response

@youkaichao youkaichao added the RFC label Aug 4, 2024
@NadavShmayo
Contributor

This looks like a step in the right direction to me (:

I have 2 questions regarding this:

  1. Currently it seems like it wouldn't really be possible to replace the scheduler implementation, or add the uneven tensor-parallel implementation using these plugins, or at least not in an intuitive way that I can see.
    Don't you think it would make more sense to have different types of plugins, each focused on a different purpose?
    For example one plugin type would be a scheduler plugin, another would be a model architecture plugin, and perhaps a few more.
    This could come as an addition to your suggestion of "general-purpose" plugins, but in most cases it'd be simpler to implement a well-defined plugin interface instead of a general-purpose one.

  2. I suggest using Python's built in entrypoints to implement plugins, similar to how pytest implements it (see [Misc] Logits processor plugins #4769, which I worked on, as an example of this). Are there any advantages you see in the environment variable approach compared to this one?

I really believe implementing such a plugin system could make vLLM an even greater technology, and personally it could solve a lot of problems for me by allowing great modularity and customization.
I'd be more than happy to help in implementing it (:

@youkaichao
Member Author

Currently it seems like it wouldn't really be possible to replace the scheduler implementation, or add the uneven tensor-parallel implementation using these plugins, or at least not in an intuitive way that I can see.

It might not be easy, but it should be possible. Once a plugin is loaded, it has total control to do anything it wants. In the extreme case, it could swap the whole vLLM code for another implementation.

For example one plugin type would be a scheduler plugin, another would be a model architecture plugin, and perhaps a few more.

We can consider this a TODO. It requires cleaning up the interface of each component in vLLM, so that users can bring in their own implementations more easily.

In the beginning, we can reserve the space for them, e.g. use vllm_general_plugin for a general plugin that is blindly executed, and later introduce vllm_scheduler_plugin, which is dedicated to replacing the scheduler.

Currently, we can have vllm_models_plugin to register out-of-tree models, and vllm_executor_plugin to register a user-specified executor.
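To make the out-of-tree model case concrete, here is a toy sketch of what a dedicated models plugin might do. `ModelRegistrySketch`, `MyOutOfTreeModel`, and `register` are hypothetical names invented for this example; the registry is a stand-in and not vLLM's actual class or interface.

```python
class ModelRegistrySketch:
    """Toy stand-in for a model registry (not vLLM's real class)."""

    def __init__(self):
        self._architectures = {}

    def register_model(self, architecture, model_cls):
        """Map an architecture name to the class that implements it."""
        if architecture in self._architectures:
            raise ValueError(f"{architecture!r} is already registered")
        self._architectures[architecture] = model_cls

    def resolve(self, architecture):
        """Look up the class registered for an architecture name."""
        try:
            return self._architectures[architecture]
        except KeyError:
            raise KeyError(f"unknown architecture {architecture!r}") from None


class MyOutOfTreeModel:
    """Hypothetical model class shipped by the plugin package."""


def register(registry):
    """Function a models plugin could expose for the host to call."""
    registry.register_model("MyOutOfTreeModel", MyOutOfTreeModel)
```

The point of a dedicated plugin type is that the plugin only touches a narrow, named interface (here `register_model`) instead of patching arbitrary internals.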

I suggest using Python's built in entrypoints to implement plugins, similar to how pytest implements it

I think this is great! I didn't know about it before. I think it is much better than an env var. The only concern is: if users have installed many plugins for the same component, e.g. the scheduler, how can they select the one they want? We might need to design some config file format to determine which plugins to use.

@simon-mo
Collaborator

simon-mo commented Aug 8, 2024

I think both the LD_PRELOAD way and the Python entrypoints way are proven patterns. At the current experimental stage, I have two concerns:

  • We should stress that this is an experimental feature and the API can change without notice. Therefore, we should make sure the variable names are VLLM_EXPERIMENTAL_PLUGINS etc. Any documentation or examples should also state that these are highly subject to change, and that any plugin can break across any version of vLLM.
  • Plugins work because they have a well documented interface that is backward compatible. We cannot guarantee any of that at the current state of the project. Therefore, we should start thinking about what interfaces are exposed and which ones to stabilize.

Regarding the exact code being executed, I don't have much concern about security; rather, my concern is how the plugins are called and invoked. Will a plugin swap in a class implementation for an abstract class, or some function, or insert some callbacks? It does seem like it needs several use cases to prove out and design over time.
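The three invocation styles mentioned here (class swap, function swap, callbacks) can be contrasted in a small toy example. All names below (`Scheduler`, `FifoScheduler`, `Engine`, `LifoScheduler`, `shortest_first`) are hypothetical and only mimic the shape of the problem; none of this is vLLM code.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    @abstractmethod
    def pick(self, queue):
        """Return the next request to run."""


class FifoScheduler(Scheduler):
    def pick(self, queue):
        return queue[0]


class Engine:
    scheduler_cls = FifoScheduler  # extension point for style 1

    def __init__(self):
        self.scheduler = self.scheduler_cls()
        self.callbacks = []  # extension point for style 3

    def step(self, queue):
        chosen = self.scheduler.pick(queue)
        for callback in self.callbacks:
            callback(chosen)
        return chosen


# Style 1: swap in another implementation of the abstract class.
class LifoScheduler(Scheduler):
    def pick(self, queue):
        return queue[-1]


# Style 2: replace a single function (monkey patching), e.g.
#   FifoScheduler.pick = shortest_first
def shortest_first(self, queue):
    return min(queue, key=len)


# Style 3: leave behaviour alone and only observe it, e.g.
#   engine.callbacks.append(print)
```

Style 1 needs a stable abstract interface, style 2 depends on internal names staying put, and style 3 needs the host to define where callbacks fire, so each style asks for a different kind of compatibility promise.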

@youkaichao
Member Author

It does seem like it needs several use cases to prove out and design over time

Agree. So this RFC is just a start to explore how we interact with plugins. There are already 2 use cases now: out-of-tree model registration, and user-specified executor registration.

Plugins work because they have a well documented interface that is backward compatible.

That is the stable state of a plugin system. We don't need to guarantee that at the moment. It is the plugin author's responsibility to keep their plugin up to date. We can see what the community makes of the plugins, and gradually make some parts of the system pluggable with a stable API.

we should make sure the variable names are VLLM_EXPERIMENTAL_PLUGINS

I think we can directly call it VLLM_PLUGIN. Although the plugin system is immature for plugin developers, the usage is quite stable for end users. They will need to install some packages and select them.

@youkaichao
Member Author

Overall I got positive feedback for this RFC.

I will use the entrypoint mechanism mentioned by @NadavShmayo to collect all the installed plugins, and use VLLM_PLUGIN to control which plugins are loaded.
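A rough sketch of that two-step design (discover everything, then filter by the environment variable) is shown below. The comma-separated format, the rule that an unset variable means "load everything", and the function name `select_plugins` are assumptions of this example, not decisions recorded in this thread.

```python
import logging

logger = logging.getLogger("plugin_select_sketch")  # hypothetical logger name


def select_plugins(discovered, allow_value):
    """Pick which discovered plugins to run.

    discovered:  dict mapping plugin name -> zero-argument callable.
    allow_value: raw environment variable value, or None if unset.

    Assumptions of this sketch: None selects every discovered plugin,
    otherwise the value is a comma-separated list of names to run.
    Unknown names are logged and skipped.
    """
    if allow_value is None:
        return list(discovered)
    selected = []
    for name in (part.strip() for part in allow_value.split(",")):
        if not name:
            continue
        if name not in discovered:
            logger.warning("Plugin %r requested but not installed", name)
            continue
        selected.append(name)
    return selected


def run_plugins(discovered, allow_value):
    """Call the selected plugins in order and return their names."""
    names = select_plugins(discovered, allow_value)
    for name in names:
        logger.info("Running plugin %s", name)
        discovered[name]()
    return names
```

With this split, installing a plugin package makes it available, while the environment variable decides whether it actually runs, e.g. `run_plugins(discovered, os.environ.get("VLLM_PLUGIN"))`.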

@NadavShmayo
Contributor

I see that you have already implemented the general-purpose plugin system, nice!
I implemented a basic version of the model architecture plugin system, which as we discussed I think comes as an addition to this.

Would be great if you could have a look at #7438 and give your feedback.

@youkaichao
Member Author

#7426 finished the framework.

TODOs:

  • doc update
  • modular plugins for specific purposes

@toslunar
Contributor

I use sitecustomize and PYTHONPATH to deploy custom models with parallelism. In other words, Python has a plugin system to a certain extent. For security, it's nice that PYTHONPATH must be audited anyway, while we might miss VLLM_PLUGINS unless it's well-documented. In my opinion, PyTest cannot utilize sitecustomize because PyTest and its plugin system should not disturb other packages much. vLLM doesn't have to work with arbitrary packages (for example, since #2155 many dependencies are pinned).
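For context, the sitecustomize approach relies on Python's `site` module importing a module named `sitecustomize` at interpreter startup if one is found on `sys.path`. The sketch below demonstrates the mechanism in a throwaway directory; the environment variable `SKETCH_PLUGIN_LOADED` is a made-up marker standing in for real customization such as registering a model.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Contents of the sitecustomize.py file. Setting an environment variable is
# only a visible stand-in for whatever customization a deployment needs.
SITECUSTOMIZE = textwrap.dedent("""\
    import os
    os.environ["SKETCH_PLUGIN_LOADED"] = "1"
""")

CHILD_CODE = "import os; print(os.environ.get('SKETCH_PLUGIN_LOADED', '0'))"


def run_with_sitecustomize():
    """Start a child interpreter whose PYTHONPATH holds sitecustomize.py."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sitecustomize.py"), "w") as handle:
            handle.write(SITECUSTOMIZE)
        env = dict(os.environ, PYTHONPATH=tmp)
        env.pop("SKETCH_PLUGIN_LOADED", None)  # start from a clean state
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Because the hook runs for every Python process started with that `PYTHONPATH`, it affects all packages in the process, which is the trade-off raised above for tools like pytest.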


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!


4 participants