
[Feature]: Option to override HuggingFace's configurations #5205

Closed

DarkLight1337 opened this issue Jun 3, 2024 · 10 comments · Fixed by #5836

Comments

@DarkLight1337 (Member) commented Jun 3, 2024

🚀 The feature, motivation and pitch

The configuration files on HuggingFace may have missing information (e.g. #2051) or contain bugs (e.g. #4008). In such cases, it may be necessary to provide or override the configuration files so the model can be loaded correctly. However, apart from chat templates, there is currently no way to do so; we have to update the source HuggingFace repository directly. It may take time for the authors of those repositories to respond, especially if they are unofficial ones that are not as well maintained.

It would be great if we could provide our own config.json, tokenizer_config.json, etc., through the vLLM CLI to apply patches as necessary.
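Until then, the only general workaround is to patch a local copy by hand. A minimal sketch (assuming huggingface_hub is installed; the model and the overridden field below are examples only):

# Sketch of the manual workaround this feature would replace: download the
# repo, patch config.json on disk, then point vLLM at the patched directory.
# Assumes huggingface_hub is installed; the model and field below are examples only.
import json
from pathlib import Path

from huggingface_hub import snapshot_download

local_dir = Path(snapshot_download("facebook/opt-125m", local_dir="opt-125m-patched"))

config_path = local_dir / "config.json"
config = json.loads(config_path.read_text())
config["max_position_embeddings"] = 4096  # example override only
config_path.write_text(json.dumps(config, indent=2))

# Then: python -m vllm.entrypoints.api_server --model ./opt-125m-patched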

Related work

#1756 lets us specify alternative chat templates or provide a chat template when it is missing from tokenizer_config.json. However, it currently only applies to the OpenAI API-compatible server. #5049 will add a chat method to the main LLM entrypoint, but it does not provide a built-in way to load the chat template automatically as #1756 does.
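For reference, applying an externally supplied chat template boils down to something like the following sketch using the transformers API (the template string is illustrative only):

# Sketch: applying an externally supplied chat template with transformers,
# similar in spirit to what --chat-template enables in the OpenAI server.
# The template string below is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

custom_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)

messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages, chat_template=custom_template, tokenize=False
)
print(prompt)  # "user: Hello!\n"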

Some vLLM models already have hardcoded patches to the HuggingFace config.json; these can be found under vllm/transformers_utils/configs.

@Suvralipi

By default, the model is downloaded from HuggingFace/ModelScope. Is there a way to load a model from a local filepath, a private repository, or S3 object storage? How can we load supported models from a local storage path when deploying in a local environment?

@DarkLight1337 (Member, Author)

By default, the model is downloaded from HuggingFace/ModelScope. Is there a way to load a model from a local filepath, a private repository, or S3 object storage? How can we load supported models from a local storage path when deploying in a local environment?

Actually, this is already supported - just pass a filepath to --model.
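For example (a minimal sketch; the path below is hypothetical and should contain config.json plus the weight files):

# Minimal sketch: load a model from a local directory instead of the HF Hub.
# The path is hypothetical; it must contain config.json and the weight files.
from vllm import LLM

llm = LLM(model="/mnt/models/opt-125m")
outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)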

@Suvralipi

Actually, this is already supported - just pass a filepath to --model.

I am trying to deploy the model using KServe with vLLM, with the InferenceService below. The PVC filepath contains the required model, which is accessible from other running pods. I get a similar error if I use an S3 filepath as well. Both paths contain the model and config files as expected.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: opt-125m-vllm
spec:
  predictor:
    containers:
      - args:
          - --port
          - "8080"
          - --model
          - "pvc://kubeflow-shared-pvc/llm-mlflow/opt-125m"
        command:
          - python3
          - -m
          - vllm.entrypoints.api_server
        env:
          - name: STORAGE_URI
            value: "pvc://kubeflow-shared-pvc/llm-mlflow/opt-125m"
          - name: PYTORCH_CUDA_ALLOC_CONF
            value: "max_split_size_mb:2048"
        image: kserve/vllmserver:latest
        name: kserve-container
        resources:
          limits:
            cpu: "4"
            memory: 8Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: "1"
            memory: 8Gi
            nvidia.com/gpu: "1"

But it gives me the error below:
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/api_server.py", line 78, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 226, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/arg_utils.py", line 147, in create_engine_configs
    model_config = ModelConfig(self.model, self.tokenizer,
  File "/usr/local/lib/python3.8/dist-packages/vllm/config.py", line 57, in __init__
    self.hf_config = get_config(model, trust_remote_code)
  File "/usr/local/lib/python3.8/dist-packages/vllm/transformers_utils/config.py", line 17, in get_config
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/configuration_auto.py", line 1007, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 620, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 696, in _get_config_dict
    raise EnvironmentError(
OSError: Can't load the configuration of 'pvc://kubeflow-shared-pvc/llm-mlflow/opt-125m'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'pvc://kubeflow-shared-pvc/llm-mlflow/opt-125m' is the correct path to a directory containing a config.json file

@DarkLight1337 (Member, Author)

Oh, I missed the part where you are using object storage. I only meant that local filepaths are supported.

@Suvralipi

Oh, I missed the part where you are using object storage. I only meant that local filepaths are supported.

So that means it doesn't support persistent volume or object storage paths, only local filepaths?

@DarkLight1337 (Member, Author)

Oh, I missed the part where you are using object storage. I only meant that local filepaths are supported.

So that means it doesn't support persistent volume or object storage paths, only local filepaths?

Yes, that is true. I think supporting non-local filepaths would warrant its own PR/issue.
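In the meantime, a possible workaround (a sketch only; the bucket, prefix, and paths below are hypothetical, and it assumes boto3 is installed) is to copy the model from object storage to local disk first and point --model at that directory:

# Sketch of a workaround: copy the model from S3 to local disk first,
# then point vLLM at the local directory. Bucket/prefix/paths are hypothetical.
import os

import boto3

bucket, prefix, local_dir = "my-models", "llm-mlflow/opt-125m", "/mnt/models/opt-125m"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        rel_path = os.path.relpath(obj["Key"], prefix)
        dest = os.path.join(local_dir, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(bucket, obj["Key"], dest)

# Afterwards: python -m vllm.entrypoints.api_server --model /mnt/models/opt-125m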

@DarkLight1337 added the good first issue label Jun 14, 2024

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions bot added the stale label Oct 26, 2024
@zwhe99 commented Oct 29, 2024

Hi @DarkLight1337! Do you have a solution for this?

@DarkLight1337 (Member, Author)

You can take a look at the code under vllm.transformers_utils and figure out how to pass user configs to override the configs loaded from HF there.
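For instance, one possible shape for such an override (a sketch only; the get_config_with_overrides helper and the hf_overrides argument are hypothetical, not existing vLLM API) would be to apply a user-supplied dict on top of the config that transformers loads:

# Sketch only: apply user-supplied overrides on top of the config loaded
# from the HF Hub. The `hf_overrides` argument is hypothetical, not vLLM API.
from typing import Any, Dict, Optional

from transformers import AutoConfig, PretrainedConfig


def get_config_with_overrides(
    model: str,
    trust_remote_code: bool = False,
    hf_overrides: Optional[Dict[str, Any]] = None,
) -> PretrainedConfig:
    config = AutoConfig.from_pretrained(model, trust_remote_code=trust_remote_code)
    for key, value in (hf_overrides or {}).items():
        setattr(config, key, value)  # patch missing/buggy fields in place
    return config


# Example: force a larger context window for a model whose config is missing it.
config = get_config_with_overrides(
    "facebook/opt-125m", hf_overrides={"max_position_embeddings": 4096}
)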

@K-Mistele (Contributor)

Thanks for the callout re #2547, @DarkLight1337. It seems like being able to override individual fields in config.json through engine/CLI args would be a good approach? If so, I think llama.cpp has a good reference implementation; it allows overriding GGUF key/value pairs with custom values.
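A rough sketch of how llama.cpp-style key=value overrides could be parsed on the CLI (the --config-override flag is hypothetical, not an existing vLLM option; value types are inferred from the text):

# Sketch of parsing llama.cpp-style key=value overrides from the CLI.
# The --config-override flag name is hypothetical, not an existing vLLM option.
import argparse
import json


def parse_override(item: str):
    key, _, raw = item.partition("=")
    try:
        value = json.loads(raw)  # handles ints, floats, bools, null, quoted strings
    except json.JSONDecodeError:
        value = raw  # fall back to a plain string
    return key, value


parser = argparse.ArgumentParser()
parser.add_argument("--config-override", action="append", default=[],
                    metavar="KEY=VALUE", help="Override a field in config.json")
args = parser.parse_args(["--config-override", "rope_theta=1000000",
                          "--config-override", "model_type=llama"])

overrides = dict(parse_override(item) for item in args.config_override)
print(overrides)  # {'rope_theta': 1000000, 'model_type': 'llama'}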
