
Question about error: model conversion process failed #1785

Open
geraldstanje opened this issue Apr 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments


geraldstanje commented Apr 17, 2024

Description

djl-serving version: djl-inference:0.26.0-tensorrtllm0.7.1
model: meta-llama/Llama-2-7b-chat, loaded from s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model
error: java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed!
AWS instance: g5.12xlarge (4x NVIDIA A10G)

My questions are:

  • Is djl-serving able to automatically convert a model like the one above to TensorRT-LLM format at startup?
  • Can I point option.model_id at the Hugging Face repo directly? (See the sketch after this list.)
  • Where can I check whether djl-inference:0.26.0-tensorrtllm0.7.1 supports this model?
  • I have 4 NVIDIA A10 GPUs; if I set option.tensor_parallel_degree = 1, it seems more than one GPU is used. Is that expected?
  • How can I enable TensorRT-specific metrics logging, e.g. request latency (ms), latency (ms/token), throughput (tokens/second), requests/sec, queue time (ms), inference latency (ms), GPU utilization, number of output tokens, etc.?
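
As a sketch of what I mean by pointing at the Hub directly, here is a minimal serving.properties, assuming the public model ID meta-llama/Llama-2-7b-chat-hf and that Hugging Face credentials are available to the container (untested, for illustration only):

engine = MPI
option.model_id = meta-llama/Llama-2-7b-chat-hf
option.tensor_parallel_degree = 4
option.rolling_batch = trtllm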

Expected Behavior

No error: the model is converted to TensorRT-LLM artifacts and the server starts.

Error Message

Relevant error log:

2024-04-16T22:30:56.535-04:00   [INFO ] ModelInfo - S3 url found, start downloading from s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model
2024-04-16T22:30:56.535-04:00   [INFO ] ModelInfo - artifacts has been downloaded already: /tmp/.djl.ai/download/39938700c24b769271f50f1d4a609b1feb943970
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - Converting model to TensorRT-LLM artifacts
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: Traceback (most recent call last):
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 80, in <module>
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: main()
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 76, in main
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: create_trt_llm_repo(properties, args)
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 34, in create_trt_llm_repo
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: create_model_repo(model_id_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/__init__.py", line 47, in create_model_repo
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: model = _get_model(model_id_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/__init__.py", line 53, in _get_model
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: model_config = AutoConfig.from_pretrained(model_id_or_path, trust_remote_code=trust_remote_code)
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1082, in from_pretrained
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 644, in get_config_dict
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 699, in _get_config_dict
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: resolved_config_file = cached_file(
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 360, in cached_file
2024-04-16T22:30:58.539-04:00   [INFO ] LmiUtils - convert_py: raise EnvironmentError(
2024-04-16T22:30:58.790-04:00   [INFO ] LmiUtils - convert_py: OSError: /tmp/.djl.ai/download/39938700c24b769271f50f1d4a609b1feb943970 does not appear to have a file named config.json. Checkout 'https://huggingface.co//tmp/.djl.ai/download/39938700c24b769271f50f1d4a609b1feb943970/None' for available files.
2024-04-16T22:30:58.790-04:00   [ERROR] ModelServer - Failed register workflow
2024-04-16T22:30:58.790-04:00   java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed!
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) ~[?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) [?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770) [?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) [?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) [?:?]
2024-04-16T22:30:58.790-04:00   Caused by: ai.djl.engine.EngineException: Model conversion process failed!
2024-04-16T22:30:58.790-04:00   #011at ai.djl.serving.wlm.LmiUtils.buildTrtLlmArtifacts(LmiUtils.java:270) ~[wlm-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00   #011at ai.djl.serving.wlm.LmiUtils.convertIfNeed(LmiUtils.java:132) ~[wlm-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00   #011at ai.djl.serving.wlm.ModelInfo.initialize(ModelInfo.java:465) ~[wlm-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00   #011at ai.djl.serving.models.ModelManager.lambda$registerWorkflow$2(ModelManager.java:99) ~[serving-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00   #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
2024-04-16T22:31:01.795-04:00   #011... 6 more
2024-04-16T22:31:03.799-04:00   [INFO ] ModelServer - Stopping model server.

How to Reproduce?

engine = MPI
option.tensor_parallel_degree = 4
option.rolling_batch = trtllm
option.paged_attention = true
option.max_rolling_batch_prefill_tokens = 16080
option.max_rolling_batch_size = 64
option.model_loading_timeout = 900
option.model_id = s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model

Steps to reproduce

Deployed DJL Serving with the model and the serving.properties configuration shown above.
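
For context on the failure: the conversion step looks for standard Hugging Face artifacts (config.json etc.) directly under the downloaded prefix. A hypothetical layout that should satisfy it (file names assumed from a typical Llama-2-7b-chat checkout, not verified against my bucket):

s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model/config.json
s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model/tokenizer.json
s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model/tokenizer_config.json
s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model/model-00001-of-00002.safetensors
s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model/model-00002-of-00002.safetensors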

geraldstanje added the bug (Something isn't working) label on Apr 17, 2024
sindhuvahinis (Contributor) commented

@geraldstanje Are you sure you had all the artifacts in s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model? Or are the model artifacts nested under another prefix in there?
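
You can check with the AWS CLI (assuming it is configured with access to that bucket):

aws s3 ls --recursive s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model/

The OSError above suggests config.json needs to sit directly under that prefix, not in a nested folder.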
