Description

djl-serving version: djl-inference:0.26.0-tensorrtllm0.7.1
model: meta-llama/Llama-2-7b-chat (artifacts downloaded from s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model)
error: java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed!
AWS instance: g5.12xlarge (4x NVIDIA A10 GPUs)

My questions are:

1. Is djl-serving able to convert the model shown above to the TensorRT-LLM format automatically at startup?
2. Can I point option.model_id to the Hugging Face repo directly?
3. Where can I check whether djl-inference:0.26.0-tensorrtllm0.7.1 supports this model?
4. I have 4 NVIDIA A10 GPUs; if I set option.tensor_parallel_degree = 1, it seems that more than one GPU is used. Is that expected?
5. How can I enable TensorRT-specific metrics logging, e.g. request latency (ms), latency (ms/token), throughput (tokens/second), requests/sec, queue time (ms), inference latency (ms), GPU utilization, number of output tokens, etc.?
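For context, here is a minimal serving.properties sketch for the tensorrtllm LMI container. The actual config used in this issue is not shown, so the engine and rolling-batch values below are assumptions based on the LMI documentation; only option.model_id and option.tensor_parallel_degree are taken from the issue text:

# serving.properties (sketch; values are assumptions, not the reporter's actual config)
# MPI is the engine the tensorrtllm LMI container typically uses
engine=MPI
# S3 prefix holding the HF-format artifacts (config.json, tokenizer files, weights)
option.model_id=s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model
# Number of GPUs to shard one model copy across
option.tensor_parallel_degree=1
# TensorRT-LLM continuous-batching backend
option.rolling_batch=trtllm

Regarding question 4: as far as I know, when tensor_parallel_degree is smaller than the number of visible GPUs, djl-serving may start one worker replica per GPU, so all four A10s showing activity with tensor_parallel_degree = 1 could be expected behavior; this is worth verifying against the LMI docs.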
Expected Behavior

No error.

Error Message

Relevant error log:
2024-04-16T22:30:56.535-04:00 [INFO ] ModelInfo - S3 url found, start downloading from s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model
2024-04-16T22:30:56.535-04:00 [INFO ] ModelInfo - artifacts has been downloaded already: /tmp/.djl.ai/download/39938700c24b769271f50f1d4a609b1feb943970
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - Converting model to TensorRT-LLM artifacts
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: Traceback (most recent call last):
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 80, in <module>
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: main()
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 76, in main
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: create_trt_llm_repo(properties, args)
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 34, in create_trt_llm_repo
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: create_model_repo(model_id_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/__init__.py", line 47, in create_model_repo
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: model = _get_model(model_id_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/__init__.py", line 53, in _get_model
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: model_config = AutoConfig.from_pretrained(model_id_or_path, trust_remote_code=trust_remote_code)
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1082, in from_pretrained
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 644, in get_config_dict
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 699, in _get_config_dict
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: resolved_config_file = cached_file(
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 360, in cached_file
2024-04-16T22:30:58.539-04:00 [INFO ] LmiUtils - convert_py: raise EnvironmentError(
2024-04-16T22:30:58.790-04:00 [INFO ] LmiUtils - convert_py: OSError: /tmp/.djl.ai/download/39938700c24b769271f50f1d4a609b1feb943970 does not appear to have a file named config.json. Checkout 'https://huggingface.co//tmp/.djl.ai/download/39938700c24b769271f50f1d4a609b1feb943970/None' for available files.
2024-04-16T22:30:58.790-04:00 [ERROR] ModelServer - Failed register workflow
2024-04-16T22:30:58.790-04:00 java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed!
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) ~[?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) [?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770) [?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) [?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) [?:?]
2024-04-16T22:30:58.790-04:00 Caused by: ai.djl.engine.EngineException: Model conversion process failed!
2024-04-16T22:30:58.790-04:00 #011at ai.djl.serving.wlm.LmiUtils.buildTrtLlmArtifacts(LmiUtils.java:270) ~[wlm-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00 #011at ai.djl.serving.wlm.LmiUtils.convertIfNeed(LmiUtils.java:132) ~[wlm-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00 #011at ai.djl.serving.wlm.ModelInfo.initialize(ModelInfo.java:465) ~[wlm-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00 #011at ai.djl.serving.models.ModelManager.lambda$registerWorkflow$2(ModelManager.java:99) ~[serving-0.26.0.jar:?]
2024-04-16T22:30:58.790-04:00 #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
2024-04-16T22:31:01.795-04:00 #011... 6 more
2024-04-16T22:31:03.799-04:00 [INFO ] ModelServer - Stopping model server.
How to Reproduce?

Steps to reproduce

1. Deployed djl-serving with the model and the djl-serving config as shown before.
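To make step 1 concrete, here is a hedged sketch of how such a deployment is commonly launched; the image URI follows the public AWS DLC naming scheme, and the host path, region, and port are placeholders, since the actual launch command is not included in the issue:

# Assumes serving.properties is in /opt/ml/model on the host, and that AWS
# credentials for the s3:// download are available to the container (omitted here).
docker run --rm --gpus all \
  -v /opt/ml/model:/opt/ml/model \
  -p 8080:8080 \
  763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.26.0-tensorrtllm0.7.1

# Once the server is up, a test request using the standard LMI invocation schema:
curl -X POST http://127.0.0.1:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 64}}'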
@geraldstanje Are you sure you had all the artifacts in s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model? Are the model artifacts nested inside another prefix there?
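For reference, a Hugging Face-format Llama-2-7b-chat checkpoint needs config.json at the top level of the prefix that option.model_id points to. A quick way to check (bucket and prefix taken from the log above; the file list is what the upstream HF repo ships, reproduced from memory, so treat it as approximate):

aws s3 ls s3://foo-bar/llms/meta-llama/Llama-2-7b-chat/model/
# Expected, roughly:
#   config.json
#   generation_config.json
#   model-00001-of-00002.safetensors
#   model-00002-of-00002.safetensors
#   model.safetensors.index.json
#   special_tokens_map.json
#   tokenizer.json
#   tokenizer.model
#   tokenizer_config.json
# If these files sit one level deeper (e.g. .../model/<subdir>/config.json), the
# downloader fetches the prefix but transformers cannot find config.json, which
# matches the OSError in the log above.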