run end_to_end_test_llama.py error #134

Open
wants to merge 24 commits into main
Conversation

SherronBurtint

Running `python3 tools/end_to_end_test_llama.py` fails with the error "[400] HTTP end point doesn't support models with decoupled transaction policy". The error comes from this call in the script:

```python
]

try:
    result = client.infer(model_name, inputs)
```
I see the llama model is decoupled, so shouldn't the call be async_stream_infer instead of infer?
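For reference, here is a minimal sketch of the streaming call with tritonclient.grpc. The model name, tensor names, shape, and dtype below are placeholders and not taken from the test script:

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Decoupled models must be queried through the gRPC streaming API,
# not the HTTP infer endpoint.
def callback(result, error):
    if error is not None:
        print("inference error:", error)
    else:
        # Each streamed response lands here; "output_ids" is a placeholder name.
        print(result.as_numpy("output_ids"))

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Build the inputs with the gRPC InferInput class (not tritonclient.http).
data = np.array([[1, 2, 3]], dtype=np.uint32)  # placeholder token ids
inp = grpcclient.InferInput("input_ids", list(data.shape), "UINT32")
inp.set_data_from_numpy(data)

client.start_stream(callback=callback)
client.async_stream_infer(model_name="fastertransformer", inputs=[inp])
client.stop_stream()  # flush outstanding responses before exiting
```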


I changed it to start_stream(call_back) and async_stream_infer(), but kept the old inputs (HttpInferInput), and got the error "TypeError: Not a cmessage" from tritonclient/grpc/_utils.py.
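That error most likely comes from the input objects rather than the streaming calls: the gRPC client packs each InferInput into a protobuf message, so inputs built with the HTTP client's InferInput class cannot be reused. A minimal illustration of the mismatch (tensor name, shape, and dtype are placeholders):

```python
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.http as httpclient

data = np.zeros((1, 8), dtype=np.uint32)  # placeholder tensor

# Wrong: an HTTP-client input handed to the gRPC streaming API; the gRPC
# client cannot serialize it, which surfaces as "TypeError: Not a cmessage".
bad = httpclient.InferInput("input_ids", list(data.shape), "UINT32")

# Right: the gRPC client needs its own InferInput so it can build the
# protobuf request message.
good = grpcclient.InferInput("input_ids", list(data.shape), "UINT32")
good.set_data_from_numpy(data)
```

Recreating the inputs with grpcclient.InferInput (as in the sketch above) should avoid the protobuf type error.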

SamuraiBUPT and others added 19 commits June 26, 2023 18:09
fix the int8_mode and decoupled mode backend support
When I follow llama_guide.md to build this lib, this error occurs:

```bash
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc: In member function 'std::shared_ptr<AbstractTransformerModel> triton::backend::fastertransformer_backend::ModelState::ModelFactory(triton::common::TritonJson::Value&, const string&)':
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc:340:98: error: 'int8_mode' was not declared in this scope
  340 |       ft_model = std::make_shared<LlamaTritonModel<__nv_bfloat16>>(tp, pp, custom_ar, model_dir, int8_mode);
      |                                                                                                  ^~~~~~~~~
[100%] Linking CXX executable ../../../../../bin/multi_gpu_gpt_interactive_example
[100%] Built target gptneox_example
[100%] Built target multi_gpu_gpt_triton_example
[100%] Built target llama_example
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc:343:90: error: 'int8_mode' was not declared in this scope
  343 |       ft_model = std::make_shared<LlamaTritonModel<float>>(tp, pp, custom_ar, model_dir, int8_mode);
```
I think this variable needs to be fixed; after I moved it, the build succeeded.
Update libfastertransformer.cc