From 2880ae8b07e009a3b10b1e31d41b3010fd33232d Mon Sep 17 00:00:00 2001
From: Will Lin
Date: Wed, 28 Aug 2024 07:04:39 +0000
Subject: [PATCH] docs: profiler example

---
 docs/source/dev/profiling/profiling_index.rst | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/docs/source/dev/profiling/profiling_index.rst b/docs/source/dev/profiling/profiling_index.rst
index 0cf714282e7b2..40ce38a2193b5 100644
--- a/docs/source/dev/profiling/profiling_index.rst
+++ b/docs/source/dev/profiling/profiling_index.rst
@@ -24,13 +24,25 @@ Traces can be visualized using https://ui.perfetto.dev/.
 
 Set the env variable VLLM_RPC_GET_DATA_TIMEOUT_MS to a big number before you start the server. Say something like 30 minutes.
 ``export VLLM_RPC_GET_DATA_TIMEOUT_MS=1800000``
 
-Example commands:
+Example commands and usage:
+===========================
+
+Offline Inference:
+------------------
+
+Source https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_with_profiler.py.
+
+.. literalinclude:: ../../../../examples/offline_inference_with_profiler.py
+   :language: python
+   :linenos:
+
 OpenAI Server:
+--------------
 
 .. code-block:: bash
 
-    VLLM_TORCH_PROFILER_DIR=/mnt/traces/ python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B
+    VLLM_TORCH_PROFILER_DIR=./vllm_profile python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B
 
 benchmark_serving.py:
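
For reference, the file pulled in by the ``literalinclude`` above follows roughly this shape. This is a minimal sketch, not a verbatim copy of ``examples/offline_inference_with_profiler.py``: it assumes the ``LLM.start_profile()`` / ``LLM.stop_profile()`` API, and the model name and prompts are illustrative choices rather than values taken from the patch.

.. code-block:: python

    # Sketch of an offline-inference profiling script (illustrative).
    import os

    from vllm import LLM, SamplingParams

    # Enable the torch profiler; the trace directory can also be set on the
    # command line via VLLM_TORCH_PROFILER_DIR, as in the server example above.
    os.environ["VLLM_TORCH_PROFILER_DIR"] = "./vllm_profile"

    # Illustrative prompts and sampling parameters (assumptions, not from the patch).
    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Illustrative small model so the trace stays manageable.
    llm = LLM(model="facebook/opt-125m")

    # Profile only the generate() call.
    llm.start_profile()
    outputs = llm.generate(prompts, sampling_params)
    llm.stop_profile()

    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")

The resulting trace files land in the directory pointed to by ``VLLM_TORCH_PROFILER_DIR`` and can be visualized at https://ui.perfetto.dev/, as noted earlier in the doc.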