From e72169bce07dffd4435aab264105f9b05a2e777c Mon Sep 17 00:00:00 2001
From: Wallas Santos
Date: Mon, 2 Sep 2024 12:20:29 -0300
Subject: [PATCH] [Doc] Compatibility matrix for mutual exclusive features

Signed-off-by: Wallas Santos
---
 docs/source/index.rst                        |   1 +
 docs/source/serving/compatibility_matrix.rst | 191 +++++++++++++++++++
 2 files changed, 192 insertions(+)
 create mode 100644 docs/source/serving/compatibility_matrix.rst

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 4b817c4ba9498..0b9ce5197c8b8 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -85,6 +85,7 @@ Documentation
    serving/usage_stats
    serving/integrations
    serving/tensorizer
+   serving/compatibility_matrix
    serving/faq
 
 .. toctree::
diff --git a/docs/source/serving/compatibility_matrix.rst b/docs/source/serving/compatibility_matrix.rst
new file mode 100644
index 0000000000000..01801b209e0f7
--- /dev/null
+++ b/docs/source/serving/compatibility_matrix.rst
@@ -0,0 +1,191 @@
+.. _compatibility_matrix:
+
+Compatibility Matrix
+====================
+
+The table below shows mutually exclusive features along with support for some device types.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 20 8 8 8 8 8 8 8 8 8 8 8
+
+   * - Feature
+     - Chunked Prefill
+     - APC
+     - LoRA
+     - Prompt Adapter
+     - Speculative decoding
+     - CUDA Graphs
+     - Encoder/Decoder
+     - Logprobs
+     - Prompt Logprobs
+     - Async Output
+     - Multi-step
+   * - APC
+     - ✅
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+   * - LoRA
+     - ✗ `[C] `__
+     - ✅
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+   * - Prompt Adapter
+     - ✅
+     - ✅
+     - ✅
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+   * - Speculative decoding
+     - ✗ `[C] `__ `[T] `__
+     - ✅
+     - ✗ `[C] `__
+     - ✅
+     -
+     -
+     -
+     -
+     -
+     -
+     -
+   * - CUDA Graphs
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     -
+     -
+     -
+     -
+     -
+     -
+   * - Encoder/Decoder
+     - ✗ `[C] `__
+     - ✗ `[C] `__ `[T] `__
+     - ✗ `[C] `__
+     - ✗ `[C] `__
+     - ✗ `[C] `__ `[T] `__
+     - ✗ `[C] `__ `[T] `__
+     -
+     -
+     -
+     -
+     -
+   * - Logprobs
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     -
+     -
+     -
+     -
+   * - Prompt Logprobs
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✗ `[C] `__ `[T] `__
+     - ✅
+     - ✅
+     - ✅
+     -
+     -
+     -
+   * - Async Output
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✗ `[C] `__
+     - ✅ `[C] `__
+     - ✗ `[C] `__ `[C] `__
+     - ✅
+     - ✅
+     -
+     -
+   * - Multi-step
+     - ✗ `[C] `__
+     - ✅
+     - ✗ `[C] `__
+     - ✅
+     - ✗ `[C] `__
+     - ✅
+     - ✗ `[C] `__
+     - ✅
+     - ✗ `[C] `__ `[T] `__
+     - ✅
+     -
+   * - NVIDIA
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+   * - CPU
+     - ✗ `[C] `__
+     - ✗ `[C] `__
+     - ✗ `[C] `__ `[T] `__
+     - ✗ `[T] `__
+     - ✅
+     - ✗ `[C] `__
+     - ✗ `[C] `__
+     - ✅
+     - ✅
+     - ✗ `[C] `__
+     - ✗ `[T] `__
+   * - AMD
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✗ `[C] `__
+     - ✅
+     - ✅
+     - ✅
+     - ✗ `[T] `__
+
+Note:
+
+- [C] stands for code checks: a runtime check verifies whether the combination is valid and either raises an error or logs a warning and disables the feature.
+- [T] stands for tracking issues or pull requests in the vLLM repository.
+- APC stands for Automatic Prefix Caching.
+- Async output processing requires CUDA Graphs to be enabled; the table includes a code check for this. It is the only ✅ with a [C].
+- Encoder/decoder currently does not work with CUDA Graphs and is therefore also incompatible with async output processing.
+
+
+..
+   TODO: Add support for remaining devices.
\ No newline at end of file
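
The [C] entries in the note describe runtime checks that reject or disable invalid feature combinations. As a rough illustration of that idea only, the sketch below encodes a handful of the ✗ cells from the table and reports which enabled features conflict. The names `INCOMPATIBLE`, `find_conflicts`, and `check_features`, and the feature keys, are hypothetical and are not part of vLLM's API; the real checks live in vLLM's own code and may behave differently.

```python
from itertools import combinations

# Hypothetical subset of the incompatible (✗) pairs from the matrix.
# Feature keys are illustrative and are not vLLM option names.
INCOMPATIBLE = {
    frozenset({"lora", "chunked_prefill"}),
    frozenset({"speculative_decoding", "chunked_prefill"}),
    frozenset({"speculative_decoding", "lora"}),
    frozenset({"prompt_logprobs", "speculative_decoding"}),
    frozenset({"async_output", "speculative_decoding"}),
    frozenset({"multi_step", "chunked_prefill"}),
}


def find_conflicts(enabled):
    """Return the sorted list of incompatible feature pairs among `enabled`."""
    features = sorted(set(enabled))
    return [
        pair
        for pair in combinations(features, 2)
        if frozenset(pair) in INCOMPATIBLE
    ]


def check_features(enabled):
    """Raise ValueError if any two enabled features are marked incompatible."""
    conflicts = find_conflicts(enabled)
    if conflicts:
        details = ", ".join(f"{a} + {b}" for a, b in conflicts)
        raise ValueError(f"Incompatible feature combination(s): {details}")


if __name__ == "__main__":
    check_features(["apc", "logprobs"])  # passes silently
    print(find_conflicts(["lora", "chunked_prefill", "apc"]))
```

Storing each pair as a `frozenset` makes the lookup order-independent, which mirrors the lower-triangular layout of the table: each combination is recorded once.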