Fix response preprocessing bug #646
Conversation
Great work here. Appreciate you finding this bug and adding the relevant tests!
```diff
@@ -600,13 +600,13 @@ def _preprocess_response(
         # Remove responses without any content
         # These are only observed to happen at the start or end
-        while res_outputs[0] and self._is_openai_empty_response(
+        while res_outputs and self._is_openai_empty_response(
```
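For context, here is a minimal, self-contained sketch of the failure mode this hunk fixes (hypothetical names, not the GenAI-Perf source): indexing `res_outputs[0]` raises an `IndexError` once every element has been stripped, while testing the list's truthiness terminates the loop cleanly.

```python
# Hypothetical sketch of the trimming loop in _preprocess_response.
# is_empty_response stands in for self._is_openai_empty_response.
def is_empty_response(output: dict) -> bool:
    # Treat a missing or empty "content" field as an empty response.
    return output.get("content", "") == ""

def trim_leading_empty(res_outputs: list, res_timestamps: list) -> None:
    # The old guard, `while res_outputs[0] and ...`, raises IndexError
    # once the list is exhausted; checking the list itself does not.
    while res_outputs and is_empty_response(res_outputs[0]):
        res_outputs.pop(0)
        res_timestamps.pop(0)

# All responses empty: the old indexing would crash on the final pass.
outputs = [{"content": ""}, {"content": ""}]
timestamps = [0.1, 0.2]
trim_leading_empty(outputs, timestamps)
assert outputs == [] and timestamps == []
```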
Do we need additional errors/warnings if all responses are null, though?
Won't PA error out before that, with "no responses in window"?
I think this is different from that: this is still a response, it's just an "empty" response.
Also, I don't think PA gives a warning/error for no responses. It gives a warning for not enough requests during a measurement window. If there were no responses at all, I think PA would hang.
Sometimes models output only an empty string (""), and that is what's causing this. Since this is known LLM behavior and it doesn't affect the metric numbers, I don't think a warning is necessary.
Good catch, Matt! It does affect metric numbers, though: the models are potentially producing unexpected outputs, right? And that may make metrics look better or worse than they really are.
Is the empty-string case a known, correct case? It feels like if models are returning empty strings, something is most likely going wrong. Have we investigated why they sometimes output only an empty string? Could that be a bug? If it is a desirable and expected case under certain conditions, I feel like we should add a flag that lets users accept empty strings, rather than accepting them by default.
I think the decision was to ignore all leading responses with empty content, since the TTFT of an empty response doesn't really make sense.
Ahh, I am confusing the empty content case with the first response being DONE. If the first response is DONE, would we want that to count towards TTFT?
Just to clarify:
- Empty response: a response with `"content": ""` (an empty output string).
- Null response: either a response with a missing `"content"` field, or a response that is `data: [DONE]`.

Given that, is it normal (or even possible) to get `data: [DONE]` as the first response? And I don't think it makes sense to include that in our TTFT metric, since it is not a normal response from the model.
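To make the empty vs. null distinction concrete, here is a hypothetical classifier over OpenAI-style SSE streaming chunks (illustrative only; the field paths are assumptions about the response format, not GenAI-Perf code):

```python
import json

# Hypothetical helper: classify one SSE chunk from an OpenAI-style
# streaming endpoint as "null", "empty", or "normal".
def classify_chunk(raw_chunk: str) -> str:
    chunk = raw_chunk.strip()
    if chunk == "data: [DONE]":
        return "null"   # stream-termination sentinel, no model content
    payload = json.loads(chunk[len("data: "):])
    delta = payload["choices"][0].get("delta", {})
    if "content" not in delta:
        return "null"   # "content" field missing entirely
    if delta["content"] == "":
        return "empty"  # "content" present but an empty string
    return "normal"

assert classify_chunk("data: [DONE]") == "null"
assert classify_chunk('data: {"choices": [{"delta": {}}]}') == "null"
assert classify_chunk('data: {"choices": [{"delta": {"content": ""}}]}') == "empty"
assert classify_chunk('data: {"choices": [{"delta": {"content": "Hi"}}]}') == "normal"
```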
@IzzyPutterman Can a model output an empty string (e.g. "") as its output?
I think that regardless of whether a server/model emitting all-"empty" responses is a bug in the server/model, this fix improves the usability of GenAI-Perf by not crashing and still delivering the largely valid output to the user.
But we should try to confirm whether all-empty responses are a bug in the server/model. If we think it's unexpected behavior, we could start by filing an issue with the server implementers (or maybe it's not from them, and it's more relevant for the model developers?).
@dyastremsky @matthewkotila @IzzyPutterman Talked with Ganesh and @debermudez. Since this PR does not change the behavior of the original code (it just fixes the potential index error), I will create a follow-up ticket to discuss this.
Commits:
- Fix empty response bug
- Fix unused variable
- Fix test
- Initialize logger to capture logs
- Add unit test
- Change to `_` instead of removing
- Check if `args.model` is not None
- Fix artifact path
When preprocessing responses, if all responses are either empty or null, then `res_outputs` and `res_timestamps` will eventually become empty, and indexing them raises an index-out-of-range error. Instead of indexing the 0th and last elements, check whether the list is empty first.
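Put together, the guarded pattern the description implies looks roughly like this (a sketch under the assumption that trimming happens at both ends of the window; helper names are hypothetical):

```python
# Hypothetical sketch: strip empty/null responses from both ends of the
# response window. Guarding on the list itself (rather than on
# res_outputs[0] / res_outputs[-1]) stays safe even when *every*
# response in the window is empty or null.
def trim_empty_ends(res_outputs, res_timestamps, is_empty):
    while res_outputs and is_empty(res_outputs[0]):    # leading empties
        res_outputs.pop(0)
        res_timestamps.pop(0)
    while res_outputs and is_empty(res_outputs[-1]):   # trailing empties
        res_outputs.pop()
        res_timestamps.pop()
```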