
Add logger extra #4329

Open · olehviniarchyk wants to merge 26 commits into base: main
Conversation

olehviniarchyk commented

No description provided.

@olehviniarchyk olehviniarchyk marked this pull request as ready for review April 26, 2024 14:15
vllm/config.py — comment resolved
docs/source/conf.py — outdated, comment resolved
@@ -200,10 +200,11 @@ def _verify_quantization(self) -> None:
f"{self.quantization} quantization is currently not "
f"supported in ROCm.")
if (self.quantization not in ["marlin", "gptq_marlin"]):
logger_data = {"quantization": self.quantization}
tdg5 commented on May 8, 2024
When including extra data, I like to include an event field or something similar so that the logger messages can be filtered by different event types. I'm not sure what a good event name would be here, but I've taken a stab anyway

Suggested change
logger_data = {"quantization": self.quantization}
logger_data = {"event": "quantization-not-fully-optimized", "quantization": self.quantization}

Including an event can be especially helpful/important in scenarios where the log message is different each time due to interpolating data.


I'm not convinced this is something worth adding logger_data for... @robertgshaw2-neuralmagic, do you have an opinion?


On one hand, it should be cheap, but I'm not sure how much utility it offers an enterprise.

@@ -964,8 +965,9 @@ def verify_with_model_config(self, model_config: ModelConfig):
"awq", "gptq"
]:
# TODO support marlin and squeezellm
logger_data = {"quantization": model_config.quantization}

I've added an event suggestion

Suggested change
logger_data = {"quantization": model_config.quantization}
logger_data = {"event": "quantization-not-tested", "quantization": model_config.quantization}

@robertgshaw2-neuralmagic, I'm not sure this is worth adding extra data for. Thoughts? On one hand, it should be cheap, but I'm not sure how much utility it offers an enterprise.

@@ -1074,7 +1076,10 @@ def _get_and_verify_dtype(
pass
else:
# Casting between float16 and bfloat16 is allowed with a warning.
logger.warning("Casting %s to %s.", config_dtype, torch_dtype)
logger_data = {"config_dtype": config_dtype, "torch_dtype": torch_dtype}

Suggested change
logger_data = {"config_dtype": config_dtype, "torch_dtype": torch_dtype}
logger_data = {"config_dtype": config_dtype, "event": "dtype-cast", "torch_dtype": torch_dtype}

Comment on lines +1123 to +1124
"possible_keys": possible_keys,
"default_max_len": default_max_len

Suggested change
"possible_keys": possible_keys,
"default_max_len": default_max_len
"default_max_len": default_max_len,
"event": "model-original-max-length-unknown",
"possible_keys": possible_keys,

@@ -645,10 +645,14 @@ def _schedule_prefills(
assert num_new_tokens == num_prompt_tokens

if num_new_tokens > self.prompt_limit:
logger_data = {

Not sure it's worth it, but we could reduce the number of object allocations if we initialize logger_data outside of the while loop and recycle the same instance each time logger_data is needed. A little cleanup of the object would be required after each log call (logger_data.clear()), but that could be preferable to more object allocations, depending on the nature of the while loop.

@@ -645,10 +645,14 @@ def _schedule_prefills(
assert num_new_tokens == num_prompt_tokens

if num_new_tokens > self.prompt_limit:
logger_data = {
"num_new_tokens": num_new_tokens,
tdg5 commented on May 8, 2024

Suggested change
"num_new_tokens": num_new_tokens,
"event": "prefill-prompt-length-exceeded-limit",
"num_new_tokens": num_new_tokens,

@@ -338,10 +338,14 @@ def deserialize(self):
per_second = convert_bytes(deserializer.total_tensor_bytes / duration)
after_mem = get_mem_usage()
deserializer.close()

logger_data = {"total_bytes_str": total_bytes_str}
tdg5 commented on May 8, 2024

Maybe it's a premature optimization, but allocating multiple logger_data dicts here seems excessive. I'd suggest reusing a single dict instance to reduce the number of object allocations, in an effort to minimize the time spent in GC. Also, it seems like duration may be an important bit of data here.

I've included numeric versions of some of the data, since numbers often behave differently than strings; e.g., they can be graphed more easily.

Suggested change
logger_data = {"total_bytes_str": total_bytes_str}
logger_data = {
"bytes_per_second": deserializer.total_tensor_bytes / duration,
"bytes_per_second_str": per_second,
"duration": duration,
"event": "tensor-deserialize-completed",
"memory_usage_after": after_mem,
"memory_usage_before": before_mem,
"total_bytes": deserializer.total_tensor_bytes,
"total_bytes_str": total_bytes_str,
}


Actually, now that I look at it more, I'd just include all the logger data in the first log message and don't include any logger_data for the other logger calls. Then there's only one logger_data instance and all the relevant data is in one place.


This seems like an instance where logger_data is likely useful.

@@ -298,8 +298,9 @@ def get_moe_configs(E: int, N: int,
os.path.dirname(os.path.realpath(__file__)), "configs", json_file_name)
if os.path.exists(config_file_path):
with open(config_file_path) as f:
logger_data = {"config_file_path": config_file_path}

Suggested change
logger_data = {"config_file_path": config_file_path}
logger_data = {"config_file_path": config_file_path, "event": "moe-layer-configuration-loaded"}


This file, or something like it, should already be on main; I think this is junk to be removed.


I think changes to this file can be dropped

@@ -111,8 +111,9 @@ def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks) -> None:
# NOTE: This is logged in the executor because there can be >1 worker
# with other executors. We could log in the engine level, but work
# remains to abstract away the device for non-GPU configurations.
logger_data = {"gpu_blocks": num_gpu_blocks, "cpu_blocks": num_cpu_blocks}
tdg5 commented on May 8, 2024

Suggested change
logger_data = {"gpu_blocks": num_gpu_blocks, "cpu_blocks": num_cpu_blocks}
logger_data = {"cpu_blocks": num_cpu_blocks, "event": "gpu-executor-cache-initialized", "gpu_blocks": num_gpu_blocks}

@@ -69,7 +69,8 @@ def initialize_cache(self, num_gpu_blocks: int,
# NOTE: `cpu block` for CPU backend is located on CPU memory but is
# referred as `gpu block`. Because we want to reuse the existing block
# management procedure.
logger.info("# CPU blocks: %d", num_gpu_blocks)
logger_data = {"num_gpu_blocks": num_gpu_blocks}

Suggested change
logger_data = {"num_gpu_blocks": num_gpu_blocks}
logger_data = {"event": "cpu-executor-cache-initialized", "num_gpu_blocks": num_gpu_blocks}

vllm/engine/metrics.py — outdated, comments resolved
tdg5 left a comment

I think any logger_data instances we keep should include an event field.

Otherwise, I think @robertgshaw2-neuralmagic will need to weigh in on a bunch of these. Something like 7/40 of the extra logger data instances make sense to me from an enterprise perspective, and @robertgshaw2-neuralmagic should weigh in on the other usages.

I marked the usages that made sense to me with a 🚀 and the usages I wasn't sure about with a 👀.


This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!


mergify bot commented Nov 26, 2024

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @olehviniarchyk.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 26, 2024
@github-actions github-actions bot added unstale and removed stale labels Nov 29, 2024