
Add logger extra #4329

Open · olehviniarchyk wants to merge 26 commits into base: main
Conversation

olehviniarchyk commented

No description provided.

@olehviniarchyk olehviniarchyk marked this pull request as ready for review April 26, 2024 14:15
vllm/config.py — comment resolved
docs/source/conf.py — outdated, comment resolved
@@ -200,10 +200,11 @@ def _verify_quantization(self) -> None:
f"{self.quantization} quantization is currently not "
f"supported in ROCm.")
if (self.quantization not in ["marlin", "gptq_marlin"]):
logger_data = {"quantization": self.quantization}
tdg5 commented on May 8, 2024
When including extra data, I like to include an event field or something similar so that the logger messages can be filtered by different event types. I'm not sure what a good event name would be here, but I've taken a stab anyway

Suggested change
logger_data = {"quantization": self.quantization}
logger_data = {"event": "quantization-not-fully-optimized", "quantization": self.quantization}

Including an event can be especially helpful/important in scenarios where the log message is different each time due to interpolating data.


I'm not convinced this is something worth adding logger_data for... @robertgshaw2-neuralmagic, do you have an opinion?


On one hand, it should be cheap, but I'm not sure how much utility it offers an enterprise.

@@ -964,8 +965,9 @@ def verify_with_model_config(self, model_config: ModelConfig):
"awq", "gptq"
]:
# TODO support marlin and squeezellm
logger_data = {"quantization": model_config.quantization}

I've added an event suggestion

Suggested change
logger_data = {"quantization": model_config.quantization}
logger_data = {"event": "quantization-not-tested", "quantization": model_config.quantization}

@robertgshaw2-neuralmagic, I'm not sure this is worth adding extra data for. Thoughts? On one hand, it should be cheap, but I'm not sure how much utility it offers an enterprise.

@@ -1074,7 +1076,10 @@ def _get_and_verify_dtype(
pass
else:
# Casting between float16 and bfloat16 is allowed with a warning.
logger.warning("Casting %s to %s.", config_dtype, torch_dtype)
logger_data = {"config_dtype": config_dtype, "torch_dtype": torch_dtype}

Suggested change
logger_data = {"config_dtype": config_dtype, "torch_dtype": torch_dtype}
logger_data = {"config_dtype": config_dtype, "event": "dtype-cast", "torch_dtype": torch_dtype}

Comment on lines +1123 to +1124
"possible_keys": possible_keys,
"default_max_len": default_max_len

Suggested change
"possible_keys": possible_keys,
"default_max_len": default_max_len
"default_max_len": default_max_len,
"event": "model-original-max-length-unknown",
"possible_keys": possible_keys,

@@ -645,10 +645,14 @@ def _schedule_prefills(
assert num_new_tokens == num_prompt_tokens

if num_new_tokens > self.prompt_limit:
logger_data = {

Not sure it's worth it, but we could reduce the number of object allocations if we initialize logger_data outside of the while loop and recycle the same instance each time logger_data is needed. A little cleanup of the object would be required after each log call (logger_data.clear()), but that could be preferable to more object allocations, depending on the nature of the while loop.

@@ -645,10 +645,14 @@ def _schedule_prefills(
assert num_new_tokens == num_prompt_tokens

if num_new_tokens > self.prompt_limit:
logger_data = {
"num_new_tokens": num_new_tokens,
tdg5 commented on May 8, 2024

Suggested change
"num_new_tokens": num_new_tokens,
"event": "prefill-prompt-length-exceeded-limit",
"num_new_tokens": num_new_tokens,

@@ -338,10 +338,14 @@ def deserialize(self):
per_second = convert_bytes(deserializer.total_tensor_bytes / duration)
after_mem = get_mem_usage()
deserializer.close()

logger_data = {"total_bytes_str": total_bytes_str}
tdg5 commented on May 8, 2024

Maybe it's a premature optimization, but allocating multiple logger_data dicts here seems excessive. I'd suggest reusing a single dict instance to reduce the number of object allocations, in an effort to minimize the time spent in GC. Also, it seems like duration may be an important bit of data here.

I've included numeric versions of some of the data, since numbers often behave differently than strings; e.g., they can be graphed more easily.

Suggested change
logger_data = {"total_bytes_str": total_bytes_str}
logger_data = {
"bytes_per_second": deserializer.total_tensor_bytes / duration,
"bytes_per_second_str": per_second,
"duration": duration,
"event": "tensor-deserialize-completed",
"memory_usage_after": after_mem,
"memory_usage_before": before_mem,
"total_bytes": deserializer.total_tensor_bytes,
"total_bytes_str": total_bytes_str,
}


Actually, now that I look at it more, I'd just include all the logger data in the first log message and don't include any logger_data for the other logger calls. Then there's only one logger_data instance and all the relevant data is in one place.


This seems like an instance where logger_data is likely useful.

@@ -298,8 +298,9 @@ def get_moe_configs(E: int, N: int,
os.path.dirname(os.path.realpath(__file__)), "configs", json_file_name)
if os.path.exists(config_file_path):
with open(config_file_path) as f:
logger_data = {"config_file_path": config_file_path}

Suggested change
logger_data = {"config_file_path": config_file_path}
logger_data = {"config_file_path": config_file_path, "event": "moe-layer-configuration-loaded"}


This file, or something like it, should already be on main; I think this is junk to be removed.


I think changes to this file can be dropped

@@ -111,8 +111,9 @@ def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks) -> None:
# NOTE: This is logged in the executor because there can be >1 worker
# with other executors. We could log in the engine level, but work
# remains to abstract away the device for non-GPU configurations.
logger_data = {"gpu_blocks": num_gpu_blocks, "cpu_blocks": num_cpu_blocks}
tdg5 commented on May 8, 2024

Suggested change
logger_data = {"gpu_blocks": num_gpu_blocks, "cpu_blocks": num_cpu_blocks}
logger_data = {"cpu_blocks": num_cpu_blocks, "event": "gpu-executor-cache-initialized", "gpu_blocks": num_gpu_blocks}

@@ -69,7 +69,8 @@ def initialize_cache(self, num_gpu_blocks: int,
# NOTE: `cpu block` for CPU backend is located on CPU memory but is
# referred as `gpu block`. Because we want to reuse the existing block
# management procedure.
logger.info("# CPU blocks: %d", num_gpu_blocks)
logger_data = {"num_gpu_blocks": num_gpu_blocks}

Suggested change
logger_data = {"num_gpu_blocks": num_gpu_blocks}
logger_data = {"event": "cpu-executor-cache-initialized", "num_gpu_blocks": num_gpu_blocks}

vllm/engine/metrics.py — outdated, comments resolved
tdg5 left a comment

I think any logger_data instances we keep should include an event field.

Otherwise, I think @robertgshaw2-neuralmagic will need to weigh in on a bunch of these. Something like 7/40 of the extra logger data instances make sense to me from an enterprise perspective, and @robertgshaw2-neuralmagic should weigh in on the other usages.

I marked the usages that made sense to me with a 🚀 and the usages I wasn't sure about with a 👀.


This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!


mergify bot commented Nov 26, 2024

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @olehviniarchyk.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 26, 2024
@github-actions github-actions bot added unstale and removed stale labels Nov 29, 2024