Add logger extra #4329
base: main
Conversation
@@ -200,10 +200,11 @@ def _verify_quantization(self) -> None:
f"{self.quantization} quantization is currently not "
f"supported in ROCm.")
if (self.quantization not in ["marlin", "gptq_marlin"]):
logger_data = {"quantization": self.quantization}
When including extra data, I like to include an event field or something similar so that the logger messages can be filtered by different event types. I'm not sure what a good event name would be here, but I've taken a stab anyway:
logger_data = {"quantization": self.quantization}
logger_data = {"event": "quantization-not-fully-optimized", "quantization": self.quantization}
Including an event can be especially helpful/important in scenarios where the log message is different each time due to interpolating data.
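The PR's logger_data mechanism is vLLM-specific, but the idea maps directly onto the stdlib logging module: keys passed via extra= become attributes on the LogRecord, so a Filter can key on event. A minimal sketch of that filtering (the EventFilter class and the record-capturing handler are illustrative, not part of the PR):

```python
import logging

class EventFilter(logging.Filter):
    """Pass only records whose extra data carries a matching "event" field."""

    def __init__(self, event: str) -> None:
        super().__init__()
        self.event = event

    def filter(self, record: logging.LogRecord) -> bool:
        # Keys passed via logger.*(..., extra={...}) become record attributes.
        return getattr(record, "event", None) == self.event

logger = logging.getLogger("vllm.demo")
logger.propagate = False
logger.setLevel(logging.INFO)

captured: list[logging.LogRecord] = []
handler = logging.Handler()
handler.emit = captured.append  # capture records instead of printing them
handler.addFilter(EventFilter("quantization-not-fully-optimized"))
logger.addHandler(handler)

logger.warning(
    "%s quantization is not fully optimized yet.", "awq",
    extra={"event": "quantization-not-fully-optimized", "quantization": "awq"})
logger.warning("Casting %s to %s.", "bfloat16", "float16",
               extra={"event": "dtype-cast"})

print(len(captured))  # only the record with the matching event gets through
```

Without a stable event name, the only way to find these records downstream is to regex the interpolated message, which breaks whenever the wording changes.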
I'm not convinced this is something worth adding logger_data for... @robertgshaw2-neuralmagic, do you have an opinion?
On one hand, it should be cheap, but I'm not sure how much utility it offers an enterprise.
@@ -964,8 +965,9 @@ def verify_with_model_config(self, model_config: ModelConfig):
"awq", "gptq"
]:
# TODO support marlin and squeezellm
logger_data = {"quantization": model_config.quantization}
I've added an event suggestion:
logger_data = {"quantization": model_config.quantization}
logger_data = {"event": "quantization-not-tested", "quantization": model_config.quantization}
@robertgshaw2-neuralmagic, I'm not sure this is worth adding extra data for. Thoughts? On one hand, it should be cheap, but I'm not sure how much utility it offers an enterprise.
@@ -1074,7 +1076,10 @@ def _get_and_verify_dtype(
pass
else:
# Casting between float16 and bfloat16 is allowed with a warning.
logger.warning("Casting %s to %s.", config_dtype, torch_dtype)
logger_data = {"config_dtype": config_dtype, "torch_dtype": torch_dtype}
logger_data = {"config_dtype": config_dtype, "torch_dtype": torch_dtype}
logger_data = {"config_dtype": config_dtype, "event": "dtype-cast", "torch_dtype": torch_dtype}
"possible_keys": possible_keys,
"default_max_len": default_max_len
"possible_keys": possible_keys,
"default_max_len": default_max_len
"default_max_len": default_max_len,
"event": "model-original-max-length-unknown",
"possible_keys": possible_keys,
@@ -645,10 +645,14 @@ def _schedule_prefills(
assert num_new_tokens == num_prompt_tokens

if num_new_tokens > self.prompt_limit:
logger_data = {
Not sure it's worth it, but we could reduce the number of object allocations if we initialize logger_data outside of the while loop and recycle the same instance each time logger_data is needed. A little clean up of the object would be required after each log call (logger_data.clear()), but that could be preferable to more object allocations depending on the nature of the while loop.
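A sketch of what that recycling could look like with a plain stdlib logger (the loop values, logger name, and record-capturing handler are made up for illustration). The logging machinery copies the extra keys onto the record synchronously inside the call, so clearing the dict afterwards does not mutate already-emitted records:

```python
import logging

logger = logging.getLogger("vllm.sched_demo")
logger.propagate = False
records: list[logging.LogRecord] = []
handler = logging.Handler()
handler.emit = records.append  # capture records so we can inspect them
logger.addHandler(handler)

prompt_limit = 4
logger_data: dict = {}  # allocated once, outside the loop
for num_new_tokens in [2, 8, 16]:
    if num_new_tokens > prompt_limit:
        logger_data["event"] = "prefill-prompt-length-exceeded-limit"
        logger_data["num_new_tokens"] = num_new_tokens
        # makeRecord copies extra onto the record before this call returns,
        # so the same dict instance can be safely reused next iteration.
        logger.warning("Input prompt (%d tokens) is too long.",
                       num_new_tokens, extra=logger_data)
        logger_data.clear()  # recycle rather than reallocate

print([r.num_new_tokens for r in records])  # [8, 16]
```

Whether this is worth the extra bookkeeping depends on how hot the loop actually is; for a warning-only path it likely is not.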
@@ -645,10 +645,14 @@ def _schedule_prefills(
assert num_new_tokens == num_prompt_tokens

if num_new_tokens > self.prompt_limit:
logger_data = {
"num_new_tokens": num_new_tokens,
"num_new_tokens": num_new_tokens,
"event": "prefill-prompt-length-exceeded-limit",
"num_new_tokens": num_new_tokens,
@@ -338,10 +338,14 @@ def deserialize(self):
per_second = convert_bytes(deserializer.total_tensor_bytes / duration)
after_mem = get_mem_usage()
deserializer.close()

logger_data = {"total_bytes_str": total_bytes_str}
Maybe it's a premature optimization, but allocating multiple logger_data dicts here seems excessive. I'd suggest reusing a single dict instance to reduce the number of object allocations in an effort to minimize the time spent in GC. Also, it seems like duration may be an important bit of data here.
I've included some numeric versions of some of the data since numbers often have different behavior than strings, e.g. they can be graphed more easily.
logger_data = {"total_bytes_str": total_bytes_str}
logger_data = {
    "bytes_per_second": deserializer.total_tensor_bytes / duration,
    "bytes_per_second_str": per_second,
    "duration": duration,
    "event": "tensor-deserialize-completed",
    "memory_usage_after": after_mem,
    "memory_usage_before": before_mem,
    "total_bytes": deserializer.total_tensor_bytes,
    "total_bytes_str": total_bytes_str,
}
Actually, now that I look at it more, I'd just include all the logger data in the first log message and not include any logger_data for the other logger calls. Then there's only one logger_data instance and all the relevant data is in one place.
This seems like an instance where logger_data is likely useful.
@@ -298,8 +298,9 @@ def get_moe_configs(E: int, N: int,
os.path.dirname(os.path.realpath(__file__)), "configs", json_file_name)
if os.path.exists(config_file_path):
with open(config_file_path) as f:
logger_data = {"config_file_path": config_file_path}
logger_data = {"config_file_path": config_file_path}
logger_data = {"config_file_path": config_file_path, "event": "moe-layer-configuration-loaded"}
vllm/logging/new_line_formatter.py
Outdated
This file or something like it should already be added to main; I think this is junk to be removed.
vllm/logging/__init__.py
Outdated
I think changes to this file can be dropped
@@ -111,8 +111,9 @@ def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks) -> None:
# NOTE: This is logged in the executor because there can be >1 worker
# with other executors. We could log in the engine level, but work
# remains to abstract away the device for non-GPU configurations.
logger_data = {"gpu_blocks": num_gpu_blocks, "cpu_blocks": num_cpu_blocks}
logger_data = {"gpu_blocks": num_gpu_blocks, "cpu_blocks": num_cpu_blocks}
logger_data = {"cpu_blocks": num_cpu_blocks, "event": "gpu-executor-cache-initialized", "gpu_blocks": num_gpu_blocks}
@@ -69,7 +69,8 @@ def initialize_cache(self, num_gpu_blocks: int,
# NOTE: `cpu block` for CPU backend is located on CPU memory but is
# referred as `gpu block`. Because we want to reuse the existing block
# management procedure.
logger.info("# CPU blocks: %d", num_gpu_blocks)
logger_data = {"num_gpu_blocks": num_gpu_blocks}
logger_data = {"num_gpu_blocks": num_gpu_blocks}
logger_data = {"event": "cpu-executor-cache-initialized", "num_gpu_blocks": num_gpu_blocks}
I think any logger_data instances we keep should include an event field.
Otherwise, @robertgshaw2-neuralmagic will need to weigh in on a bunch of these. Something like 7/40 of the extra logger data instances make sense to me from an enterprise perspective, and I think @robertgshaw2-neuralmagic should weigh in on the other usages.
I marked the usages that made sense to me with a 🚀 and the usages I wasn't sure about with a 👀.
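As a sketch of what an event field buys downstream, a formatter that serializes the extra attributes (the ExtraJsonFormatter class here is illustrative, not the PR's formatter) lets a log aggregator filter on event without regexing the interpolated message:

```python
import json
import logging

# Attributes present on every LogRecord; anything else arrived via extra=.
_RESERVED = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__)
_RESERVED |= {"message", "asctime"}

class ExtraJsonFormatter(logging.Formatter):
    """Append extra fields as a sorted-key JSON suffix so structured data
    (like "event") survives into the log stream."""

    def format(self, record: logging.LogRecord) -> str:
        base = super().format(record)
        extras = {k: v for k, v in record.__dict__.items()
                  if k not in _RESERVED}
        return f"{base} {json.dumps(extras, sort_keys=True)}" if extras else base

fmt = ExtraJsonFormatter("%(levelname)s %(message)s")
rec = logging.LogRecord("vllm", logging.INFO, __file__, 1,
                        "# GPU blocks: %d, # CPU blocks: %d", (1024, 512), None)
rec.__dict__.update({"event": "gpu-executor-cache-initialized",
                     "gpu_blocks": 1024, "cpu_blocks": 512})
print(fmt.format(rec))
```

The human-readable message stays intact while the trailing JSON gives each record a machine-queryable event name and numeric fields.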
Co-authored-by: Danny Guinther <[email protected]>
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has merge conflicts that must be resolved before it can be merged.
No description provided.