Log the full TableMetadata #458

andrew4699 · 2024-11-19T19:21:16Z

Description

Adds more logs. This is useful for determining the types of workloads happening on Polaris.

Type of change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Added log assertions

Checklist:

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

polaris-service/src/test/java/org/apache/polaris/service/PolarisApplicationIntegrationTest.java

eric-maynard · 2024-11-19T19:26:39Z

polaris-service/src/main/java/org/apache/polaris/service/catalog/BasePolarisCatalog.java

@@ -1254,6 +1255,9 @@ public void doRefresh() {
    public void doCommit(TableMetadata base, TableMetadata metadata) {
      LOGGER.debug(
          "doCommit for table {} with base {}, metadata {}", tableIdentifier, base, metadata);
+      LOGGER.info(
+          "doCommit full new metadata: {}",
+          PolarisObjectMapperUtil.serialize(getCurrentPolarisContext(), metadata));


This can be pretty big to log at INFO.

Yeah, I'm worried about both the performance hit of serializing such a large object and the amount of data logged. Some possible options:

1. Move to DEBUG (still performs serialization, does nothing for people who run in INFO+ which should be common)

2. Gate the entire thing behind a featureConfiguration

3. Set a limit on the log output length in featureConfiguration (still performs serialization)

What are your thoughts? I'm leaning toward the 2nd one.
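To make the trade-off between the gating and length-limit options concrete, here is a minimal, self-contained sketch. It is not Polaris code: the `enabled` flag and `maxChars` limit are hypothetical stand-ins for whatever a featureConfiguration entry might provide, and `renderForLog` is an illustrative helper name.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class MetadataLogGate {
  /**
   * Returns the text to log, or empty when logging is disabled.
   * Both parameters are assumed to come from some (hypothetical) configuration source.
   */
  public static Optional<String> renderForLog(
      boolean enabled, int maxChars, Supplier<String> serializer) {
    if (!enabled) {
      // Gating: the serializer is never invoked, so no serialization cost is paid.
      return Optional.empty();
    }
    // Length limit: the full serialization still happens before truncation.
    String json = serializer.get();
    if (json.length() <= maxChars) {
      return Optional.of(json);
    }
    return Optional.of(
        json.substring(0, maxChars) + "... [truncated, " + json.length() + " chars total]");
  }

  public static void main(String[] args) {
    Supplier<String> serializer = () -> "{\"format-version\":2,\"snapshots\":[]}";
    System.out.println(renderForLog(false, 16, serializer).isPresent());
    System.out.println(renderForLog(true, 16, serializer).orElse(""));
  }
}
```

The sketch shows why only the gate avoids the serialization cost: truncation can bound the log size, but only after the whole object has been rendered.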

Move to DEBUG (still performs serialization, does nothing for people who run in INFO+ which should be common)

Good callout. Man do I miss scala.

In this case, perhaps we can do something like:

      LOGGER
          .atInfo()
          .setMessage("doCommit full new metadata: {}")
          .addArgument(
              () -> PolarisObjectMapperUtil.serialize(getCurrentPolarisContext(), metadata))
          .log();

SLF4J 2.0 supports this, and in general it might be a useful wrapper for us.

Having said all of that, I'm less worried about the serde overhead, since we do so much metadata/property serde anyway, and more about emitting a huge blob into the logs. Particularly the INFO logs which should stay high-signal.

Oh that's nice! I updated it to DEBUG and made it use the supplier syntax.
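For readers unfamiliar with supplier-based logging, the sketch below shows the lazy-evaluation effect in isolation. It deliberately uses `java.util.logging` from the JDK rather than SLF4J so that it runs without dependencies; `expensiveSerialize` is a hypothetical stand-in for the real serializer, and the counter exists only to make the behavior observable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogDemo {
  // Counts how often the (hypothetical) expensive serializer actually runs.
  static final AtomicInteger SERIALIZATIONS = new AtomicInteger();

  static String expensiveSerialize(String metadata) {
    SERIALIZATIONS.incrementAndGet();
    return "{\"metadata\":\"" + metadata + "\"}";
  }

  /** Logs once at FINE (debug) via a supplier and reports how many serializations ran. */
  public static int serializationsAtLevel(Level level) {
    Logger logger = Logger.getLogger("demo." + level.getName());
    logger.setUseParentHandlers(false); // keep the demo quiet
    logger.setLevel(level);
    int before = SERIALIZATIONS.get();
    // The lambda is only evaluated if FINE is enabled for this logger.
    logger.fine(() -> "doCommit full new metadata: " + expensiveSerialize("m"));
    return SERIALIZATIONS.get() - before;
  }

  public static void main(String[] args) {
    System.out.println("INFO: " + serializationsAtLevel(Level.INFO));
    System.out.println("FINE: " + serializationsAtLevel(Level.FINE));
  }
}
```

With the logger at INFO the supplier is never called, so nothing is serialized; at FINE it is called exactly once. This addresses the serialization cost only. The size of the emitted message is unchanged whenever the level is enabled.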

andrew4699 · 2024-11-20T19:05:52Z

I removed the ReportMetricsRequest logging, as it seems we already log it as a tag in the "Invoking CatalogApi with params" message.

snazy

Thanks for the PR @andrew4699.

As mentioned in the other comment, this can be a really huge logged string in the range of multiple megabytes of JSON. I'm not sure whether that's a good idea. My concerns around this are:

Increased cost when using a logging SaaS
Huge log files
Exposing user data in log files

I'd be much in favor of not merging this change, as logging this unconditionally, even at debug level, will produce a huge amount of log data.

Debug logging is often used to investigate service issues, but excessive debug logging is often not really helpful.

RussellSpitzer · 2024-11-21T15:45:07Z

I would consider this probably a "trace" level event. I agree with @snazy's comments here. I would hesitate to log these whole entities as well for security reasons.

andrew4699 · 2024-11-21T17:59:52Z

@RussellSpitzer @snazy Thank you for the feedback. I'm new to the Iceberg space and appreciate the context you have on these objects. My hope is to make it easier to see the "unstructured" data that gets self-reported by the query engines. Would you feel more comfortable if this was scoped down to printing the last 5 Snapshot IDs & summaries?
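As a rough illustration of what such a scoped-down log could contain, the sketch below renders only the most recent snapshots. `SnapshotInfo` is a hypothetical stand-in holding an ID and a summary map; it is not the Iceberg `Snapshot` interface, and the output format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SnapshotLogSummary {
  /** Hypothetical stand-in for a snapshot: an ID plus the engine-reported summary map. */
  public static final class SnapshotInfo {
    final long id;
    final Map<String, String> summary;

    public SnapshotInfo(long id, Map<String, String> summary) {
      this.id = id;
      this.summary = summary;
    }
  }

  /** Renders at most {@code limit} of the most recent snapshots (input ordered oldest first). */
  public static List<String> lastSnapshots(List<SnapshotInfo> snapshots, int limit) {
    int from = Math.max(0, snapshots.size() - limit);
    List<String> lines = new ArrayList<>();
    for (SnapshotInfo s : snapshots.subList(from, snapshots.size())) {
      lines.add("snapshot " + s.id + " summary=" + s.summary);
    }
    return lines;
  }

  public static void main(String[] args) {
    List<SnapshotInfo> all = new ArrayList<>();
    for (long i = 1; i <= 7; i++) {
      all.add(new SnapshotInfo(i, Map.of("operation", "append")));
    }
    lastSnapshots(all, 5).forEach(System.out::println);
  }
}
```

Bounding the output to a fixed number of entries keeps the log size predictable, although the summary maps themselves are engine-supplied and could still carry user data.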

snazy · 2024-11-22T11:45:45Z

@RussellSpitzer @snazy Thank you for the feedback. I'm new to the Iceberg space and appreciate the context you have on these objects. My hope is to make it easier to see the "unstructured" data that gets self-reported by the query engines. Would you feel more comfortable if this was scoped down to printing the last 5 Snapshot IDs & summaries?

What's the use case for logging it at all, that cannot be done from an Iceberg client or curl?

RussellSpitzer · 2024-11-22T16:16:00Z

@RussellSpitzer @snazy Thank you for the feedback. I'm new to the Iceberg space and appreciate the context you have on these objects. My hope is to make it easier to see the "unstructured" data that gets self-reported by the query engines. Would you feel more comfortable if this was scoped down to printing the last 5 Snapshot IDs & summaries?

What's the use case for logging it at all, that cannot be done from an Iceberg client or curl?

I have no problem with the functionality, but I think it probably should be part of an eventing API. We may want to keep the complete history of the table somewhere (possibly not in the Catalog itself).

andrew4699 · 2024-11-22T17:33:36Z

What's the use case for logging it at all, that cannot be done from an Iceberg client or curl?

This is intended to be server-side so Polaris knows more about its callers.

I have no problem with the functionality, but I think it probably should be part of an eventing API. We may want to keep the complete history of the table somewhere (possibly not in the Catalog itself).

Yes I think that would be a useful API and this change could also help the project move in that direction. In some sense these 2 changes can make the value proposition clearer by first providing low-friction access to this unstructured, self-reported data.

andrew4699 added 2 commits November 18, 2024 13:58

More metadata debug logging

08b68a1

Add tests

c92ede6

andrew4699 requested review from jbonofre, ashvina, RussellSpitzer, snazy, vvcephei, takidau, jackye1995, flyrain, eric-maynard, collado-mike, ebyhr and adutra as code owners November 19, 2024 19:21

eric-maynard reviewed Nov 19, 2024

polaris-service/src/test/java/org/apache/polaris/service/PolarisApplicationIntegrationTest.java (outdated)

eric-maynard reviewed Nov 19, 2024

andrew4699 added 3 commits November 19, 2024 15:33

Make the test a little better

244b08a

Undo ReportMetricsRequest logging as it's already logged

53074ea

Move to debug, arg supplier

3d46487

andrew4699 requested a review from eric-maynard November 20, 2024 19:06

andrew4699 changed the title ~~Log more things: TableMetadata and ReportMetricsRequest~~ Log the full TableMetadata Nov 20, 2024

eric-maynard approved these changes Nov 21, 2024

snazy reviewed Nov 21, 2024

andrew4699 requested a review from snazy November 21, 2024 18:40
