My understanding of the high-level assumptions is:
schema-level metadata: always passed through
field-level metadata: when there is a clear 1-1 correspondence between an input column with metadata and an output column, the metadata should be preserved
Examples
PROJECT(a, b+c) --> field metadata on a should be preserved; there is no field metadata on b+c
SUM(a) .. GROUP BY b --> field metadata on b is preserved; the metadata on a is not
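To make the two rules concrete, here is a minimal sketch of how an output schema could be derived under these assumptions. The `Field`, `Schema`, `Expr`, `field`, and `output_schema` names are hypothetical stand-ins that use only the Rust standard library; they are not DataFusion's or Arrow's actual types or APIs, and the sketch only models the intended behavior.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a field: a name plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for a schema: fields plus schema-level metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: HashMap<String, String>,
}

/// A deliberately tiny expression model (an assumption of this sketch):
/// either a direct column reference or anything computed, e.g. `b + c` or `SUM(a)`.
#[derive(Debug, Clone)]
pub enum Expr {
    Column(String),
    Computed(String),
}

/// Convenience constructor for a field with the given metadata pairs.
pub fn field(name: &str, metadata: &[(&str, &str)]) -> Field {
    Field {
        name: name.to_string(),
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

/// Derive the output schema of an operator that emits `exprs` over `input`.
/// Rule 1: schema-level metadata is always passed through.
/// Rule 2: field-level metadata is kept only for a 1-1 column reference.
pub fn output_schema(input: &Schema, exprs: &[Expr]) -> Schema {
    let fields = exprs
        .iter()
        .map(|expr| match expr {
            Expr::Column(name) => {
                // Direct reference: copy the input field's metadata, if the column exists.
                let metadata = input
                    .fields
                    .iter()
                    .find(|f| &f.name == name)
                    .map(|f| f.metadata.clone())
                    .unwrap_or_default();
                Field { name: name.clone(), metadata }
            }
            // Computed output: no single source column, so no field metadata.
            Expr::Computed(display) => Field {
                name: display.clone(),
                metadata: HashMap::new(),
            },
        })
        .collect();
    Schema { fields, metadata: input.metadata.clone() }
}
```

In this model, PROJECT(a, b+c) corresponds to `[Expr::Column("a"), Expr::Computed("b + c")]`, and SUM(a) .. GROUP BY b corresponds to `[Expr::Column("b"), Expr::Computed("SUM(a)")]`; in both cases only the direct column reference carries its field metadata forward.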
Is your feature request related to a problem or challenge?
There is an (implicit) assumption that metadata attached to Schema is preserved during certain operations in DataFusion.
However, this expectation is clearly not well tested or documented (e.g. see #12733)
Describe the solution you'd like
I would like the assumptions documented
Describe alternatives you've considered
I suggest adding documentation to https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html that explains the high-level assumptions.
Then add a note/link to that section from the optimizers:
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.AnalyzerRule.html
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html
https://docs.rs/datafusion/latest/datafusion/physical_optimizer/trait.PhysicalOptimizerRule.html
Additional context
No response