Document Schema metadata expectations #12736

Open
Tracked by #12733
alamb opened this issue Oct 3, 2024 · 4 comments

alamb commented Oct 3, 2024

Is your feature request related to a problem or challenge?

There is an (implicit) assumption that metadata attached to Schema is preserved during certain operations in DataFusion.

However, this expectation is not well tested or documented (e.g. see #12733).

Describe the solution you'd like

I would like the assumptions documented

Describe alternatives you've considered

I suggest adding documentation to https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html that explains the high-level assumptions.

Then add a note / link to that section from the analyzer and optimizer rule traits:
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.AnalyzerRule.html
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html
https://docs.rs/datafusion/latest/datafusion/physical_optimizer/trait.PhysicalOptimizerRule.html

My understanding of the high-level assumptions is:

  • schema level metadata: always passed through
  • field level metadata: when there is a clear 1-1 correspondence from an input column with metadata to an output column, the metadata should be preserved

Examples

  • PROJECT(a, b+c) --> field metadata on a should be preserved; the b+c output carries no field metadata
  • SUM(a) .. GROUP BY b --> field metadata on b is preserved; the metadata on a is not carried over to SUM(a)
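
To make the two examples above concrete, here is a minimal sketch of the rule as I understand it. It uses toy `Schema`, `Field`, and `Expr` types defined only for this illustration (they are hypothetical and are not the arrow or DataFusion types), and the helper names `project`, `aggregate`, and `example_input` are assumptions as well. It shows schema level metadata passing through unchanged, and field level metadata surviving only for direct column references.

```rust
use std::collections::HashMap;

// NOTE: toy model for illustration only. These are NOT the arrow / DataFusion types.
type Metadata = HashMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: Metadata,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: Metadata,
}

/// Hypothetical expression type: a direct column reference (1-1 with an input
/// column) or anything computed (e.g. `b + c`, `SUM(a)`), identified by its display name.
enum Expr {
    Column(String),
    Computed(String),
}

/// Field metadata is kept only when the output maps 1-1 to an input column.
fn output_field(input: &Schema, expr: &Expr) -> Field {
    match expr {
        Expr::Column(name) => {
            let metadata = input
                .fields
                .iter()
                .find(|f| &f.name == name)
                .map(|f| f.metadata.clone())
                .unwrap_or_default();
            Field { name: name.clone(), metadata }
        }
        // No clear 1-1 correspondence: the output field starts with empty metadata.
        Expr::Computed(display_name) => Field {
            name: display_name.clone(),
            metadata: Metadata::new(),
        },
    }
}

/// PROJECT: schema level metadata is always passed through.
fn project(input: &Schema, exprs: &[Expr]) -> Schema {
    Schema {
        fields: exprs.iter().map(|e| output_field(input, e)).collect(),
        metadata: input.metadata.clone(),
    }
}

/// AGGREGATE: group-by columns behave like column references,
/// aggregate outputs behave like computed expressions.
fn aggregate(input: &Schema, group_by: &[&str], aggregate_names: &[&str]) -> Schema {
    let group_exprs = group_by.iter().map(|n| Expr::Column(n.to_string()));
    let agg_exprs = aggregate_names.iter().map(|n| Expr::Computed(n.to_string()));
    let exprs: Vec<Expr> = group_exprs.chain(agg_exprs).collect();
    project(input, &exprs)
}

/// Small helper to build a metadata map from string pairs.
fn meta(pairs: &[(&str, &str)]) -> Metadata {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Example input: columns a, b, c, with field metadata on a and b
/// and schema level metadata on the schema itself.
fn example_input() -> Schema {
    Schema {
        fields: vec![
            Field { name: "a".to_string(), metadata: meta(&[("unit", "ms")]) },
            Field { name: "b".to_string(), metadata: meta(&[("unit", "bytes")]) },
            Field { name: "c".to_string(), metadata: Metadata::new() },
        ],
        metadata: meta(&[("origin", "test")]),
    }
}
```

With this model, `project(&example_input(), ...)` over `a` and `b + c` keeps the field metadata on `a` and leaves `b + c` empty, while `aggregate(&example_input(), &["b"], &["SUM(a)"])` keeps the field metadata on `b` only. Both outputs keep the schema level metadata.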

Additional context

No response

alamb added the enhancement (New feature or request) label Oct 3, 2024

alamb commented Oct 3, 2024

I believe @wiedld plans to work on this

wiedld commented Oct 3, 2024

take

alamb commented Oct 21, 2024

I will take a shot at documenting this

alamb commented Nov 13, 2024

#13305 (comment) has some additional context
