-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User documentation: Generate docs from macros, makeDocumentationBuilder::build
infallable
#12822
Conversation
Cargo.toml
Outdated
@@ -48,6 +48,8 @@ members = [ | |||
"datafusion-examples", | |||
"test-utils", | |||
"benchmarks", | |||
"datafusion/macros", | |||
"datafusion/pre-macros" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre macros needed to have Documentation structure already precreated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation
rather than something generic like pre-macro
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Named it doc-gen
as documentation
sort of misleading, users gonna think this is a crate user documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proc-macros would also be a standard rust name for this kind of hting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to make it proc-macro as a crate with all potential procedural macros in DF. Afaik we dont have other custom procedural macros?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have any custom procedural macros, but I was thinking we could / should probably vendor (copy) the recursive
one to reduce our dependency chain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite following.... you mean copy stacker/recurisive
macros introduced recently but this change not related to this PR?
datafusion/functions/src/math/log.rs
Outdated
@@ -37,6 +38,7 @@ use datafusion_expr::{ | |||
}; | |||
use datafusion_expr::{ScalarUDFImpl, Signature, Volatility}; | |||
|
|||
#[udf_doc(description = "log_description", example = "log_example")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we put the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite cool ❤️
One thing I would be interested in seeing is if/how a multi line description and example would look. I have had trouble with multi-line macro / docs in the past
datafusion/functions/src/math/log.rs
Outdated
@@ -472,4 +471,16 @@ mod tests { | |||
SortProperties::Unordered | |||
); | |||
} | |||
|
|||
#[test] | |||
fn test_doc() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the test shows it applied
WDYT should we move in this direction? |
I find this intriguing and technically very interesting! I do have a few questions:
To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns. |
Thanks @Omega359
I'm planning to play with to_date function as it covers also multiline comments as @alamb mentioned |
@alamb @Omega359 please have a look on real example Btw multiline seems to be working |
datafusion/pre-macros/src/lib.rs
Outdated
// under the License. | ||
|
||
#[derive(Debug, Clone)] | ||
pub struct DocumentationTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied all Documentation structures here, added Test so as not to break everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great personally
As long as the macro has enough documentation itself, I think this would help reduce a bunch of the boiler plate (though perhaps not all)
Cargo.toml
Outdated
@@ -48,6 +48,8 @@ members = [ | |||
"datafusion-examples", | |||
"test-utils", | |||
"benchmarks", | |||
"datafusion/macros", | |||
"datafusion/pre-macros" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation
rather than something generic like pre-macro
s
use std::any::Any; | ||
use std::sync::OnceLock; | ||
|
||
#[udf_doc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very cool
@Omega359 what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dumped the example #[udf_doc ...] into my editor to do a direct comparison to the existing solution and to be honest it definitely is nicer to visually parse.
There are situations currently where this approach will not work, at least as it currently seems to be implemented. For example: bit_and_or_xor, some of the regex udfs, etc.
I can't think of a reason why not off the top of my head. I'm hoping to have time to look over your latest updates in this PR later tonight |
use std::any::Any; | ||
use std::sync::OnceLock; | ||
|
||
#[udf_doc( | ||
doc_section(include = "true", label = "Time and Date Functions"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry about this since it's almost certainly going to drift at some point in the future via typos, etc. Is it possible to reference the const's in scalar_doc_sections/window_doc_sections/aggregate_doc_sections ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if enum/constant can be used directly within the attribute, however there is a possibility to check the value in macro. So if the value is not satisfied to existing enum variants the macro can throw an exception with explaining what section names are supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SInce we auto generate the documentation, part of the PR review would be to look at the generated pages. I think we might be able to catch typos at that stage too
+---------------------------------------------------------------+\n\ | ||
```\n\n\ | ||
Additional examples can be found [here](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/to_date.rs)", | ||
standard_argument(name = "expression", expression_type = "String"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely better as expression_type
-> prefix
let mut syntax_example: Option<LitStr> = None; | ||
let mut sql_example: Option<LitStr> = None; | ||
let mut standard_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![]; | ||
let mut udf_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be missing related udfs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still missing related udfs.
Thanks @Omega359 there is bunch of things needed to be added, its not a final PR. I'll add some minor missing pieces and the PR will be ready for the review, most likely next week. Because I need to make a cleanup, remove test structures, polish crates and everything. |
I'll get back on it little later as need to finish some more urgent work with joins |
I'm closer to this |
We should also take care on handing things like #13367 |
@@ -162,7 +151,7 @@ impl ScalarUDFImpl for ToDateFunc { | |||
} | |||
|
|||
fn documentation(&self) -> Option<&Documentation> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont have better solutions here for now, so every impl have to copy this line because its not easy to inject/override a method into trait impl, I'll experiment more with this, but for now it is what it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as long as it is well documented, this boiler plate is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could call the function something like macro_doc
or derived_documentation
to make it easier to find (aka so someone who runs across it can easily grep
and find the macro definition)
Co-authored-by: Bruce Ritchie <[email protected]>
Co-authored-by: Bruce Ritchie <[email protected]>
@Omega359 I'm planning to merge it today, getting more code conflicts, let me know if any objections |
I'm fine with that @comphead - we can file followup issues to add the alternative syntax, etc |
Filed #13553 to add missing parts |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
to_date
as an exampleAre these changes tested?
Are there any user-facing changes?