Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User documentation: Generate docs from macros, makeDocumentationBuilder::build infallable #12822

Merged
merged 24 commits into from
Nov 24, 2024

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Oct 8, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

  • Moved Document generation framework to separate crate
  • Created procedural macros crate and macros to generate user docs from rust attributes
  • Migrated UDF to_date as an example

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added common Related to common crate functions labels Oct 8, 2024
@comphead
Copy link
Contributor Author

comphead commented Oct 8, 2024

@Omega359 @alamb I tried to play with custom attributes to wrap up the documentation on top of the what @Omega359 already built. I'm experimenting with just 2 fields(description and examples) and the attribute has to be placed on top of the struct.

Cargo.toml Outdated
@@ -48,6 +48,8 @@ members = [
"datafusion-examples",
"test-utils",
"benchmarks",
"datafusion/macros",
"datafusion/pre-macros"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre macros needed to have Documentation structure already precreated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation rather than something generic like pre-macros

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Named it doc-gen as documentation sort of misleading, users gonna think this is a crate user documentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proc-macros would also be a standard rust name for this kind of hting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to make it proc-macro as a crate with all potential procedural macros in DF. Afaik we dont have other custom procedural macros?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any custom procedural macros, but I was thinking we could / should probably vendor (copy) the recursive one to reduce our dependency chain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following.... you mean copy stacker/recurisive macros introduced recently but this change not related to this PR?

@@ -37,6 +38,7 @@ use datafusion_expr::{
};
use datafusion_expr::{ScalarUDFImpl, Signature, Volatility};

#[udf_doc(description = "log_description", example = "log_example")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we put the documentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite cool ❤️

One thing I would be interested in seeing is if/how a multi line description and example would look. I have had trouble with multi-line macro / docs in the past

@@ -472,4 +471,16 @@ mod tests {
SortProperties::Unordered
);
}

#[test]
fn test_doc() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the test shows it applied

@github-actions github-actions bot removed the common Related to common crate label Oct 8, 2024
@comphead
Copy link
Contributor Author

comphead commented Oct 8, 2024

WDYT should we move in this direction?

@Omega359
Copy link
Contributor

Omega359 commented Oct 9, 2024

I find this intriguing and technically very interesting! I do have a few questions:

  • What are you goals with this? I am assuming cleaner looking/easier to create docs. If so I would wonder how it would help in cases were there is significant amounts of docs vs boilerplate such as in the to_date udf.
  • Can it be made to handle n arguments?
  • Is it possible to generate rustdoc with this approach?
  • I noticed that this approach creates a Documentation object on every call. It's obviously pretty minor but it would be nice if it could be static.

To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns.

@comphead
Copy link
Contributor Author

comphead commented Oct 9, 2024

I find this intriguing and technically very interesting! I do have a few questions:

  • What are you goals with this? I am assuming cleaner looking/easier to create docs. If so I would wonder how it would help in cases were there is significant amounts of docs vs boilerplate such as in the to_date udf.
  • Can it be made to handle n arguments?
  • Is it possible to generate rustdoc with this approach?
  • I noticed that this approach creates a Documentation object on every call. It's obviously pretty minor but it would be nice if it could be static.

To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns.

Thanks @Omega359
The idea is to reuse your current approach but make it on the level of custom Rust attribute instead of having the documentation directly in the code. So the steps can look like:

  • Move Documentation structure to pre-macros crate
  • generate the code from macros instead of having it statically
  • we can have it static everything will be the same as its now, the only difference is the code will be created through the macros.

I'm planning to play with to_date function as it covers also multiline comments as @alamb mentioned

@comphead comphead mentioned this pull request Oct 10, 2024
@comphead
Copy link
Contributor Author

comphead commented Oct 10, 2024

@alamb @Omega359 please have a look on real example to_date (I still need to include argements to be called with the builder). As you can see it is the same approach as before, the only difference the documentation is set by attributes rather than code, but it generates documentation builders as before.

Btw multiline seems to be working

// under the License.

#[derive(Debug, Clone)]
pub struct DocumentationTest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied all Documentation structures here, added Test so as not to break everything

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great personally

As long as the macro has enough documentation itself, I think this would help reduce a bunch of the boiler plate (though perhaps not all)

Cargo.toml Outdated
@@ -48,6 +48,8 @@ members = [
"datafusion-examples",
"test-utils",
"benchmarks",
"datafusion/macros",
"datafusion/pre-macros"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation rather than something generic like pre-macros

use std::any::Any;
use std::sync::OnceLock;

#[udf_doc(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very cool

@Omega359 what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dumped the example #[udf_doc ...] into my editor to do a direct comparison to the existing solution and to be honest it definitely is nicer to visually parse.

There are situations currently where this approach will not work, at least as it currently seems to be implemented. For example: bit_and_or_xor, some of the regex udfs, etc.

datafusion/macros/Cargo.toml Outdated Show resolved Hide resolved
@comphead
Copy link
Contributor Author

comphead commented Oct 11, 2024

Thanks @alamb I'll consider the changes, @Omega359 is it good to have a documentation method on struct level instead of impl?

@Omega359
Copy link
Contributor

@Omega359 is it good to have a documentation method on struct level instead of impl?

I can't think of a reason why not off the top of my head. I'm hoping to have time to look over your latest updates in this PR later tonight

use std::any::Any;
use std::sync::OnceLock;

#[udf_doc(
doc_section(include = "true", label = "Time and Date Functions"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about this since it's almost certainly going to drift at some point in the future via typos, etc. Is it possible to reference the const's in scalar_doc_sections/window_doc_sections/aggregate_doc_sections ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if enum/constant can be used directly within the attribute, however there is a possibility to check the value in macro. So if the value is not satisfied to existing enum variants the macro can throw an exception with explaining what section names are supported

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SInce we auto generate the documentation, part of the PR review would be to look at the generated pages. I think we might be able to catch typos at that stage too

+---------------------------------------------------------------+\n\
```\n\n\
Additional examples can be found [here](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/to_date.rs)",
standard_argument(name = "expression", expression_type = "String"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely better as expression_type -> prefix

let mut syntax_example: Option<LitStr> = None;
let mut sql_example: Option<LitStr> = None;
let mut standard_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![];
let mut udf_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be missing related udfs here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing related udfs.

@comphead
Copy link
Contributor Author

Thanks @Omega359 there is bunch of things needed to be added, its not a final PR. I'll add some minor missing pieces and the PR will be ready for the review, most likely next week. Because I need to make a cleanup, remove test structures, polish crates and everything.

@comphead comphead marked this pull request as draft October 11, 2024 19:44
@comphead
Copy link
Contributor Author

I'll get back on it little later as need to finish some more urgent work with joins

@comphead
Copy link
Contributor Author

I'm closer to this

@comphead
Copy link
Contributor Author

We should also take care on handing things like #13367

@@ -162,7 +151,7 @@ impl ScalarUDFImpl for ToDateFunc {
}

fn documentation(&self) -> Option<&Documentation> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont have better solutions here for now, so every impl have to copy this line because its not easy to inject/override a method into trait impl, I'll experiment more with this, but for now it is what it is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as it is well documented, this boiler plate is ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could call the function something like macro_doc or derived_documentation to make it easier to find (aka so someone who runs across it can easily grep and find the macro definition)

@comphead comphead marked this pull request as ready for review November 18, 2024 19:34
@comphead
Copy link
Contributor Author

comphead commented Nov 24, 2024

@Omega359 I'm planning to merge it today, getting more code conflicts, let me know if any objections

@Omega359
Copy link
Contributor

I'm fine with that @comphead - we can file followup issues to add the alternative syntax, etc

@comphead comphead merged commit 1a6e9f5 into apache:main Nov 24, 2024
27 checks passed
@comphead
Copy link
Contributor Author

Filed #13553 to add missing parts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate functions logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants