-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
inline documentation rendered as eg. Markdown #375
Comments
This is absolutely where my head is at. I think this could look something like Javadoc. That, coupled with some dbt queries to infer the schemas of the models, plus constraint info provided by schema tests, would be incredibly cool. |
@dwallace0723 I understand that you've been doing some documentation work within a dbt project -- what kinds of things do you find yourself documenting? We've thought about
Is that about congruent with your experiences? |
@drewbanin So far, I've been using documentation mostly for the third bullet point you listed. That being said, I can definitely see myself needing proper documentation for the first two bullets as well. The way I've been doing this for now is having a comment above a line of SQL that links to the corresponding piece of documentation located in a .md file within the dbt project directory. In practice, the flow looks like this: Not the most seamless experience, but it gets the job done for now. |
that is cool |
I'd be interested in working with someone on a minimal initial version of this. I'd be perfectly happy with the following minimal features:
Something like:
Stripping these out via regex and dropping them into a single markdown or similar file shouldn't be a lot of work. Dealing with references wouldn't be too bad either. Thoughts? Tons of room for enhancements, but even this would be fantastic. |
@HarlanH I'm super happy to see some pseudo code for this :) I like the idea of using
^ I think that's super gross, and would prefer something that looks a little more like regular SQL. Another option is to repurpose the my_model:
# same as before
constraints:
not_null:
- field_1
- field_2
# also provide for block comments, etc
documentation:
- field: field_1
comment: This field is made of grass
- field: field_2
comment: This field is made of Athenry While this is obviously more readily parseable, i do really like the idea of colocating the docs with the code. There's also the question of where these markdown docs get compiled to, if they're versioned, etc etc. So, still some fundamental and unanswered questions here I think. I'd love to keep pulling on this thread, so please let us know if you have more thoughts here! |
I guess I'm OK with convention over code for the column docs, and agree that jinja's pretty ugly for this sort of thing. I also am unsure about how to handle the first of these:
I definitely would not want to embed that in a jinja block... On the other hand, I'd expect the output as written to look something like:
At least that'd be perfectly adequate initially. If the line can't be parsed into I would prioritize in-line docs over a solution like leveraging the schema, even if it reduces functionality -- too much sad experience trying to keep separate files in sync! On compilation, I was thinking of a On versioning, what do you have in mind? |
I wrote a quick-and-dirty standalone script that seems to do a fair amount of what I want. It's regex-heavy, so a bit fragile, but seems to be a decent start. https://gist.github.com/HarlanH/277006989774372515c7130e63809315 |
@HarlanH cool! Unsure if you're aware, but dbt generates a Networkx graph file in |
I also would prefer to define the comments in the model file, close to the query it documents. If the issue is parsing sql reliably, can we not define the commens in a block that is easy to isolate from the query, similar to the way configuration code is inserted? I would prefer a syntax that allows me to define start and end of comment block and to put comments in a very readable format. Something like yml could work but something less verbose would be even better. As far a output, I would love to see an interactive, graphical representation of the network chart that shows the model doc and compiled sql upon you clicking a on a node. It would also be good to have a printable format but I don't know how to make this work with large graphs. |
Pedro, yeah, the biggest design trade-off seems to me to be whether you want to wrap the SELECT columns in non-SQL syntax, ala Drew's suggestion, or if you want to try to parse the SQL (either buggily with regexes, or with a real parser), or if you want to force the author to repeat herself. Drew, does the existing dbt implementation actually run a SQL parser at any point? |
@HarlanH dbt does do an iota of sql parsing, but the plan is to get rid of that soon. I definitely think some sort of jinja function is the way to go here. For me, the core unanswered questions are:
For 1) I think we'll just need to play around with some different options. It could be jinja functions, or some sort of jinja block.... tbd Point 2) is a little more open-ended. It's tempting to just write a ton of python code to generate markdown files, but I really like the idea of making dbt mostly unaware of things like documentation. Instead, dbt would provide all of the relevant information, and we could encode the documentation logic into something like a package which could very well be shipped along with dbt. dbt already has a notion of "analysis" files which are compiled, but not executed. I think in a similar way, these docs would be "compiled" to markdown, but of course not executed. We'd just need to figure out templating, expose the list of models / comments in the context, and figure out how to change the output directory for the compiled markdown. ^ That's kind of where my head is at. Does that seem reasonable? |
Hm, I don't really like this syntax for SELECT clauses, especially for really complex column definitions:
Does the contents of a Could something like this be made to work, or does it violate dbt design assumptions?
|
+1 to this feature! Ultimately, I want to build a databook (like this!) that is tightly coupled to my code - one thing I've thought about is if I could store these definitions in another table, similar to Something that I find interesting about this approach:
is that you could then potentially open up the door to referencing previous columns when building fields (I miss this feature of LookML and find myself repeating code or having to write lots of cascading CTEs). e.g. (running with cats and dogs, let's pretend this is a list of people in my office and the pets they own, and that the only pets you can have are cat or dogs)
Agree that then gets a hard to read, and that probably strays too far from regular SQL, and also is not the point of this issue, but just a thought 😸 |
Funny I was thinking something exactly along the same lines as @HarlanH mentioned here parsing sql comments. What felt good about that was that it still serves some value as raw sql:
is still valid sql and even serves as local documentation if nothing else.
Could probably clean it up further with a macro, but maybe doing something similar to what Drew said inside the models? Another thing that seemed interesting would be a config value to "inherit" comments from ephemeral models - ie define the comments as far up the pipe as possible (or as close to the defining transforms as possible). |
Just a note that I've got a lot of columns now that are defined by macros, so column comments often can't extract a useful name from the pre-macro SQL (although I suppose if the comment-extractor worked on the compiled versions? hm...)
|
Closing this - docs site is going out in 0.11.0 on 9/6 :) |
This is from @pedromachado in slack:
I've been thinking that it would be good to have a documentation feature built into dbt. Perhaps something that could leverage 1) especially formatted comments embedded in the SQL and 2) the model dependency graph to generate some documentation or at least a diagram with all the tables/columns.
The text was updated successfully, but these errors were encountered: