[CT-403] [Feature] Allow programmatic access to macro properties #4919
Comments
You've provided a nice example of how this would be useful. We don't include macro nodes in the "graph" (i.e. the DAG used for processing nodes in dbt) at this time, and adding them to the graph per se would throw off the way that a fairly large number of things work. There might be some other place that would work to store macro information, perhaps a separate macro manifest. Do you only want to loop through macro nodes?
Yes, sorry for the confusion. I wasn't proposing adding them to the graph, only having access to them in something like the graph, such as the separate macro manifest you mentioned. Looping through a filtered set of macro nodes is my current use case, although I'm sure some creative developers would figure out another way to use it!
@tiffanygwilson Very cool issue, thanks for opening! I think the specific use case you're outlining, which I'll call "UDF management," deserves to be considered and tackled as its own problem—a step further than what's outlined in the linked discourse article. Should dbt be able to create arbitrary object types in the database, with arbitrary configurations (#3391)? Should we support creating UDFs/functions as their own dedicated node type (#136)? Or even, as a kind of model materialization? They're DAG-aware, logic-bearing, addressable as database objects... it sorta makes sense! More generally, not thinking about UDFs in particular — the problem you're describing is kind of like a custom dbt task: For each DAG node / project resource meeting certain selection criteria, run some SQL, using that node's configuration as input. It has a lot in common with To pull the two thoughts together: those are the same advantages that UDF-as-model-materialization has over UDF-as-macro-of-macros, since the former uses a "real" task ( |
👍 on "UDF (but not only) management"; it is definitely a needed feature. I ended up on this thread because it is the most recent one on the topic and I felt I needed to add my voice/use case for this feature. A UDF is a way to keep queries cleaner and store the logic somewhere else, keeping it compact and easier to maintain. Not everything can be solved with a macro that expands to inline SQL, because sometimes UDFs are not even written in SQL, for example an external UDF in Databricks. In other threads I saw considerations about UDF performance; I would leave that to the developer or team creating the UDFs. Maybe the trade-off between speed and code maintainability is acceptable for them. We are also using the above-mentioned method to create UDFs with macros, but what would be really nice to have is the possibility to invoke UDFs like
or
But I understand that this can be tricky and requires some work. The benefit would be having UDFs (and other objects too) as first-class citizens in dbt.
@francescomucio: With something like DDO (or your own equivalent implementation), you can reference UDFs (or any other Snowflake object) like a regular DBT model:
@tiffanygwilson: This might also be interesting for you. 👆
We could expose a "macros" variable in the Jinja context, similar to other Jinja context variables (not dbt project variables). We may not want to allow updating macro properties through it, but just reading the properties should be OK.
We're thinking more about UDF management, and it's increasingly likely we treat these as a first-class resource type in a future release: #5741
Definitely read-only, like the other properties in @karunpoudel To be honest, I'm still not convinced that this is a good change. Could you say more about your intended use case? What are you hoping to be able to do here? Given that UDF management was the original motivation in this issue, I'd much rather we tackle that problem head-on, as I think it deserves. |
My main goal is to capture the state of a model and its dependent macros at the time the model was executed, and to use that information with the --state parameter during Slim CI. Different models in a project can be refreshed on different schedules (some daily, some hourly, some weekly, etc.), so the manifest.json from any single run would not represent the current state of the database. I am trying to save the definition of each model and its macros in the database using a macro in a post_hook. This post_hook could be set in project.yml. Then I am going to use that data in the database to generate a manifest.json for the --state parameter. Since the same macro can have different definitions (at different points in time) for different models, the macro definition should be part of a resource node for this use case. I am thinking of adding a field to ParsedNode called "dependent_macro_state" that would hold the definitions of its dependent macros. When the --state parameter is used with a new parameter called For the 1st part, currently, we can already get the model definition from
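A rough sketch of the capture step described above, done in Python outside dbt rather than in a post_hook: for one model, it records a hash of the source of each macro the model references directly. It assumes a manifest.json layout in which each node carries `depends_on.macros` and each macro carries `macro_sql`; those field names may differ across dbt versions. The function name, the node and macro IDs, and the sample data are hypothetical.

```python
import hashlib


def dependent_macro_state(manifest, model_id):
    """Map each macro a model references directly to a SHA-256 of its source."""
    node = manifest["nodes"][model_id]
    state = {}
    for macro_id in node.get("depends_on", {}).get("macros", []):
        macro = manifest.get("macros", {}).get(macro_id)
        if macro is None:
            # The macro is not present in this manifest; skip it.
            continue
        digest = hashlib.sha256(macro["macro_sql"].encode("utf-8")).hexdigest()
        state[macro_id] = digest
    return state


# Hypothetical, hand-written stand-in for a parsed target/manifest.json.
STATE_SAMPLE = {
    "nodes": {
        "model.shop.orders": {
            "depends_on": {
                "macros": ["macro.shop.to_cents", "macro.shop.not_in_manifest"],
            },
        },
    },
    "macros": {
        "macro.shop.to_cents": {
            "macro_sql": "{% macro to_cents(col) %}({{ col }} * 100){% endmacro %}",
        },
    },
}

if __name__ == "__main__":
    for macro_id, digest in dependent_macro_state(STATE_SAMPLE, "model.shop.orders").items():
        print(macro_id, digest)
```

Only direct dependencies are covered here; macros called by other macros would need a recursive walk over each macro's own dependency list.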
Even if UDFs become a first-class object, people might want to access macro definitions in the Jinja context for various use cases.
I'd like programmatic access to macro properties, so I can make a list of macros based on their folder location.
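As a possible workaround outside the Jinja context, a script can read the manifest.json that dbt writes and filter its macro entries by file path. The sketch below assumes each entry under the top-level `macros` key has `name` and `original_file_path` fields with forward-slash paths; the helper name, folder names and sample data are hypothetical.

```python
from pathlib import PurePosixPath


def macros_in_folder(manifest, folder):
    """Return the sorted names of macros whose file lives under `folder`."""
    prefix = PurePosixPath(folder)
    names = []
    for macro in manifest.get("macros", {}).values():
        path = PurePosixPath(macro["original_file_path"])
        # `parents` lists every ancestor directory of the macro's file.
        if prefix in path.parents:
            names.append(macro["name"])
    return sorted(names)


# Hypothetical, hand-written stand-in for a parsed manifest. In a real project
# this would come from something like json.load(open("target/manifest.json")).
FOLDER_SAMPLE = {
    "macros": {
        "macro.shop.to_cents": {
            "name": "to_cents",
            "original_file_path": "macros/udfs/to_cents.sql",
        },
        "macro.shop.is_weekend": {
            "name": "is_weekend",
            "original_file_path": "macros/udfs/dates/is_weekend.sql",
        },
        "macro.shop.grant_select": {
            "name": "grant_select",
            "original_file_path": "macros/admin/grant_select.sql",
        },
    },
}

if __name__ == "__main__":
    print(macros_in_folder(FOLDER_SAMPLE, "macros/udfs"))
```

Like the Python script mentioned in the issue description, this runs as a separate step and is not available to macros while dbt is executing.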
The code that we use to load macros and the order in which we load them is fragile, complicated and subject to change (we're working on a future change to that right now). Unfortunately, allowing this to work the way that you request would only further complicate an already complicated area of the code and would constitute a programmatic interface that would make some changes harder or impossible. For example, most macros cannot be counted on to return correct values at parse time. We do often discuss ways that we could make macros more useful, particularly for simpler string-related settings, but what's proposed here would only make those solutions more difficult. Embedding data in macro files to be retrieved in other places is an anti-pattern that can be solved in a number of ways other than using macros. We're going to close this ticket for now. If you have additional information that you'd like to add to the discussion, please do.
Is there an existing feature request for this?
Describe the Feature
I would like to be able to access the properties of macros, such as names and tags, at runtime in a similar way as you can access the properties of sources using `graph.sources.values()`.
Describe alternatives you've considered
It is possible to write a Python script to parse a `macros.yml` file that contains macro properties to locate specific keywords. However, since dbt cannot execute arbitrary Python scripts during execution, any functionality gained from such scripts requires developers to take the extra step of running them manually, which is likely to be forgotten since it is not a standard part of dbt.
Who will this benefit?
This feature would benefit users who want to use Jinja to dynamically generate SQL based on the current properties of macros. For example, I am using the method described here to manage Snowflake user-defined functions in dbt. While it is not difficult to add the "create_function" macro to a list in a `create_udfs()` macro, it would be really nice to save developers that step. Using Jinja templating in the `create_udfs()` macro, one could loop through the macros configured in a YAML file (which, while not necessary, we want for the sake of documenting the arguments and output of each macro/function) and dynamically add any function with the tag `udf` to the `create_udfs()` macro.
I am currently using a similar method to populate a macro using the tags applied to sources via `graph.sources.values()`. It would be useful for anything configured in a `yaml` file to have those properties available when running dbt.
Are you interested in contributing this feature?
Unfortunately I do not have the bandwidth right now to try this as a first contribution.
Anything else?
Illustrative example of how this could work if macros were in the graph: