Update star macro to use adapter.quote #205
Conversation
The default identifier quoting with double quotes does not work when using Spark as the backend; we should use backticks in that case. Also, there was a typo in the file name `identifer.sql`, so I renamed it to `identifier.sql`.

Regarding testing, I have tested this in a local setup (using the docker-compose setup from the `dbt-spark` package). Adding Spark to the integration tests is probably a separate PR on its own, which I'm happy to investigate when I find the time.
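For illustration, a minimal sketch of the quoting change (simplified, not the exact `dbt_utils.star` implementation):

```sql
{%- macro star(from, relation_alias=False, except=[]) -%}
    {#- Simplified sketch: quote each column with adapter.quote, so Spark
        gets `column` (backticks) while Postgres/Snowflake get "column" -#}
    {%- set columns = adapter.get_columns_in_relation(from) -%}
    {%- for column in columns if column.column not in except -%}
        {% if relation_alias %}{{ relation_alias }}.{% endif %}{{ adapter.quote(column.column) }}
        {%- if not loop.last %}, {% endif -%}
    {%- endfor -%}
{%- endmacro %}
```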
I have started working on adding Spark integration here: https://github.com/bjgbeelen/dbt-utils/tree/add-spark-integration. I've got a local docker-compose setup against which I can run the tests. There's still some work to do :-)
Hey @bjgbeelen - this looks really cool! @jtcohen6 and I spoke about this PR the other day. We think that the best answer here would be to not include Spark-specific logic in this package. Instead, we want to make dbt able to better share macros across packages. In that world, we could have a `spark-utils` package that reimplements these macros for Spark.

We need to make some code changes in dbt to support this, and it's not something that's going to happen very soon, but I would like to prioritize it for a future release.

Until then, @jtcohen6, what do you think? Should we just add some Spark-specific logic in here and remove it when dbt's capabilities improve? Or do you think we'd be better off keeping Spark out of `dbt-utils`?

Also curious to hear what you think @bjgbeelen and @Fokko :)
I agree with all of the above @drewbanin. I'm in favor of prioritizing the requisite changes to dbt-core to support package "descendants," e.g. a `spark-utils` package that overrides `dbt-utils` macros.

Along those lines, I played around with an attempt to implement just that. I ended up having to copy-paste a lot of code, and a much more elegant implementation would require the changes to adapter macros which you discuss here: dbt-labs/dbt-core#2301
Personally, I'm in favor of lowering the drawbridge to non-core adapters, especially Spark, which is our best-supported and most widely used. The code should all look the same, here and now vs. later and elsewhere; it'll just be a matter of copy-pasting all of the Spark implementations into their eventual home. Of course, anything Spark-specific or Spark-exclusive should go in one of the Spark-focused packages.

@clrcrl: I recognize the inclusion of non-core adapters may add some maintenance burden to this package.
Thanks @jtcohen6 - I'm supportive of that change, but also want to hear what @clrcrl thinks too :)
Thanks all for the discussion so far!

First, some context on my PR: at a client we are using dbt together with Spark. I needed a way to exclude some columns from my selection, and that's how I came to try the `star` macro.

I like the idea of a dbt-utils(-core) and adapter-specific packages with macro overrides. It would be great, though, if in the meantime the community (like my example at the client) could already benefit from dbt-utils with other adapters where needed, especially if it is eventually just a case of copying the macros over.

I'm not sure what is considered core, but I'd feel you'd also want a `bigquery-utils` package in that case. More specifically for the `star` macro: this change should already make it usable on Spark.

So regarding (eventually) adding integration tests against Spark (either in this package or an adapter-specific one): a first step could be to move the integration test setup into a docker-compose file, as sketched below. That would open up possibilities to also have local testing support (against only Postgres as a start) and to reuse the same steps for both local testing and CircleCI integration. Then I can build further upon that to run against a Spark container setup (in a separate branch still)?
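Something along these lines, perhaps (a hypothetical minimal `docker-compose.yml`; the image tag, credentials, and database name are placeholders, not taken from this PR):

```yaml
version: "3"
services:
  postgres:
    # Placeholder image and credentials for running integration tests locally
    image: postgres:11
    environment:
      POSTGRES_USER: dbt
      POSTGRES_PASSWORD: dbt
      POSTGRES_DB: dbt_utils_test
    ports:
      - "5432:5432"
```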
@bjgbeelen You raise a good point! You're right: BigQuery is often just as idiosyncratic as Spark, and it requires its own implementations of macros in this package.

The question of "what is considered core" is in some sense a historical construct, and in another sense it reflects our degree of ongoing support. Postgres, Redshift, Snowflake, and BigQuery are all "core" adapters; we will always ship new versions of dbt with support for them. While we also maintain the Spark plugin, it is newer and supported on more of a best-effort basis.
Again, you're right: this macro, now that it uses `adapter.quote`, shouldn't need an adapter-specific implementation. In the general case, though, we'll need a new version of adapter macros that enables "child" packages to override "parent" macros, e.g. a `spark-utils` package overriding macros in `dbt-utils`. See the sketch below.
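For context, the mechanism dbt eventually shipped for this looks roughly as follows (this uses the later `adapter.dispatch` API, which did not exist at the time of this thread): the parent macro only dispatches by namespace, and a project-level config lets a child package's implementations take precedence.

```sql
{# In dbt_utils: dispatch so that any package can supply an
   adapter-specific implementation such as spark__star #}
{% macro star(from, relation_alias=False, except=[]) %}
    {{ return(adapter.dispatch('star', 'dbt_utils')(from, relation_alias, except)) }}
{% endmacro %}
```

```yaml
# In the user's dbt_project.yml: search spark_utils before dbt_utils
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```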
It seems that the whole `identifier` macro isn't needed anymore now that we call `adapter.quote` directly. No need for Spark-specific overrides yet :-)
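For reference, the removed helper looked roughly like this (an approximate reconstruction, hard-coding the ANSI double quotes that break on Spark):

```sql
{# Approximate shape of the old helper: always ANSI double quotes #}
{% macro identifier(value) %}
    "{{ value }}"
{% endmacro %}

{# adapter.quote makes it redundant: it yields "value" on Postgres,
   Redshift, and Snowflake, and `value` on Spark and BigQuery #}
```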
This LGTM! Going to ship it when the integration tests pass :)
Hmm, I'm not sure if/how my change relates to the Snowflake test failure.
Oops - I actually spoke too soon in my last comment. Can you revert the deletion of the `identifier` macro? Removing it would be a breaking change for anyone already using it.
@drewbanin they passed now after you re-triggered them?
@bjgbeelen just wanted to make sure you saw my comment above: #205 (comment)
Hi @drewbanin, I indeed overlooked your comments as I was distracted by the (snow)flaky test that failed ;-) But I agree about not making a breaking change; it hadn't come to mind that the `identifier` macro might still be in use elsewhere.

I reverted the deletion (I did still correct the spelling error). I also added a deprecation warning. I think it is cleaner to get rid of this one eventually.
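A sketch of what such a deprecation shim can look like (the warning text and the use of `adapter.quote` here are illustrative, not the exact code from this PR):

```sql
{% macro identifier(value) %}
    {#- Kept for backwards compatibility; nudge users toward adapter.quote -#}
    {%- do exceptions.warn("dbt_utils.identifier is deprecated; use adapter.quote instead.") -%}
    {{ adapter.quote(value) }}
{% endmacro %}
```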
Thanks for sticking with it @bjgbeelen! This LGTM - going to merge it now :)