support hive as an adapter #559
Comments
I'd be interested in seeing this and also am open to putting some effort in to make it happen. |
I think this is also a good idea. Can the team provide more context on what is needed for this specific issue? From the docs on contributing:
|
@raybuhr dbt has historically "shipped" with adapters (snowflake, redshift, bigquery), which made it super difficult to add new adapters to the codebase. We're pretty far along the path to pulling the adapters out of dbt-proper, which should make it possible to add new ones! This is a good issue to watch, as it's the culmination of a bunch of other relevant issues: #966. We still have some work to do on this front, but we're getting there! @raybuhr do you have experience using Hive SQL? There are two parts to building an adapter:
Have you given any thought to how materializations would work on Hive? I admittedly have only played with it briefly. We'd need to figure out whether it supports transactions, how to manage create table/view statements, hooks, insert/delete/merge (for incremental models), CTEs, etc. Would you be interested in filling out an issue if I can supply a template? It would be something like "Request a new adapter", and you'd have to fill in answers to the questions above, and then some. It can be a community effort (cc @jrandrews) :) |
@drewbanin yes, I've used Hive before, though I'm using Snowflake in my current role. I'm going to an open source hacknight tomorrow and thought this looked like a good issue to work on. I would be willing to fill out the issue if you supply the template. |
ok, I'll see if I can whip something up! |
@raybuhr just pulled this together: https://github.com/fishtown-analytics/dbt/wiki/New-Adapter-Information-Sheet It's hard to know what all of the possible questions to ask are, but I think this sheet does a decent job of collecting some critical information. Responses to questions like these will help us assess the viability of a new adapter, as well as guide the eventual implementation. Thanks so much for taking a look at this! Looking forward to your answers, as well as any feedback you may have on the template! |
Hey, I made some progress on this initially when I went to that hacknight in October, but haven't found the time since. Here's a gist of what I came up with so far. I'm hoping to be able to contribute more to it in the near future, but want to open it up in case others want to help. |
Thanks @raybuhr! This is a great start! |
Hi there, |
Hey @kzarzycki! You're very right that Presto and Hive should be similar -- I think a good starting point is to check out the code in the dbt-presto plugin. You can find documentation on dbt's plugin system, as well as more information on how to build a new plugin in the docs: https://docs.getdbt.com/docs/building-a-new-adapter Let me know if you have any specific questions as you dig into these things :) |
How do you recommend we deal with different Hive versions? For instance, my company is on a Hive 1.2 variant, and Hive 3 has native support for materialized views, so the two will need different implementations for that functionality. Should it just be a single hive adapter with the version specified in the configuration, or would it make sense to have a hive-1-adapter and a hive-3-adapter? |
Hey @cprosser - I think we should start by targeting a single version of Hive -- probably the latest stable version. If the connection semantics differ between hive versions, then that could present a problem. If the differences are instead around specific functionality (like materialized views), then we can just document the matrix of supported functionality. Do you have a good handle on what the other big differences between Hive 1 and Hive 3 are? |
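The "single adapter plus a documented matrix of supported functionality" idea discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SUPPORTED_FEATURES`, `HiveTarget`, and `choose_materialization` are hypothetical names, not part of dbt or any existing plugin, and the only entry taken from this thread is that Hive 3 has native materialized views while a Hive 1.2 variant does not.

```python
from dataclasses import dataclass

# Hypothetical support matrix keyed by Hive major version.
# Only the materialized-view difference comes from the discussion above.
SUPPORTED_FEATURES = {
    1: {"materialized_views": False},
    3: {"materialized_views": True},
}


@dataclass(frozen=True)
class HiveTarget:
    """Hypothetical connection target; `version` would come from user configuration."""

    version: str  # e.g. "1.2" or "3.1.2"

    @property
    def major(self) -> int:
        # Take the part before the first dot as the major version.
        return int(self.version.split(".")[0])

    def supports(self, feature: str) -> bool:
        features = SUPPORTED_FEATURES.get(self.major)
        if features is None:
            raise ValueError(f"Hive major version {self.major} is not in the matrix")
        # Features missing from the matrix are treated as unsupported.
        return features.get(feature, False)


def choose_materialization(target: HiveTarget) -> str:
    """Fall back to a plain table when materialized views are unavailable."""
    return "materialized_view" if target.supports("materialized_views") else "table"
```

With this shape, a version-specific difference is a lookup in one table rather than a separate adapter package, and the same table doubles as the documented support matrix.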
Well, the latest stable version isn't available to me to work with, but I just did some research, and not much has changed in the core language between Hive v1.2, which I'm on, and subsequent versions. The bulk of the changes occurred under the hood to improve execution time, with only a few small changes made to the date functions. The only thing that I think could make a material difference is ACID support on ORC tables, which provides full merge/upsert functionality. But it's very limited on the distributions that support it, and not available by default on all tables. Probably best to avoid. I'm going to see if I can make any headway with the version I'm on. It's the lowest common denominator and forces you to deal with issues that are handled automatically (like join ordering) in the newer versions. |
FYI for plugin maintainers: #1655 |
closing this one - out of scope for core |
Hi Everyone, |
No description provided.