support hive as an adapter #559

Closed
jthandy opened this issue Oct 11, 2017 · 16 comments
Labels
adapter_plugins Issues relating to third-party adapter plugins good_first_issue Straightforward + self-contained changes, good for new contributors! help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors

Comments

@jthandy
Member

jthandy commented Oct 11, 2017

No description provided.

@jrandrews

I'd be interested in seeing this and also am open to putting some effort in to make it happen.

@raybuhr

raybuhr commented Oct 23, 2018

I think this is also a good idea. Can the team provide more context on what is needed for this specific issue?

From the docs on contributing:

We recommend that you log any feature requests as issues and discuss implementation approach with the team prior to getting to work.

@drewbanin
Contributor

@raybuhr dbt has historically "shipped" with adapters (snowflake, redshift, bigquery), which made it super difficult to add new adapters to the codebase. We're pretty far along the path to pulling the adapters out of dbt-proper, which should make it possible to add new ones!

This is a good issue to watch, as it's the culmination of a bunch of other relevant issues: #966

We still have some work to do on this front, but we're getting there!

@raybuhr do you have experience using Hive SQL? There are two parts to building an adapter:

  1. connection-level operations, responsible for connecting to the db, issuing queries, handling errors, etc
  2. sql-level operations, responsible for creating relations from dbt models
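
A minimal sketch of how these two layers could be kept separate. It uses Python's built-in `sqlite3` as a stand-in for a Hive connection, and the class names `ConnectionManager` and `RelationBuilder` are hypothetical illustrations, not dbt's actual plugin API:

```python
import sqlite3


class ConnectionManager:
    """Connection-level layer: opens the connection, issues queries, wraps errors.

    sqlite3 is only a stand-in here; a real Hive adapter would use a Hive client.
    """

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    def execute(self, sql):
        try:
            return self._conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Translate driver-specific errors into one adapter-level error type.
            raise RuntimeError(f"query failed: {exc}") from exc


class RelationBuilder:
    """SQL-level layer: turns a model's SELECT statement into a relation."""

    def __init__(self, connections):
        self.connections = connections

    def materialize(self, name, select_sql, kind="view"):
        if kind not in ("view", "table"):
            raise ValueError(f"unsupported materialization: {kind}")
        # Drop-and-recreate is the simplest strategy; it assumes no transactions.
        self.connections.execute(f"drop {kind} if exists {name}")
        self.connections.execute(f"create {kind} {name} as {select_sql}")


connections = ConnectionManager()
builder = RelationBuilder(connections)
builder.materialize("my_model", "select 1 as id", kind="table")
rows = connections.execute("select id from my_model")
```

Keeping the SQL-level layer unaware of the driver means questions about transactions, hooks, and incremental strategies can be answered per database without touching the connection code.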

Have you given any thought to how materializations would work on Hive? I admittedly have only played with it briefly. We'd need to figure out if it supports transactions, how to manage create table/view statements, hooks, insert/delete/merge (for incremental models), CTEs, etc etc etc

Would you be interested in filling out an issue if I can supply a template? It would be something like "Request a new adapter", and you'd have to fill in answers to the questions above, and then some. It can be a community effort (cc @jrandrews) :)

@raybuhr

raybuhr commented Oct 23, 2018

@drewbanin yes, I've used Hive before, though I'm using Snowflake in my current role. I'm going to an open source hacknight tomorrow and thought this looked like a good issue to work on. I would be willing to fill out the issue if you supply the template.

@drewbanin
Contributor

ok, I'll see if I can whip something up!

@drewbanin
Contributor

@raybuhr just pulled this together: https://github.com/fishtown-analytics/dbt/wiki/New-Adapter-Information-Sheet

It's hard to know what all of the possible questions to ask are, but I think this sheet does a decent job of collecting some critical information. Responses to questions like these will help us assess the viability of a new adapter, as well as guide the eventual implementation. Thanks so much for taking a look at this! Looking forward to your answers, as well as any feedback you may have on the template!

@raybuhr

raybuhr commented Nov 8, 2018

Hey, I made some progress on this initially when I went to that hacknight in October, but haven't found the time since. Here's a gist of what I came up with so far. I'm hoping to be able to contribute more to it in the near future, but want to open it up in case others want to help.

@drewbanin drewbanin added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors good_first_issue Straightforward + self-contained changes, good for new contributors! adapter labels Nov 12, 2018
@drewbanin
Contributor

Thanks @raybuhr! This is a great start!

@kzarzycki

Hi there,
I wanted to refresh this topic. Is there any plan to implement Hive support soon? Now that Presto support is done (#1245), which should be similar, shouldn't it be much easier to do?
I'm very interested in using dbt, but at my company we use Hive everywhere...

@drewbanin
Contributor

Hey @kzarzycki! You're very right that Presto and Hive should be similar -- I think a good starting point is to check out the code in the dbt-presto plugin.

You can find documentation on dbt's plugin system, as well as more information on how to build a new plugin in the docs: https://docs.getdbt.com/docs/building-a-new-adapter

Let me know if you have any specific questions as you dig into these things :)

@cprosser

How do you recommend we deal with different Hive versions? For instance, my company is on a Hive 1.2 variant, and Hive 3 has native support for materialized views, so the two would need different implementations for that functionality. Should it just be a single hive adapter with the version specified in the configuration, or would it make sense to have a hive-1-adapter and a hive-3-adapter?

@drewbanin
Contributor

drewbanin commented Jul 24, 2019

Hey @cprosser - I think we should start by targeting a single version of Hive -- probably the latest stable version.

If the connection semantics differ between hive versions, then that could present a problem. If the differences are instead around specific functionality (like materialized views), then we can just document the matrix of supported functionality. Do you have a good handle on what the other big differences between Hive 1 and Hive 3 are?
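
One way the "matrix of supported functionality" could be sketched is a lookup from feature name to the minimum Hive version that provides it. The names `FEATURE_MIN_VERSION`, `parse_version`, and `supports` are hypothetical, and the only entry is the one mentioned in this thread (native materialized views in Hive 3):

```python
# Minimum Hive (major, minor) version assumed to support each feature.
# Only the materialized-view entry comes from this thread; others would
# need to be confirmed against the Hive release notes before being added.
FEATURE_MIN_VERSION = {
    "materialized_view": (3, 0),
}


def parse_version(text):
    """Reduce a version string such as '1.2.1' to a (major, minor) tuple."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def supports(feature, hive_version):
    """Return True if the configured Hive version provides the feature."""
    return parse_version(hive_version) >= FEATURE_MIN_VERSION[feature]


# A single adapter could branch on the configured version:
use_native_mv = supports("materialized_view", "1.2")
```

With a table like this, a single adapter can read the version from configuration and fall back to a plain table or view when a feature is missing, rather than shipping one adapter per Hive version.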

@cprosser

Well, the latest stable version isn't available to me to work with, but I just did some research, and not much has changed in the core language between Hive v1.2, which I'm on, and subsequent versions. The bulk of the changes occurred under the hood to improve execution time, with only a few small changes made to the date functions. The only thing that I think could make a material difference is ACID support on ORC tables, which provides full merge/upsert functionality. But it's very limited on the distributions that support it, and not available by default on all tables. Probably best to avoid.

I'm going to see if I can make any headway with the version I'm on. It's the lowest common denominator and forces you to deal with issues that are dealt with automatically (like join ordering) in the newer versions.

@drewbanin
Contributor

FYI for plugin maintainers: #1655

@drewbanin
Contributor

closing this one - out of scope for core

@jtcohen6 jtcohen6 added the adapter_plugins Issues relating to third-party adapter plugins label Jul 19, 2022
@himanshuajmera

himanshuajmera commented Aug 26, 2022

Hi Everyone,
At Cloudera, we have developed a dbt-hive adapter and it is available for the community in this repo and as a python package.

8 participants