Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tabular provider #23704

Merged
merged 24 commits into from
Jul 6, 2022
Merged

Add Tabular provider #23704

merged 24 commits into from
Jul 6, 2022

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented May 13, 2022


First version of the Tabular hook :)

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.

Copy link

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Fokko! Left some comments with questions about various things.

airflow/providers/tabular/hooks/tabular.py Outdated Show resolved Hide resolved
airflow/utils/db.py Outdated Show resolved Hide resolved
docs/apache-airflow-providers-tabular/connections.rst Outdated Show resolved Hide resolved
@@ -59,6 +59,7 @@
/airflow/providers/google/ @turbaszek
/airflow/providers/snowflake/ @turbaszek @potiuk @mik-laj
/airflow/providers/cncf/kubernetes @jedcunningham
/airflow/providers/tabular/ @Fokko
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@Fokko Fokko force-pushed the fd-provider-tabular branch 2 times, most recently from eb6028a to ddfe525 Compare May 16, 2022 19:58
@Fokko Fokko requested review from XD-DENG and ashb as code owners May 17, 2022 13:28
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an ongoing discussion in mailing list about if we should add more providers
https://lists.apache.org/thread/nvfc75kj2w1tywvvkw8ho5wkx1dcvgrn

though I would say this is a unique case as a PMC is the codeowner :)

airflow/models/dag.py Outdated Show resolved Hide resolved
@Fokko Fokko force-pushed the fd-provider-tabular branch from a682089 to 7c03bf4 Compare May 17, 2022 14:01
@Fokko
Copy link
Contributor Author

Fokko commented May 17, 2022

Thanks for pointing me to the thread @eladkal

I actually started on a 3rd party provider, but setting it up within Airflow itself makes it so much easier because of Breeze :) @kbendick and I will maintain this one.

@Fokko Fokko force-pushed the fd-provider-tabular branch 2 times, most recently from b75b5a3 to 13c3d7a Compare May 17, 2022 14:56
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably be in the same boat as #22659 where we are waiting for what comes out of https://lists.apache.org/thread/nvfc75kj2w1tywvvkw8ho5wkx1dcvgrn

@Fokko Fokko force-pushed the fd-provider-tabular branch 3 times, most recently from fa180e7 to 38afb69 Compare May 17, 2022 18:20
Co-authored-by: Kyle Bendickson <[email protected]>
@Fokko Fokko force-pushed the fd-provider-tabular branch from 38afb69 to 29e6511 Compare May 17, 2022 18:52
@kaxil
Copy link
Member

kaxil commented May 17, 2022

This will probably be in the same boat as #22659 where we are waiting for what comes out of https://lists.apache.org/thread/nvfc75kj2w1tywvvkw8ho5wkx1dcvgrn

Welp, looks like @eladkal had already commented about it :)

@Fokko
Copy link
Contributor Author

Fokko commented Jun 14, 2022

@potiuk thanks for sharing the command. I just ran it and cleanup up the excess in the command hash file.

Would be great to get this in at some point, looking forward to the summary

@potiuk
Copy link
Member

potiuk commented Jun 14, 2022

Would be great to get this in at some point, looking forward to the summary

Coming :)

@Fokko
Copy link
Contributor Author

Fokko commented Jun 16, 2022

@potiuk I saw your mail: https://lists.apache.org/thread/6ngq79df7op541gfwntspdtsvzlv1cr6

It makes sense and I see the current issues. That being said, would it be possible to get this in? Both @kbendick and I will make sure that the operator will be maintained. One thing on the operator, the Tabular one will be quite small. I can also see that we release Apache Iceberg operators in the future as well, but that will probably also be just a wrapper around the Python pyiceberg library.

@potiuk
Copy link
Member

potiuk commented Jun 16, 2022

@potiuk I saw your mail: https://lists.apache.org/thread/6ngq79df7op541gfwntspdtsvzlv1cr6

It makes sense and I see the current issues. That being said, would it be possible to get this in? Both @kbendick and I will make sure that the operator will be maintained. One thing on the operator, the Tabular one will be quite small. I can also see that we release Apache Iceberg operators in the future as well, but that will probably also be just a wrapper around the Python pyiceberg library.

I believe so :). I wanted just to get some community feedback, but I think having just a commitment from someone who has an interest (be it direct stakeholder or say a company that implements a lot of integrations with the product) - even verbally and confirmed in an issue should be more than enough. Feel free to comment in the discussion I started too :).

@potiuk
Copy link
Member

potiuk commented Jun 27, 2022

Hey @Fokko, It took a bit long but I think we got to a consensus regarding new providers, and we also propose some kind of mix-governance aproach, where (among the others) the stakeholders for the future providers (which we are going to technically split to separate repositories soon) should take a bit more responsibility for maintenance:

#24680

If that does not scare you away, and you still want to add the provider to Airflow community providers, feel free to rebase the PR. I will also ask you (hopefully it will be merged soon) to rebase it after we merge #24672 - we are going to change the way how we keep depdencies for providers in order to prepare them to separate to different repository.

Let us know what you want to do, either close the PR or rebase it and lead it to completion after #24672 is merged.

@Fokko
Copy link
Contributor Author

Fokko commented Jun 29, 2022

Hey @potiuk, of course I'm still interested in getting this in! I'm working on rebasing the PR, but I'm also seeing some issues with the recent changes. Let me dig into it and get back to you

airflow/providers/tabular/CHANGELOG.rst Outdated Show resolved Hide resolved
@@ -0,0 +1,48 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new example dags should be written according to AIP-47
the example dag should be in system/providers/tabular path
see #24641 for reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All green now :)

airflow/providers/tabular/provider.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@josh-fell josh-fell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small things but LGTM.

`Tabular <https://tabular.io/>`__


Release: 0.0.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Release: 0.0.1
Release: 1.0.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. was to quick :).. @Fokko - mind for a small follow-up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing!

Installation
------------

You can install this package on top of an existing Airflow 2.1+ installation via
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can install this package on top of an existing Airflow 2.1+ installation via
You can install this package on top of an existing Airflow 2.2+ installation via

The minimum version for providers was bumped to 2.2.0 recently 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that one, I'll do a follow-up! 👍🏻

from airflow.providers.tabular.hooks.tabular import TabularHook


class TestTabularHook(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For net-new unit tests we should prefer pytest over unittest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer pytest as well. Let me follow up on this one

@@ -0,0 +1,39 @@
.. Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we typically want to have a URI example for connection in the doc. Is that right @mik-laj?

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool!

@potiuk potiuk merged commit 10273e3 into apache:main Jul 6, 2022
@Fokko Fokko deleted the fd-provider-tabular branch July 6, 2022 14:03
@kbendick
Copy link

kbendick commented Jul 6, 2022

Thank you @Fokko!

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants