forked from apache/airflow
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add pgvector provider implementation (apache#35399)
This PR is part of our larger effort to add first-class integrations to support LLMOps that was [presented at Airflow Summit](https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s). This PR adds explicitly the pgvector Provider. https://github.com/pgvector/pgvector Email Discussion related to the effort can be found here - https://lists.apache.org/thread/0d669fmy4hn29h5c0wj0ottdskd77ktp
- Loading branch information
1 parent
b319fa1
commit 202b63d
Showing
36 changed files
with
1,036 additions
and
123 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -87,6 +87,7 @@ body: | |
- oracle | ||
- pagerduty | ||
- papermill | ||
- pgvector | ||
- pinecone | ||
- plexus | ||
- postgres | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
.. Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
.. http://www.apache.org/licenses/LICENSE-2.0 | ||
.. Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
``apache-airflow-providers-pgvector`` | ||
|
||
Changelog | ||
--------- | ||
|
||
1.0.0 | ||
..... | ||
|
||
Initial version of the provider. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE | ||
# OVERWRITTEN WHEN PREPARING DOCUMENTATION FOR THE PACKAGES. | ||
# | ||
# IF YOU WANT TO MODIFY IT, YOU SHOULD MODIFY THE TEMPLATE | ||
# `PROVIDER__INIT__PY_TEMPLATE.py.jinja2` IN the `dev/provider_packages` DIRECTORY | ||
# | ||
from __future__ import annotations | ||
|
||
import packaging.version | ||
|
||
__all__ = ["__version__"] | ||
|
||
__version__ = "1.0.0" | ||
|
||
try: | ||
from airflow import __version__ as airflow_version | ||
except ImportError: | ||
from airflow.version import version as airflow_version | ||
|
||
if packaging.version.parse(packaging.version.parse(airflow_version).base_version) < packaging.version.parse( | ||
"2.5.0" | ||
): | ||
raise RuntimeError( | ||
f"The package `apache-airflow-providers-pgvector:{__version__}` requires Apache Airflow 2.5.0+" | ||
) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE | ||
# OVERWRITTEN WHEN PREPARING DOCUMENTATION FOR THE PACKAGES. | ||
# | ||
# IF YOU WANT TO MODIFY IT, YOU SHOULD MODIFY THE TEMPLATE | ||
# `PROVIDER__INIT__PY_TEMPLATE.py.jinja2` IN the `dev/provider_packages` DIRECTORY | ||
# |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
from __future__ import annotations | ||
|
||
from airflow.providers.postgres.hooks.postgres import PostgresHook | ||
|
||
|
||
class PgVectorHook(PostgresHook): | ||
"""Extend PostgresHook for working with PostgreSQL and pgvector extension for vector data types.""" | ||
|
||
def __init__(self, *args, **kwargs) -> None: | ||
"""Initialize a PgVectorHook.""" | ||
super().__init__(*args, **kwargs) | ||
|
||
def create_table(self, table_name: str, columns: list[str], if_not_exists: bool = True) -> None: | ||
""" | ||
Create a table in the Postgres database. | ||
:param table_name: The name of the table to create. | ||
:param columns: A list of column definitions for the table. | ||
:param if_not_exists: If True, only create the table if it does not already exist. | ||
""" | ||
create_table_sql = "CREATE TABLE" | ||
if if_not_exists: | ||
create_table_sql = f"{create_table_sql} IF NOT EXISTS" | ||
create_table_sql = f"{create_table_sql} {table_name} ({', '.join(columns)})" | ||
self.run(create_table_sql) | ||
|
||
def create_extension(self, extension_name: str, if_not_exists: bool = True) -> None: | ||
""" | ||
Create a PostgreSQL extension. | ||
:param extension_name: The name of the extension to create. | ||
:param if_not_exists: If True, only create the extension if it does not already exist. | ||
""" | ||
create_extension_sql = "CREATE EXTENSION" | ||
if if_not_exists: | ||
create_extension_sql = f"{create_extension_sql} IF NOT EXISTS" | ||
create_extension_sql = f"{create_extension_sql} {extension_name}" | ||
self.run(create_extension_sql) | ||
|
||
def drop_table(self, table_name: str, if_exists: bool = True) -> None: | ||
""" | ||
Drop a table from the Postgres database. | ||
:param table_name: The name of the table to drop. | ||
:param if_exists: If True, only drop the table if it exists. | ||
""" | ||
drop_table_sql = "DROP TABLE" | ||
if if_exists: | ||
drop_table_sql = f"{drop_table_sql} IF EXISTS" | ||
drop_table_sql = f"{drop_table_sql} {table_name}" | ||
self.run(drop_table_sql) | ||
|
||
def truncate_table(self, table_name: str, restart_identity: bool = True) -> None: | ||
""" | ||
Truncate a table, removing all rows. | ||
:param table_name: The name of the table to truncate. | ||
:param restart_identity: If True, restart the serial sequence if the table has one. | ||
""" | ||
truncate_sql = f"TRUNCATE TABLE {table_name}" | ||
if restart_identity: | ||
truncate_sql = f"{truncate_sql} RESTART IDENTITY" | ||
self.run(truncate_sql) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE | ||
# OVERWRITTEN WHEN PREPARING DOCUMENTATION FOR THE PACKAGES. | ||
# | ||
# IF YOU WANT TO MODIFY IT, YOU SHOULD MODIFY THE TEMPLATE | ||
# `PROVIDER__INIT__PY_TEMPLATE.py.jinja2` IN the `dev/provider_packages` DIRECTORY | ||
# |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
from __future__ import annotations | ||
|
||
from pgvector.psycopg2 import register_vector | ||
|
||
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator | ||
|
||
|
||
class PgVectorIngestOperator(SQLExecuteQueryOperator): | ||
""" | ||
This operator is designed for ingesting data into a PostgreSQL database with pgvector support. | ||
It inherits from the SQLExecuteQueryOperator and extends its functionality by registering | ||
the pgvector data type with the database connection before executing queries. | ||
.. seealso:: | ||
For more information on how to use this operator, take a look at the guide: | ||
:ref:`howto/operator:PgVectorIngestOperator` | ||
""" | ||
|
||
def __init__(self, *args, **kwargs) -> None: | ||
"""Initialize a new PgVectorIngestOperator.""" | ||
super().__init__(*args, **kwargs) | ||
|
||
def _register_vector(self) -> None: | ||
"""Register the vector type with your connection.""" | ||
conn = self.get_db_hook().get_conn() | ||
register_vector(conn) | ||
|
||
def execute(self, context): | ||
self._register_vector() | ||
super().execute(context) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
--- | ||
package-name: apache-airflow-providers-pgvector | ||
|
||
name: pgvector | ||
|
||
description: | | ||
`pgvector <https://github.com/pgvector/pgvector>`__ | ||
suspended: false | ||
|
||
versions: | ||
- 1.0.0 | ||
|
||
integrations: | ||
- integration-name: pgvector | ||
external-doc-url: https://github.com/pgvector/pgvector | ||
how-to-guide: | ||
- /docs/apache-airflow-providers-pgvector/operators/pgvector.rst | ||
tags: [software] | ||
|
||
dependencies: | ||
- apache-airflow>=2.5.0 | ||
- apache-airflow-providers-postgres>=5.7.1 | ||
- pgvector>=0.2.3 | ||
|
||
hooks: | ||
- integration-name: pgvector | ||
python-modules: | ||
- airflow.providers.pgvector.hooks.pgvector | ||
|
||
operators: | ||
- integration-name: pgvector | ||
python-modules: | ||
- airflow.providers.pgvector.operators.pgvector |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.