Developer guide for new structure data source integration #966

Merged
merged 10 commits into from
Aug 26, 2023
8 changes: 7 additions & 1 deletion docs/_toc.yml
Original file line number Diff line number Diff line change
@@ -78,8 +78,14 @@ parts:
- file: source/contribute/debugging
title: Debugging EvaDB

- file: source/contribute/new-command
- file: source/contribute/extend
title: Extending EvaDB
sections:
- file: source/contribute/new-data-source
title: Structured Data Source Integration
- file: source/contribute/new-command
title: Operators


- file: source/contribute/release
title: Releasing EvaDB
5 changes: 5 additions & 0 deletions docs/source/contribute/extend.rst
@@ -0,0 +1,5 @@
Extending EvaDB
====
This document details the steps involved in extending EvaDB.

.. tableofcontents::
2 changes: 1 addition & 1 deletion docs/source/contribute/new-command.rst
@@ -1,4 +1,4 @@
Extending EvaDB
Operators / Commands
=============

This document details the steps involved in adding support for a new operator (or command) in EvaDB. We illustrate the process using a DDL command.
89 changes: 89 additions & 0 deletions docs/source/contribute/new-data-source.rst
@@ -0,0 +1,89 @@
Structured Data Source Integration
====
This document details the steps involved in adding a new structured data source integration to EvaDB.


Example Data Source Integration In EvaDB
----

- `PostgreSQL <https://github.com/georgia-tech-db/evadb/tree/master/evadb/third_party/databases/postgres>`_


Create Data Source Handler
----

1. Create a new directory at `evadb/third_party/databases/ <https://github.com/georgia-tech-db/evadb/tree/master/evadb/third_party/databases>`_
~~~~

.. note::

   The directory name is also the engine name used in ``CREATE DATABASE mydb_source WITH ENGINE = "..."``. In this document, we use **mydb** as the example data source that we want to integrate into EvaDB.

The directory should contain three files:

- __init__.py
- requirements.txt
- mydb_handler.py

The *__init__.py* file can contain copyright information. The *requirements.txt* file lists the extra Python libraries that must be installed via pip for the mydb data source.

.. note::

   EvaDB installs a data source's dependencies only when the user creates a connection to that data source, e.g., via ``CREATE DATABASE mydb_source WITH ENGINE = "mydb";``.

Collaborator Author

@jiashenC This is what I have for requirements.txt. Could you help elaborate on that? Thanks!

Member

Oh, i see. I missed that part.


2. Implement the data source handler
~~~~

In *mydb_handler.py*, implement the ``DBHandler`` interface declared in `evadb/third_party/databases/types.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/types.py>`_. You need to implement seven methods:

.. code:: python

    class MydbHandler(DBHandler):

        def __init__(self, name: str, **kwargs):
            ...
        def connect(self):
            ...
        def disconnect(self):
            ...
        def check_connection(self) -> DBHandlerStatus:
            ...
        def get_tables(self) -> DBHandlerResponse:
            ...
        def get_columns(self, table_name: str) -> DBHandlerResponse:
            ...
        def execute_native_query(self, query_string: str) -> DBHandlerResponse:
            ...

*get_tables* should retrieve the list of tables from the data source, and *get_columns* should retrieve the columns of a specified table. *execute_native_query* runs a query string through the data source's own engine. For more details, see the function signatures and documentation in `evadb/third_party/databases/types.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/types.py>`_.
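
As a rough illustration of how the seven methods fit together, the sketch below backs a handler with Python's built-in ``sqlite3`` module. It is not EvaDB code: the ``DBHandler``, ``DBHandlerStatus``, and ``DBHandlerResponse`` classes are local stand-ins whose fields are assumptions, the ``database`` parameter is hypothetical, and results are returned as plain lists of tuples for simplicity. Check ``types.py`` for the real definitions before adapting it.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Optional


# Stand-ins for the types declared in evadb/third_party/databases/types.py.
# Their fields are assumptions made for this sketch.
@dataclass
class DBHandlerStatus:
    status: bool
    error: Optional[str] = None


@dataclass
class DBHandlerResponse:
    data: Any
    error: Optional[str] = None


class DBHandler:
    def __init__(self, name: str, **kwargs):
        self.name = name


class MydbHandler(DBHandler):
    def __init__(self, name: str, **kwargs):
        super().__init__(name)
        # "database" is a hypothetical parameter; ":memory:" keeps the demo local.
        self.database = kwargs.get("database", ":memory:")
        self.connection = None

    def connect(self):
        self.connection = sqlite3.connect(self.database)

    def disconnect(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None

    def check_connection(self) -> DBHandlerStatus:
        if self.connection is None:
            return DBHandlerStatus(status=False, error="Not connected.")
        return DBHandlerStatus(status=True)

    def get_tables(self) -> DBHandlerResponse:
        return self.execute_native_query(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )

    def get_columns(self, table_name: str) -> DBHandlerResponse:
        # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table_name.replace('"', '""') + '"'
        response = self.execute_native_query(f"PRAGMA table_info({quoted})")
        if response.error is None:
            # Index 1 of each table_info row is the column name.
            response.data = [(row[1],) for row in response.data]
        return response

    def execute_native_query(self, query_string: str) -> DBHandlerResponse:
        if self.connection is None:
            return DBHandlerResponse(data=None, error="Not connected.")
        try:
            cursor = self.connection.execute(query_string)
            rows = cursor.fetchall()
            self.connection.commit()
            return DBHandlerResponse(data=rows)
        except sqlite3.Error as e:
            # Report failures through the response instead of raising.
            return DBHandlerResponse(data=None, error=str(e))


handler = MydbHandler("mydb")
handler.connect()
handler.execute_native_query("CREATE TABLE users (id INTEGER, name TEXT)")
tables = handler.get_tables().data
columns = handler.get_columns("users").data
handler.disconnect()
```

Routing *get_tables* and *get_columns* through *execute_native_query* keeps error handling in one place; a real handler may prefer the data source's own catalog API.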

The data source's configuration parameters arrive as keyword arguments to ``__init__(self, name: str, **kwargs)``. For example:

.. code:: python

    def __init__(self, name: str, **kwargs):
        super().__init__(name)
        self.host = kwargs.get("host")
        self.port = kwargs.get("port")
        self.user = kwargs.get("user")
        self.password = kwargs.get("password")

.. note::

   These parameters are specified when the user creates a connection to the data source: ``CREATE DATABASE mydb_source WITH ENGINE = "mydb", PARAMETERS = {"host": "localhost", "port": "5432", "user": "eva", "password": "password"};``.
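
Because ``kwargs.get`` returns ``None`` for a missing key, and the example above passes the port as the string ``"5432"``, a handler may want to validate and normalize its parameters up front. The helper below is a hypothetical sketch, not part of EvaDB; the set of required keys is an assumption taken from the example.

```python
# Hypothetical helper, not part of EvaDB. The required keys mirror the
# PARAMETERS used in the example CREATE DATABASE statement.
REQUIRED_PARAMETERS = ("host", "port", "user", "password")


def parse_connection_parameters(**kwargs) -> dict:
    # Collect every missing key so the user sees all problems at once.
    missing = [key for key in REQUIRED_PARAMETERS if kwargs.get(key) is None]
    if missing:
        raise ValueError(f"Missing connection parameters: {', '.join(missing)}")
    params = {key: kwargs[key] for key in REQUIRED_PARAMETERS}
    # The port may arrive as a string such as "5432"; normalize it to an int.
    params["port"] = int(params["port"])
    return params


params = parse_connection_parameters(
    host="localhost", port="5432", user="eva", password="password"
)
```

A handler could call such a helper from ``__init__`` so that a misconfigured connection fails with a clear message before ``connect`` is attempted.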

For a complete example, see the PostgreSQL handler at `evadb/third_party/databases/postgres/postgres_handler.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/postgres/postgres_handler.py>`_.


Register the Data Source Handler
----

Register your data source handler in the ``get_database_handler`` function in `evadb/third_party/databases/interface.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/interface.py>`_. Below is an example of registering the mydb data source:

.. code:: python

    ...
    elif engine == "mydb":
        return mod.MydbHandler(engine, **kwargs)
    ...
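
To show the dispatch pattern in isolation, here is a self-contained sketch. Only the ``engine == "mydb"`` branch and the ``mod.MydbHandler(engine, **kwargs)`` call come from the snippet above; ``load_engine_module`` is a hypothetical stand-in for however ``interface.py`` obtains ``mod``, and the error raised for unknown engines is likewise an assumption.

```python
from types import SimpleNamespace


class MydbHandler:
    # Stand-in for the handler class defined in mydb_handler.py.
    def __init__(self, name: str, **kwargs):
        self.name = name
        self.kwargs = kwargs


def load_engine_module(engine: str):
    # Hypothetical stand-in for however interface.py obtains ``mod``.
    # Here it returns a namespace that exposes the handler class.
    modules = {"mydb": SimpleNamespace(MydbHandler=MydbHandler)}
    if engine not in modules:
        raise NotImplementedError(f"Engine {engine} is not supported")
    return modules[engine]


def get_database_handler(engine: str, **kwargs):
    mod = load_engine_module(engine)
    # The engine name doubles as the handler's name, as in the snippet above.
    if engine == "mydb":
        return mod.MydbHandler(engine, **kwargs)
    raise NotImplementedError(f"Engine {engine} is not supported")


handler = get_database_handler("mydb", host="localhost", port="5432")
```

The connection parameters pass through unchanged as keyword arguments, which is why the handler's ``__init__`` reads them from ``kwargs``.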

2 changes: 1 addition & 1 deletion evadb/third_party/databases/types.py
@@ -53,7 +53,7 @@ class DBHandler:
name (str): The name associated with the database handler instance.
"""

def __init__(self, name: str):
def __init__(self, name: str, **kwargs):
self.name = name

def connect(self):