Developer guide for new structure data source integration (#966)
xzdandy authored and aryan-rajoria committed Aug 27, 2023
1 parent 9aed030 commit 51e2e0f
Showing 5 changed files with 103 additions and 3 deletions.
8 changes: 7 additions & 1 deletion docs/_toc.yml
@@ -78,8 +78,14 @@ parts:
- file: source/contribute/debugging
title: Debugging EvaDB

- file: source/contribute/new-command
- file: source/contribute/extend
title: Extending EvaDB
sections:
- file: source/contribute/new-data-source
title: Structured Data Source Integration
- file: source/contribute/new-command
title: Operators


- file: source/contribute/release
title: Releasing EvaDB
5 changes: 5 additions & 0 deletions docs/source/contribute/extend.rst
@@ -0,0 +1,5 @@
Extending EvaDB
====
This document details the steps involved in extending EvaDB.

.. tableofcontents::
2 changes: 1 addition & 1 deletion docs/source/contribute/new-command.rst
@@ -1,4 +1,4 @@
Extending EvaDB
Operators / Commands
=============

This document details the steps involved in adding support for a new operator (or command) in EvaDB. We illustrate the process using a DDL command.
89 changes: 89 additions & 0 deletions docs/source/contribute/new-data-source.rst
@@ -0,0 +1,89 @@
Structured Data Source Integration
====
This document details the steps involved in adding a new structured data source integration in EvaDB.


Example Data Source Integration In EvaDB
----

- `PostgreSQL <https://github.com/georgia-tech-db/evadb/tree/master/evadb/third_party/databases/postgres>`_


Create Data Source Handler
----

1. Create a new directory at `evadb/third_party/databases/ <https://github.com/georgia-tech-db/evadb/tree/master/evadb/third_party/databases>`_
~~~~

.. note::

   The directory name is also the engine name used in the `CREATE DATABASE mydb_source WITH ENGINE = "..."` statement. In this document, we use **mydb** as the example data source that we want to integrate into EvaDB.

The directory should contain three files:

- __init__.py
- requirements.txt
- mydb_handler.py

The *__init__.py* can contain copyright information. The *requirements.txt* lists the extra Python libraries that need to be installed via pip for the mydb data source.

.. note::

   EvaDB installs a data source's dependencies only when the user creates a connection to that data source, e.g., via `CREATE DATABASE mydb_source WITH ENGINE = "mydb";`.

2. Implement the data source handler
~~~~

In *mydb_handler.py*, you need to implement the `DBHandler` interface declared at `evadb/third_party/databases/types.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/types.py>`_. There are 7 functions that you need to implement:

.. code:: python

   class MydbHandler(DBHandler):
       def __init__(self, name: str, **kwargs):
           ...
       def connect(self):
           ...
       def disconnect(self):
           ...
       def check_connection(self) -> DBHandlerStatus:
           ...
       def get_tables(self) -> DBHandlerResponse:
           ...
       def get_columns(self, table_name: str) -> DBHandlerResponse:
           ...
       def execute_native_query(self, query_string: str) -> DBHandlerResponse:
           ...

The *get_tables* should retrieve the list of tables from the data source. The *get_columns* should retrieve the columns of a specified table from the database. The *execute_native_query* specifies how to execute the query through the data source's engine. For more details, please check the function signature and documentation at `evadb/third_party/databases/types.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/types.py>`_.
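
To make the seven functions concrete, here is a minimal sketch that uses Python's built-in ``sqlite3`` module in place of a real mydb engine. The stand-in ``DBHandler``, ``DBHandlerStatus``, and ``DBHandlerResponse`` classes, their fields, the ``database`` parameter, and the ``_run`` helper are all assumptions made so that the sketch runs on its own. A real handler would import the types from ``evadb/third_party/databases/types.py`` and follow the return types documented there.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Optional


# Stand-ins for the types in evadb/third_party/databases/types.py.
# Their fields are assumptions made for this sketch.
@dataclass
class DBHandlerStatus:
    status: bool
    error: Optional[str] = None


@dataclass
class DBHandlerResponse:
    data: Any
    error: Optional[str] = None


class DBHandler:
    def __init__(self, name: str, **kwargs):
        self.name = name


class MydbHandler(DBHandler):
    """Hypothetical handler backed by SQLite instead of a real mydb engine."""

    def __init__(self, name: str, **kwargs):
        super().__init__(name)
        # "database" is an illustrative parameter name, not an EvaDB convention.
        self.database = kwargs.get("database", ":memory:")
        self.connection = None

    def connect(self):
        self.connection = sqlite3.connect(self.database)

    def disconnect(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None

    def check_connection(self) -> DBHandlerStatus:
        if self.connection is None:
            return DBHandlerStatus(status=False, error="Not connected.")
        return DBHandlerStatus(status=True)

    def _run(self, query: str, params: tuple = ()) -> DBHandlerResponse:
        # Hypothetical helper: report failures through the error field
        # instead of raising.
        if self.connection is None:
            return DBHandlerResponse(data=None, error="Not connected.")
        try:
            rows = self.connection.execute(query, params).fetchall()
            self.connection.commit()
            return DBHandlerResponse(data=rows)
        except sqlite3.Error as e:
            return DBHandlerResponse(data=None, error=str(e))

    def get_tables(self) -> DBHandlerResponse:
        return self._run(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )

    def get_columns(self, table_name: str) -> DBHandlerResponse:
        # pragma_table_info is a table-valued function, so the table name
        # can be passed as a bound parameter.
        return self._run("SELECT name FROM pragma_table_info(?)", (table_name,))

    def execute_native_query(self, query_string: str) -> DBHandlerResponse:
        return self._run(query_string)
```

The sketch returns plain lists of row tuples to stay dependency-free; the actual data format expected by EvaDB is defined by the documentation in ``types.py``.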

You can get the data source's configuration parameters from `__init__(self, name: str, **kwargs)`. Below is an example:

.. code:: python

   def __init__(self, name: str, **kwargs):
       super().__init__(name)
       self.host = kwargs.get("host")
       self.port = kwargs.get("port")
       self.user = kwargs.get("user")
       self.password = kwargs.get("password")

.. note::

   These parameters are specified when the user creates a connection to the data source: `CREATE DATABASE mydb_source WITH ENGINE = "mydb", PARAMETERS = {"host": "localhost", "port": "5432", "user": "eva", "password": "password"};`.

For a complete example, see the PostgreSQL handler at `evadb/third_party/databases/postgres/postgres_handler.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/postgres/postgres_handler.py>`_.
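
Because ``kwargs.get`` returns ``None`` for a key the user did not supply, and the example statement above passes the port as the string ``"5432"``, a handler may want to validate and normalize its parameters before connecting. The sketch below shows one way to do that. The ``parse_connection_parameters`` helper and the ``REQUIRED_PARAMETERS`` tuple are hypothetical names used for illustration and are not part of EvaDB.

```python
# Hypothetical list of parameters that the mydb handler treats as mandatory.
REQUIRED_PARAMETERS = ("host", "port", "user", "password")


def parse_connection_parameters(**kwargs) -> dict:
    """Check that every required parameter is present and convert the port."""
    missing = [key for key in REQUIRED_PARAMETERS if kwargs.get(key) is None]
    if missing:
        raise ValueError(f"Missing connection parameters: {', '.join(missing)}")
    return {
        "host": kwargs["host"],
        # Accept both "5432" and 5432.
        "port": int(kwargs["port"]),
        "user": kwargs["user"],
        "password": kwargs["password"],
    }
```

A handler's ``__init__`` could call this helper with its ``**kwargs`` and store the returned values, so that a missing parameter is reported when the handler is constructed rather than when the first query runs.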


Register the Data Source Handler
----

Add your data source handler to the `get_database_handler` function in `evadb/third_party/databases/interface.py <https://github.com/georgia-tech-db/evadb/blob/master/evadb/third_party/databases/interface.py>`_. Below is an example of registering the mydb data source:

.. code:: python

   ...
   elif engine == "mydb":
       return mod.MydbHandler(engine, **kwargs)
   ...
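
The dispatch pattern in the registration snippet can be tried in isolation with the following sketch. It is not the code in ``interface.py``: the ``mod`` object here is a stand-in namespace, the ``MydbHandler`` class is a stub, and the ``NotImplementedError`` raised for an unknown engine is an assumption, because this document does not show how ``mod`` is obtained or how unsupported engines are handled.

```python
from types import SimpleNamespace


class MydbHandler:
    """Stub handler that only records what it was constructed with."""

    def __init__(self, name: str, **kwargs):
        self.name = name
        self.host = kwargs.get("host")


# Stand-in for the module object that the registration snippet calls `mod`.
mod = SimpleNamespace(MydbHandler=MydbHandler)


def get_database_handler(engine: str, **kwargs):
    """Return a handler instance for the given engine name."""
    if engine == "mydb":
        # The engine name is passed as the handler's name, as in the snippet.
        return mod.MydbHandler(engine, **kwargs)
    else:
        # Assumed behavior for engines without a registered handler.
        raise NotImplementedError(f"Engine {engine} is not supported")


handler = get_database_handler("mydb", host="localhost")
print(handler.name, handler.host)
```

Note that the engine string compared in the ``elif`` branch is the same name as the directory created under ``evadb/third_party/databases/``.
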
2 changes: 1 addition & 1 deletion evadb/third_party/databases/types.py
@@ -53,7 +53,7 @@ class DBHandler:
name (str): The name associated with the database handler instance.
"""

def __init__(self, name: str):
def __init__(self, name: str, **kwargs):
self.name = name

def connect(self):