feat: improve the use case doc (v1) (#928)
* Image classification
* Image similarity search
* Q&A application
jiashenC authored Aug 12, 2023
1 parent bec2a15 commit a5324ba
Showing 4 changed files with 293 additions and 15 deletions.
27 changes: 12 additions & 15 deletions docs/_toc.yml
@@ -8,14 +8,20 @@ parts:
     - file: source/overview/faq
     - file: source/overview/concepts

-  - caption: EvaDB Tutorials
+  - caption: Use Cases
     chapters:
-    - file: source/tutorials/11-similarity-search-for-motif-mining.ipynb
-      title: Image Similarity Search on Reddit [FAISS + Qdrant]
-    - file: source/tutorials/08-chatgpt.ipynb
-      title: ChatGPT + Whisper [OpenAI + HuggingFace]
+    - file: source/tutorials/image_classification.rst
+      title: Image Classification
+    - file: source/tutorials/02-object-detection.ipynb
+      title: Object Detection
     - file: source/tutorials/03-emotion-analysis.ipynb
-      title: Emotion Analysis with PyTorch
+      title: Emotion Analysis
+    - file: source/tutorials/07-object-segmentation-huggingface.ipynb
+      title: Image Segmentation [HuggingFace]
+    - file: source/tutorials/similar_image_search.rst
+      title: Image Search [FAISS]
+    - file: source/tutorials/qa_video.rst
+      title: Q&A from Videos [ChatGPT + HuggingFace]

   - caption: User Reference
     chapters:
@@ -55,15 +61,6 @@ parts:
     - file: source/overview/docker
       title: Docker

-  - caption: More Tutorials
-    chapters:
-    - file: source/tutorials/01-mnist.ipynb
-      title: MNIST Image Classification
-    - file: source/tutorials/02-object-detection.ipynb
-      title: Object Detection with YOLO
-    - file: source/tutorials/07-object-segmentation-huggingface.ipynb
-      title: Image Segmentation with HuggingFace
-
   - caption: Developer Guide
     chapters:
     - file: source/contribute/index

104 changes: 104 additions & 0 deletions docs/source/tutorials/image_classification.rst
@@ -0,0 +1,104 @@
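Implementing an Image Classification Pipeline using EvaDB on a Video
====================================================================

This tutorial assumes that a video has already been loaded into the database as the table ``mnist_video``.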

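1. Connect to EvaDB
-------------------

.. code-block:: python

   import evadb

   # Open a connection to EvaDB and get a cursor for running queries.
   cursor = evadb.connect().cursor()

2. Register Image Classification Model as a Function in SQL
-----------------------------------------------------------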

Create an image classification function from Python source code.

.. code-block:: python

   query = cursor.query("""
       CREATE UDF IF NOT EXISTS MnistImageClassifier
       IMPL 'evadb/udfs/mnist_image_classifier.py';
   """).execute()

3. Execute Image Classification through SQL
-------------------------------------------

Once the function is registered with EvaDB, it can be called directly in SQL queries.

.. code-block:: python

   query = cursor.table("mnist_video").select("MnistImageClassifier(data).label")

.. note::

   Equivalent SQL statement:

   .. code-block:: sql

      SELECT MnistImageClassifier(data).label FROM mnist_video

Get the results as a ``DataFrame``.

.. code-block:: python

   query.df()

The result contains a projected ``label`` column, which gives the digit predicted for each frame.

.. code-block::

   +------------------------------+
   |   mnistimageclassifier.label |
   |------------------------------|
   |                            6 |
   |                            6 |
   |                            6 |
   |                            6 |
   |                            6 |
   |                            6 |
   |                            4 |
   |                            4 |
   ... ...

4. Optional: Process Only Segments of Videos based on Conditions
----------------------------------------------------------------

As in standard SQL, you can add conditions so that only the matching frames of the video are processed.

.. code-block:: python

   query = cursor.table("mnist_video") \
       .filter("id < 2") \
       .select("MnistImageClassifier(data).label")

.. note::

   Equivalent SQL statement:

   .. code-block:: sql

      SELECT MnistImageClassifier(data).label FROM mnist_video
      WHERE id < 2

Return the results as a ``DataFrame``.

.. code-block:: python

   query.df()

After filtering, the ``DataFrame`` contains only 2 rows.

.. code-block::

   +------------------------------+
   |   mnistimageclassifier.label |
   |------------------------------|
   |                            6 |
   |                            6 |
   +------------------------------+
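
The value of the ``id < 2`` condition is presumably that the classifier only runs on the frames that pass the filter. The sketch below illustrates that filter-then-classify idea in plain Python, without EvaDB. ``select_labels``, ``CountingClassifier`` and the toy ``frames`` list are hypothetical stand-ins for the query, ``MnistImageClassifier`` and ``mnist_video``; they are not EvaDB's implementation.

```python
from typing import Callable


def select_labels(frames: list,
                  predicate: Callable[[dict], bool],
                  classifier: Callable[[object], int]) -> list:
    """Run `classifier` only on the frames that satisfy `predicate`."""
    return [classifier(frame["data"]) for frame in frames if predicate(frame)]


class CountingClassifier:
    """Toy classifier that records how many frames it was asked to label."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, data: object) -> int:
        self.calls += 1
        return 6  # placeholder label; a real model would inspect `data`


# Hypothetical row shape: one dict per frame with an id and its pixel data.
frames = [{"id": i, "data": f"frame-{i}"} for i in range(8)]
classifier = CountingClassifier()

labels = select_labels(frames, lambda frame: frame["id"] < 2, classifier)
print(labels)            # one label per frame that passed the filter
print(classifier.calls)  # how many times the classifier was invoked
```

With eight frames and the ``id < 2`` predicate, the toy classifier is invoked twice rather than eight times, which is why filtering early matters when the function is an expensive model.
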

Check out our `Jupyter Notebook <https://github.com/georgia-tech-db/evadb/blob/master/tutorials/01-mnist.ipynb>`_ for a working example.

92 changes: 92 additions & 0 deletions docs/source/tutorials/qa_video.rst
@@ -0,0 +1,92 @@
Q&A Application on Videos
=========================

1. Connect to EvaDB
-------------------

.. code-block:: python

   import evadb

   cursor = evadb.connect().cursor()

2. Register Functions
---------------------

Whisper
*******
.. code-block:: python

   cursor.query("""
       CREATE UDF SpeechRecognizer
       TYPE HuggingFace
       'task' 'automatic-speech-recognition'
       'model' 'openai/whisper-base';
   """).execute()

.. note::

   EvaDB allows users to register any model in HuggingFace as a function.
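
To register a different HuggingFace model, only the function name, the task and the model identifier in the statement above change. The helper below is a minimal sketch that builds such a statement as a string, assuming the statement shape shown above carries over to other tasks and models. ``huggingface_udf_query`` is a hypothetical name, not part of EvaDB, and whether a particular task/model pair is supported is not checked here.

```python
def huggingface_udf_query(name: str, task: str, model: str) -> str:
    """Build a CREATE UDF statement in the same shape as the one above."""
    if not name.isidentifier():
        raise ValueError(f"function name must be an identifier: {name!r}")
    for label, value in (("task", task), ("model", model)):
        # Reject empty values and quotes, which would break the quoted literals.
        if not value or "'" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return "\n".join([
        f"CREATE UDF {name}",
        "TYPE HuggingFace",
        f"'task' '{task}'",
        f"'model' '{model}';",
    ])


query = huggingface_udf_query(
    "SpeechRecognizer", "automatic-speech-recognition", "openai/whisper-base"
)
print(query)
```

The resulting string can be passed to ``cursor.query(...).execute()`` in the same way as the literal statement above.
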

ChatGPT
*******

.. code-block:: python

   import os

   cursor.query("""
       CREATE UDF ChatGPT
       IMPL 'evadb/udfs/chatgpt.py'
   """).execute()

   # Set OpenAI token
   os.environ["OPENAI_KEY"] = "sk-..."

.. note::

   The ``ChatGPT`` function is a wrapper around an OpenAI API call. You can also switch to other LLMs that can run locally.
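
Writing the token into a script makes it easy to commit by accident. One alternative, sketched below, is to export ``OPENAI_KEY`` in the shell and only verify from Python that it is present before running any ChatGPT query. The variable name comes from the snippet above; ``require_openai_key`` is a hypothetical helper, not part of EvaDB.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None,
                       var: str = "OPENAI_KEY") -> str:
    """Return the key stored under `var`, or fail early with a clear message."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running ChatGPT queries")
    return key


# Demonstration with an explicit mapping, so no real credential is involved.
demo_env = {"OPENAI_KEY": "sk-demo-not-a-real-key"}
key = require_openai_key(demo_env)
print(f"key loaded, {len(key)} characters")
```

Calling ``require_openai_key()`` with no arguments reads the real process environment instead of the demo mapping.
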

3. Summarize Video in Text
--------------------------

Create a table with a text summary of the video.
The text is generated by running the audio-to-text ``Whisper`` model from ``HuggingFace`` on the video's audio.

.. code-block:: python

   cursor.query("""
       CREATE TABLE text_summary AS
       SELECT SpeechRecognizer(audio) FROM ukraine_video;
   """).execute()

This results in the table shown below.

.. code-block::

   +--------------------------------------------------------------------------------------------+
   | text_summary.text                                                                          |
   |--------------------------------------------------------------------------------------------|
   | The war in Ukraine has been on for 415 days. Who is winning it? Not Russia. Certainly not  |
   | Ukraine. It is the US oil companies. US oil companies have reached $200 billion in pure    |
   | profits. The earnings are still on. They are still milking this war and sharing the        |
   | spoils. Let us look at how Exxon mobile has been doing. In 2022, the company made $56      |
   | billion in profits. Oil companies capitalized on instability and they are profiting from   |
   | pain. American oil companies are masters of this art. You may remember the war in Iraq.    |
   | The US went to war in Iraq by selling a lie. The Americans did not find any weapons of     |
   | mass destruction but they did find lots of oil. And in the year since, American officials  |
   | have admitted this. And this story is not over. It's repeating itself in Ukraine. They are |
   | feeding another war and filling the coffers of US oil companies.                           |
   +--------------------------------------------------------------------------------------------+

4. Q&A using ChatGPT
--------------------

We can now embed a ChatGPT prompt inside a SQL query, using the text summary from the table as its knowledge base.

.. code-block:: python

   cursor.query("""
       SELECT ChatGPT('Is this video summary related to Ukraine russia war', text)
       FROM text_summary;
   """).df()

This query returns a projected ``DataFrame``.

.. code-block::

   +--------------------------------------------------------------------------------------------+
   | chatgpt.response                                                                           |
   |--------------------------------------------------------------------------------------------|
   | Based on the provided context, it seems that the video summary is related to the           |
   | Ukraine-Russia war. It discusses how US oil companies are allegedly profiting from the war |
   | in Ukraine, similar to how they allegedly benefited from the war in Iraq.                  |
   +--------------------------------------------------------------------------------------------+
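
Conceptually, the query above calls the function once per row of ``text_summary``, pairing the fixed question with that row's ``text``. The sketch below mimics this row-by-row pattern in plain Python. The prompt template in ``build_prompt`` is an assumption made for illustration and is not necessarily what ``chatgpt.py`` sends to OpenAI; ``ask_over_rows`` and ``keyword_llm`` are hypothetical names, and ``keyword_llm`` is a trivial stand-in for a real model.

```python
from typing import Callable


def build_prompt(question: str, context: str) -> str:
    """Combine one row of context with the question (assumed template)."""
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def ask_over_rows(question: str, rows: list, llm: Callable[[str], str]) -> list:
    """Call `llm` once per row, mirroring SELECT ChatGPT(question, text) FROM table."""
    return [llm(build_prompt(question, row["text"])) for row in rows]


def keyword_llm(prompt: str) -> str:
    """Stand-in for a real model: answers by keyword match only."""
    return "related" if "Ukraine" in prompt else "unrelated"


rows = [
    {"text": "The war in Ukraine has been on for 415 days."},
    {"text": "A recipe for sourdough bread."},
]
answers = ask_over_rows("Is this summary about the war?", rows, keyword_llm)
print(answers)
```

Because the model is passed in as a plain callable, the same loop works with a locally running LLM in place of the stand-in.
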
85 changes: 85 additions & 0 deletions docs/source/tutorials/similar_image_search.rst
@@ -0,0 +1,85 @@
Implementing a Similar Image Search Pipeline using EvaDB on Images
==================================================================

In this use case, we want to search for images that are similar to an image provided by the user.
To implement it, we rely on EvaDB's ability to express a feature extraction pipeline concisely.
Additionally, we use EvaDB's support for building a similarity search index and searching that index to
locate similar images through the ``FAISS`` library.

For this use case, we use a Reddit image dataset that can be downloaded from `here <https://www.dropbox.com/scl/fo/fcj6ojmii0gw92zg3jb2s/h\?dl\=1\&rlkey\=j3kj1ox4yn5fhonw06v0pn7r9>`_.
We populate a table in the database with all the images.

1. Connect to EvaDB
-------------------

.. code-block:: python

   import evadb

   cursor = evadb.connect().cursor()

2. Register SIFT as a Function
------------------------------
.. code-block:: python

   cursor.query("""
       CREATE UDF IF NOT EXISTS SiftFeatureExtractor
       IMPL 'evadb/udfs/sift_feature_extractor.py'
   """).execute()

3. Search Similar Images
------------------------
To locate images that have a similar appearance, we first build an index on the image embeddings.
Then, for a given image, EvaDB can find similar images by searching the index.

Build Index using ``FAISS``
***************************

The query below creates a new index on the projected column ``SiftFeatureExtractor(data)`` from the ``reddit_dataset`` table.

.. code-block:: python

   cursor.query("""
       CREATE INDEX reddit_sift_image_index
       ON reddit_dataset (SiftFeatureExtractor(data))
       USING FAISS
   """).execute()

Search Index for a Given Image
******************************
EvaDB uses the ``ORDER BY ... LIMIT ...`` SQL syntax to retrieve the top 5 similar images.
In this example, ``Similarity(x, y)`` is a built-in function that calculates the distance between ``x`` and ``y``.
In the current version, ``x`` is a single tuple and ``y`` is a column that contains multiple tuples.
By default, EvaDB computes the pairwise distance between ``x`` and every tuple in ``y``.
In this case, however, EvaDB leverages the index that we have already built.

.. code-block:: python

   query = cursor.query("""
       SELECT name FROM reddit_dataset ORDER BY
           Similarity(
               SiftFeatureExtractor(Open('reddit-images/g1074_d4mxztt.jpg')),
               SiftFeatureExtractor(data)
           )
       LIMIT 5
   """)
   query.df()

The ``DataFrame`` contains the top 5 similar images.

.. code-block::

   +---------------------------------+
   | reddit_dataset.name             |
   |---------------------------------|
   | reddit-images/g1074_d4mxztt.jpg |
   | reddit-images/g348_d7ju7dq.jpg  |
   | reddit-images/g1209_ct6bf1n.jpg |
   | reddit-images/g1190_cln9xzr.jpg |
   | reddit-images/g1190_clna2x2.jpg |
   +---------------------------------+
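
Conceptually, ``ORDER BY Similarity(...) LIMIT 5`` ranks every row by its distance to the query image and keeps the closest five. The sketch below shows that ranking as a brute-force scan in plain Python. It assumes Euclidean distance and uses tiny hand-written feature vectors; ``top_k_similar`` and the toy ``rows`` are hypothetical, and the scan is neither EvaDB's nor FAISS's implementation.

```python
import math


def top_k_similar(query: list, rows: list, k: int) -> list:
    """Return the names of the k rows whose features are closest to `query`."""
    ranked = sorted(rows, key=lambda row: math.dist(query, row["features"]))
    return [row["name"] for row in ranked[:k]]


# Toy table: each row has a name and a 2-dimensional feature vector.
rows = [
    {"name": "a.jpg", "features": [0.0, 0.0]},
    {"name": "b.jpg", "features": [1.0, 1.0]},
    {"name": "c.jpg", "features": [0.1, 0.0]},
    {"name": "d.jpg", "features": [5.0, 5.0]},
]

# Query with the features of a.jpg itself, as the tutorial does with its image.
closest = top_k_similar([0.0, 0.0], rows, k=2)
print(closest)
```

The scan computes a distance for every row, which is the cost an index is meant to avoid. It also shows why the query image appears first in the result above: its distance to itself is zero.
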

Check out our `Jupyter Notebook <https://github.com/georgia-tech-db/evadb/blob/master/tutorials/11-similarity-search-for-motif-mining.ipynb>`_ for a working example.
There, we also demonstrate more advanced features of EvaDB for similarity search.
