Polish search API, frontend and backend (#28)
* updates
rbs333 authored Oct 2, 2024
1 parent cbff994 commit 0fa0d9f
Showing 93 changed files with 30,080 additions and 12,078 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/run_tests.yml
@@ -0,0 +1,51 @@
name: Test Suite

on:
  pull_request:
    branches:
      - main

  push:
    branches:
      - main

jobs:
  test:
    name: Python ${{ matrix.python-version }} - ${{ matrix.connection }} [redis-stack ${{matrix.redis-stack-version}}]
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.11"]
        redis-stack-version: ['latest']

    services:
      redis:
        image: redis/redis-stack-server:${{matrix.redis-stack-version}}
        ports:
          - 6379:6379

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Install Poetry
        uses: snok/install-poetry@v1

      - name: Install dependencies
        working-directory: ./backend
        run: |
          poetry install --all-extras
      - name: Run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
        working-directory: ./backend
        run: |
          poetry run test
8 changes: 8 additions & 0 deletions .gitignore
@@ -1,9 +1,17 @@
arxiv-metadata-oai-snapshot.json
arxiv-papers-1000.json
arxiv.zip
*.DS_STORE
*.log
.env
.ipynb_checkpoints
*.pkl
.venv
venv
__pycache__
new_backend/arxivsearch/templates/
*/.nvm
.coverage*
coverage.*
htmlcov/
legacy-data/
24 changes: 24 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,24 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python Debugger: FastAPI",
"type": "debugpy",
"cwd": "${workspaceFolder}/backend/",
"env": {
"PYTHONPATH": "${cwd}"
},
"request": "launch",
"module": "uvicorn",
"args": [
"arxivsearch.main:app",
"--port=8888",
"--reload"
],
"jinja": true,
}
]
}
8 changes: 8 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,8 @@
{
"python.testing.pytestArgs": [
"backend"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.testing.cwd": "${workspaceFolder}/backend/",
}
31 changes: 20 additions & 11 deletions Dockerfile
@@ -1,38 +1,47 @@
FROM node:18.8-alpine AS ReactImage
FROM node:22.0.0 AS ReactImage

WORKDIR /app/frontend

ENV NODE_PATH=/app/frontend/node_modules
ENV PATH=$PATH:/app/frontend/node_modules/.bin

COPY ./frontend/package.json ./
RUN yarn install --no-optional
RUN npm install

ADD ./frontend ./
RUN yarn build
RUN npm run build


FROM python:3.9-slim-buster AS ApiImage
FROM python:3.11 AS ApiImage

ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1

RUN python3 -m pip install --upgrade pip setuptools wheel

WORKDIR /app/
COPY ./data/ ./data
VOLUME [ "/data" ]

RUN apt-get update && \
apt-get install -y curl && \
rm -rf /var/lib/apt/lists/*

RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=/opt/poetry python && \
cd /usr/local/bin && \
ln -s /opt/poetry/bin/poetry && \
poetry config virtualenvs.create false

RUN mkdir -p /app/backend

# copy deps first so we don't have to reload everytime
COPY ./backend/poetry.lock ./backend/pyproject.toml ./backend/

WORKDIR /app/backend
RUN poetry install --all-extras --no-interaction

COPY ./backend/ .
RUN pip install -e . --no-cache-dir

# add static react files to fastapi image
COPY --from=ReactImage /app/frontend/build /app/backend/arxivsearch/templates/build

LABEL org.opencontainers.image.source https://github.com/RedisVentures/redis-arxiv-search

WORKDIR /app/backend/arxivsearch

CMD ["sh", "./entrypoint.sh"]
CMD ["poetry", "run", "start-app"]
70 changes: 59 additions & 11 deletions README.md
@@ -1,11 +1,12 @@

<div align="center">
<a href="https://github.com/RedisVentures/redis-arXiv-search"><img src="https://github.com/RedisVentures/redis-arXiv-search/blob/main/backend/arxivsearch/data/redis-logo.png?raw=true" width="30%"><img></a>
<a href="https://github.com/RedisVentures/redis-arXiv-search"><img src="https://redis.io/wp-content/uploads/2024/04/Logotype.svg?raw=true" width="30%"><img></a>
<br />
<br />
<div display="inline-block">
<a href="https://docsearch.redisvl.com"><b>Hosted Demo</b></a>&nbsp;&nbsp;&nbsp;
<a href="https://github.com/RedisVentures/redis-arXiv-search"><b>Code</b></a>&nbsp;&nbsp;&nbsp;
<a href="https://github.com/redis-developer/redis-ai-resources"><b>More AI Recipes</b></a>&nbsp;&nbsp;&nbsp;
<a href="https://datasciencedojo.com/blog/ai-powered-document-search/"><b>Blog Post</b></a>&nbsp;&nbsp;&nbsp;
<a href="https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/"><b>Redis Vector Search Documentation</b></a>&nbsp;&nbsp;&nbsp;
</div>
@@ -16,15 +17,14 @@
# 🔎 Redis arXiv Search
*This repository is the official codebase for the arxiv paper search app hosted at: **https://docsearch.redisvl.com***


[Redis](https://redis.com) is a highly performant, production-ready vector database, which can be used for many types of applications. Here we showcase Redis vector search applied to a document retrieval use case. Read more about AI-powered search in [the technical blog post](https://datasciencedojo.com/blog/ai-powered-document-search/) published by our partners, *[Data Science Dojo](https://datasciencedojo.com)*.

### Dataset

The arXiv papers dataset was sourced from the following [Kaggle link](https://www.kaggle.com/Cornell-University/arxiv). arXiv is commonly used for scientific research in a variety of fields. Exposing a semantic search layer enables natural human language to be used to discover relevant papers.


![Demo](data/assets/arXivSearch.png)

## Application

This app was built as a Single Page Application (SPA) with the following components:
@@ -39,9 +39,46 @@ This app was built as a Single Page Application (SPA) with the following compone
- **[React-Bootstrap](https://react-bootstrap.github.io/)** for some UI elements
- **[Huggingface](https://huggingface.co/sentence-transformers)**, **[OpenAI](https://platform.openai.com)**, and **[Cohere](https://cohere.com)** for vector embedding creation

Some inspiration was taken from this [Cookiecutter project](https://github.com/Buuntu/fastapi-react)
Some inspiration was taken from the [tiangolo/full-stack-fastapi-template](https://github.com/tiangolo/full-stack-fastapi-template) project
and turned into a SPA application instead of a separate front-end server approach.

### General Project Structure

```
/backend
    /arxivsearch
        /api
            /routes
                papers.py # primary paper search logic lives here
        /db
            load.py # seeds Redis DB
            redis_helpers.py # redis util
        /schema
            # pydantic models for serialization/validation from API
        /tests
        /utils
        config.py
        spa.py # logic for serving compiled react project
        main.py # entrypoint
/frontend
    /public
        # index, manifest, logos, etc.
    /src
        /config
        /styles
        /views
            # primary components live here
        api.ts # logic for connecting with BE
        App.tsx # project entry
        Routes.tsx # route definitions
    ...
/data
    # folder mounted as volume in Docker
    # load script auto populates initial data from S3
```

### Embedding Providers
Embeddings represent the semantic properties of the raw text and enable vector similarity search. This application supports `HuggingFace`, `OpenAI`, and `Cohere` embeddings out of the box.
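
To illustrate how embeddings enable vector similarity search, here is a minimal sketch in plain Python. The toy vectors, the paper IDs, and the `cosine_similarity` helper are assumptions made for illustration only; the app itself obtains embeddings from one of the providers above and relies on Redis to perform the actual search.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
papers = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

# Rank papers by similarity to the query vector, most similar first.
ranked = sorted(
    papers,
    key=lambda paper_id: cosine_similarity(query, papers[paper_id]),
    reverse=True,
)
print(ranked)
```

Papers whose vectors point in nearly the same direction as the query vector rank first, which is the intuition behind searching with natural language rather than exact keywords.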

Expand Down Expand Up @@ -99,22 +136,33 @@ $ docker compose -f docker-local-redis.yml up
## Customizing (optional)
- You can use the provided Jupyter Notebook in the [`data/`](data/README.md) directory to create paper embeddings and metadata. The output JSON files will end up stored in the `data/` directory and used when creating your own container.
- Use the `./build.sh` script to build your own docker image based on the application source code and dataset changes.
- If you want to use K8s instead of Docker Compose, we have some [resources to help you get started](k8s/README.md).
### Run local Redis with Docker
```bash
docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
```
### FastAPI with Poetry
To run the backend locally:
1. `cd backend`
2. `poetry install`
3. `poetry run start-app`
*`poetry run start-app` runs the initial DB load script and launches the API.*
### React Dev Environment
It's typically easier to build the front end in an interactive environment, testing changes in real time.

1. Deploy the app using steps above.
2. Install packages (you may need to use `npm` to install `yarn`)
2. Install packages
```bash
$ cd frontend/
$ yarn install --no-optional
$ npm install
````
4. Use `yarn` to serve the application from your machine
4. Use `npm` to serve the application from your machine
```bash
$ yarn start
$ npm run start
```
5. Navigate to `http://localhost:3000` in a browser.
8 changes: 8 additions & 0 deletions backend/.dockerignore
@@ -0,0 +1,8 @@
# Python
__pycache__
app.egg-info
*.pyc
.mypy_cache
.coverage
htmlcov
.venv
6 changes: 6 additions & 0 deletions backend/arxivsearch/api/main.py
@@ -0,0 +1,6 @@
from fastapi import APIRouter

from arxivsearch.api.routes import papers

api_router = APIRouter()
api_router.include_router(papers.router, prefix="/papers", tags=["papers"])