main <- alpha #1

Merged
merged 31 commits into from
Jul 26, 2023

Member

@Anush008 Anush008 commented Jul 17, 2023

🍕 RepoQuery 🍕

A REST service to answer user queries about public GitHub repositories

🔎 The Project

RepoQuery is an early-beta project that uses recursive OpenAI function calling paired with semantic search using All-MiniLM-L6-V2 to index and answer user queries about public GitHub repositories.

Related Tickets & Documents

open-sauced/ai#192
open-sauced/ai#226

📬 Service Endpoints

1. /embed

To generate and store embeddings for a GitHub repository.

Parameters

The parameters are passed as a JSON object in the request body:

  • owner (string, required): The owner of the repository.
  • name (string, required): The name of the repository.
  • branch (string, required): The name of the branch.

Response

The server processes the request and sends responses as server-sent events (SSE). The event stream contains the following events, each with optional data.

sse_events! {
EmbedEvent,
(FetchRepo, "FETCH_REPO"),
(EmbedRepo, "EMBED_REPO"),
(SaveEmbeddings, "SAVE_EMBEDDINGS"),
(Done, "DONE"),
}
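The `sse_events!` macro itself lives in the project source and is not shown in this description. Purely as an illustration, a declaration like the one above could correspond to an enum whose variants map to the SSE event names. The `name` and `from_name` helpers below are hypothetical and are not the macro's actual expansion.

```rust
/// Hypothetical hand-written counterpart of the `sse_events!` declaration above.
/// The real macro may generate something quite different.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbedEvent {
    FetchRepo,
    EmbedRepo,
    SaveEmbeddings,
    Done,
}

impl EmbedEvent {
    /// Event name as listed in the declaration (e.g. "FETCH_REPO").
    pub fn name(self) -> &'static str {
        match self {
            EmbedEvent::FetchRepo => "FETCH_REPO",
            EmbedEvent::EmbedRepo => "EMBED_REPO",
            EmbedEvent::SaveEmbeddings => "SAVE_EMBEDDINGS",
            EmbedEvent::Done => "DONE",
        }
    }

    /// Reverse lookup, useful for a client that reads event names off the stream.
    pub fn from_name(name: &str) -> Option<Self> {
        match name {
            "FETCH_REPO" => Some(EmbedEvent::FetchRepo),
            "EMBED_REPO" => Some(EmbedEvent::EmbedRepo),
            "SAVE_EMBEDDINGS" => Some(EmbedEvent::SaveEmbeddings),
            "DONE" => Some(EmbedEvent::Done),
            _ => None,
        }
    }
}
```

A client could call `EmbedEvent::from_name` on each incoming event name and stop reading once it sees `EmbedEvent::Done`.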

Example

curl --location 'localhost:3000/embed' \
--header 'Content-Type: application/json' \
--data '{
    "owner": "open-sauced",
    "name": "ai",
    "branch": "beta"
}'

2. /query

To perform a query on the API with a specific question related to a repository.

Parameters

The parameters are passed as a JSON object in the request body:

  • query (string, required): The question or query you want to ask.
  • repository (object, required): Information about the repository for which you want to get the answer.
    • owner (string, required): The owner of the repository.
    • name (string, required): The name of the repository.
    • branch (string, required): The name of the branch.
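The parameter list above can be modelled as a pair of structs. This is a minimal, dependency-free sketch: the type names `QueryRequest` and `Repository` and the hand-rolled `to_json` and `escape_json` helpers are assumptions made for illustration, and a real client or server would more likely rely on a serialization crate.

```rust
/// Mirrors the `repository` object from the parameter list (names are illustrative).
pub struct Repository {
    pub owner: String,
    pub name: String,
    pub branch: String,
}

/// Mirrors the `/query` request body (names are illustrative).
pub struct QueryRequest {
    pub query: String,
    pub repository: Repository,
}

/// Escape the characters that must not appear raw inside a JSON string.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl QueryRequest {
    /// Serialize the request into the JSON shape described above.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"query":"{}","repository":{{"owner":"{}","name":"{}","branch":"{}"}}}}"#,
            escape_json(&self.query),
            escape_json(&self.repository.owner),
            escape_json(&self.repository.name),
            escape_json(&self.repository.branch),
        )
    }
}
```

The string returned by `to_json` is what would go in the request body alongside a `Content-Type: application/json` header.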

Response

The server processes the request and sends responses as server-sent events (SSE). The event stream contains the following events, each with optional data.

sse_events! {
QueryEvent,
(SearchCodebase, "SEARCH_CODEBASE"),
(SearchFile, "SEARCH_FILE"),
(SearchPath, "SEARCH_PATH"),
(GenerateResponse, "GENERATE_RESPONSE"),
(Done, "DONE"),
}
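On the client side, the stream has to be split back into events. The sketch below assumes the service uses standard `text/event-stream` framing (`event:` and `data:` fields, with a blank line ending each event); the exact wire format is not shown in this description, and `SseEvent` and `parse_sse` are hypothetical names.

```rust
/// One parsed server-sent event: its `event:` name and any joined `data:` lines.
#[derive(Debug, PartialEq, Eq)]
pub struct SseEvent {
    pub event: String,
    pub data: Option<String>,
}

/// Parse a complete `text/event-stream` body into events.
/// Unlike a browser EventSource, this keeps events that carry no data,
/// since the events here are described as having optional data.
pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut name = String::new();
    let mut data: Vec<&str> = Vec::new();

    for line in body.lines() {
        if line.is_empty() {
            // A blank line terminates the current event.
            if !name.is_empty() || !data.is_empty() {
                events.push(SseEvent {
                    event: std::mem::take(&mut name),
                    data: if data.is_empty() {
                        None
                    } else {
                        Some(data.join("\n"))
                    },
                });
                data.clear();
            }
        } else if let Some(value) = line.strip_prefix("event:") {
            // A single leading space after the colon is not part of the value.
            name = value.strip_prefix(' ').unwrap_or(value).to_string();
        } else if let Some(value) = line.strip_prefix("data:") {
            data.push(value.strip_prefix(' ').unwrap_or(value));
        }
        // Comment lines (starting with ':') and other fields are ignored.
    }
    events
}
```

A caller would typically walk the returned events in order and treat `DONE` as the end of the response. An event that is not followed by a blank line is treated as incomplete and dropped.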

Example

curl --location 'localhost:3000/query' \
--header 'Content-Type: application/json' \
--data '{
    "query": "How is the PR description being generated using AI?",
    "repository": {
        "owner": "open-sauced",
        "name": "ai",
        "branch": "beta"
    }
}'

3. /collection

To check if a repository has been indexed.

Parameters

The parameters are passed as query parameters in the URL:

  • owner (string, required): The owner of the repository.
  • name (string, required): The name of the repository.
  • branch (string, required): The name of the branch.

Response

This endpoint returns an OK status code if the repository has been indexed by the service.

Example

curl --location 'localhost:3000/collection?owner=open-sauced&name=ai&branch=beta'
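Since `/collection` signals an indexed repository through an OK status, a client only needs to read the status line. The following is a rough standard-library sketch: the function names are hypothetical, the values are assumed to need no percent-encoding, and the response for a repository that has not been indexed is not described here, so anything other than 200 is simply treated as "not indexed".

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

/// Build the request path; assumes the values need no percent-encoding.
pub fn collection_path(owner: &str, name: &str, branch: &str) -> String {
    format!("/collection?owner={owner}&name={name}&branch={branch}")
}

/// True if an HTTP status line such as "HTTP/1.1 200 OK" carries status 200.
pub fn status_is_ok(status_line: &str) -> bool {
    status_line.split_whitespace().nth(1) == Some("200")
}

/// Send a bare HTTP/1.1 GET to `addr` (e.g. "localhost:3000") and report
/// whether the service answered with an OK status.
pub fn is_indexed(addr: &str, owner: &str, name: &str, branch: &str) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    let path = collection_path(owner, name, branch);
    write!(
        stream,
        "GET {path} HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n"
    )?;

    // Only the first line of the response is needed to read the status code.
    let mut status_line = String::new();
    BufReader::new(stream).read_line(&mut status_line)?;
    Ok(status_is_ok(&status_line))
}
```

With the service running locally, `is_indexed("localhost:3000", "open-sauced", "ai", "beta")` could be called before `/query` to decide whether `/embed` needs to run first.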

🧪 Running Locally

To run the project locally, there are a few prerequisites:

Once the above requirements are satisfied, you can run the project like so:

Environment variables

The project requires the following environment variables to be set.

Database setup

Start Docker and run the following commands to spin up a Docker container from the Qdrant image.

docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

The database dashboard will be accessible at localhost:6333/dashboard. The project communicates with the DB on port 6334.
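Before starting the service, it can help to confirm that both published ports accept connections. The sketch below is an assumption-laden convenience check, not part of the project: the `port_is_open` and `qdrant_ports_status` helpers and the 500 ms timeout are made up for illustration.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// True if a TCP connection to `host:port` succeeds within `timeout`.
pub fn port_is_open(host: &str, port: u16, timeout: Duration) -> bool {
    match (host, port).to_socket_addrs() {
        // Try every resolved address; one successful connection is enough.
        Ok(mut addrs) => addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        Err(_) => false,
    }
}

/// Check the two ports published by the `docker run` command above:
/// 6333 (dashboard) and 6334 (the port the project talks to).
pub fn qdrant_ports_status(host: &str) -> [(u16, bool); 2] {
    let timeout = Duration::from_millis(500);
    [6333, 6334].map(|port| (port, port_is_open(host, port, timeout)))
}
```

Calling `qdrant_ports_status("127.0.0.1")` returns one `(port, reachable)` pair per port, which makes it easy to print a clear message if the container is not up yet.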

Running the project

Run the following command to install the dependencies and run the project on port 3000.

cargo run --release

This command will build and run the project with optimizations enabled (highly recommended).

@Anush008 Anush008 changed the title Migrate from Anush008/Embedding-generation-proto main <- alpha Jul 17, 2023
Member

@jpmcb jpmcb left a comment

This is looking awesome!!

I'm unable to build due to a number of compiler errors (see nit errors left in this review)

Rust I'm building with locally:

❯ cargo --version
cargo 1.70.0 (ec8a8a0ca 2023-04-25)
❯ rustc --version
rustc 1.70.0 (90c541806 2023-05-31)

Looks like this is also a WIP 😅 so feel free to disregard my comments and ping me when this is ready to look at! Very cool!!

src/embeddings/onnx.rs Outdated
src/github/mod.rs
src/utils/conversation/mod.rs Outdated
src/routes/mod.rs Outdated
src/embeddings/onnx.rs Outdated
@Anush008
Member Author

Hey. I'll be updating the build steps, and there's a todo!() that's breaking the build. I'm working on it right now. Will let you know when it's ready. Thanks for taking a look.

Member

@jpmcb jpmcb left a comment

Quick note: we'll want to keep the bar for contributors high, so let's make sure we add a README.md and some development docs.

@Anush008
Member Author

Quick note: we'll want to keep the bar for contributors high, so let's make sure we add a README.md and some development docs.

Right. Will do.

@Anush008 Anush008 marked this pull request as ready for review July 22, 2023 05:47
@Anush008
Member Author

Hey @jpmcb. The project now has build steps and routes documented. Can you take a look? Once this first draft is merged, we can look into further improving the output by tweaking src/conversation/prompts.rs, moving to GPT-4, adding auth to the service using Supabase, and improving the semantic search.

@Anush008 Anush008 requested a review from jpmcb July 22, 2023 06:03
@jpmcb
Member

jpmcb commented Jul 24, 2023

Awesome, this is on my docket to review today: great work!!

Member

@jpmcb jpmcb left a comment

I'm not seeing an attribution or license on model/model_quantized.onnx in https://huggingface.co/rawsh/multi-qa-MiniLM-distill-onnx-L6-cos-v1

Is this ok to use in our project? Are there any constraints on us deploying it?


Overall, this looks really great. Well done. I'm planning to do another, more in-depth pass this afternoon but wanted to drop this early feedback for you.

.github/workflows/rust.yml Outdated
.github/workflows/rust.yml Outdated
Cargo.toml Outdated
Cargo.toml Outdated
Cargo.toml Outdated
README.md Outdated
README.md Outdated
src/github/mod.rs
src/github/mod.rs
README.md
@Anush008
Member Author

I'm not seeing an attribution or license on model/model_quantized.onnx in https://huggingface.co/rawsh/multi-qa-MiniLM-distill-onnx-L6-cos-v1

Is this ok to use in our project? Are there any constraints on us deploying it?

https://www.sbert.net/#citing-authors mentions

If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:

I've updated the README.md with the attributions. 02db5c6.

Signed-off-by: John McBride <[email protected]>
@jpmcb jpmcb mentioned this pull request Jul 25, 2023
19 tasks
Member

@jpmcb jpmcb left a comment

Overall, looks good. Once existing comments are addressed, will merge 👍🏼

@Anush008 Anush008 merged commit f0b727e into main Jul 26, 2023