Semantic search vector.dev codebase and code repository #18871

Open
jonathanpv opened this issue Oct 17, 2023 · 1 comment
Labels
type: feature (A value-adding code addition that introduces new functionality.)
website: search (Anything related to the website's Algolia search indexing/config)

Comments

@jonathanpv
Contributor

jonathanpv commented Oct 17, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

As a user, I can ramp up and learn Vector quickly.

As a user, I can search the vector.dev documentation and tutorials using natural language.

Attempted Solutions

No response

Proposal

I propose periodically indexing all of vector.dev's content, from tutorials to code.

This can be done with an embedding model such as OpenAI's text-embedding-ada-002. To keep the service cheap and avoid depending on expensive search services like Algolia or Elasticsearch, we can use the open-source tool LanceDB (written in Rust).
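Periodic indexing would start by splitting each page into passages small enough to embed. A minimal sketch, assuming plain-text input and fixed word-count windows; the `Chunk`, `chunk_document`, and `chunks_to_embed` names and the window sizes are hypothetical, not part of any existing tool. Hashing each chunk means a periodic re-index only pays for embedding calls on content that actually changed.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_path: str  # source page, e.g. a docs markdown file (hypothetical layout)
    text: str      # passage that would be sent to the embedding model
    chunk_id: str  # content hash, stable across re-index runs


def chunk_document(doc_path: str, text: str, max_words: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split a document into overlapping word windows.

    The overlap keeps sentences that straddle a window boundary searchable from both sides.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        window = " ".join(words[start:start + max_words])
        if not window:
            break  # empty document
        # Include the path so identical text on two pages gets two distinct IDs.
        digest = hashlib.sha256(f"{doc_path}\n{window}".encode()).hexdigest()[:16]
        chunks.append(Chunk(doc_path, window, digest))
        if start + max_words >= len(words):
            break  # this window already reached the end of the document
    return chunks


def chunks_to_embed(chunks: list[Chunk], already_indexed: set[str]) -> list[Chunk]:
    """On a periodic re-index, keep only chunks whose hash is not already in the index."""
    return [c for c in chunks if c.chunk_id not in already_indexed]
```

A 500-word page with the defaults yields three windows starting at words 0, 160, and 320.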

LanceDB is a disk-based implementation of approximate nearest neighbor (ANN) search, the standard way to find the most similar items in a large collection of vectors. Vectors (embeddings) are, in effect, the language that machine learning speaks.
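To make the ANN idea concrete, here is the exact search that an ANN index approximates. A minimal pure-Python sketch, assuming cosine similarity as the metric; the document IDs and 3-dimensional vectors are toy values (real text-embedding-ada-002 vectors have 1536 dimensions). Scoring every stored vector costs O(N·d) per query, and an ANN index avoids most of those comparisons at the price of occasionally missing a true neighbor.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force nearest neighbors: score every stored vector against the query."""
    scored = ((cosine(query, v), doc_id) for doc_id, v in vectors.items())
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [(doc_id, score) for score, doc_id in best]


if __name__ == "__main__":
    toy_index = {
        "sources/file": [0.9, 0.1, 0.0],
        "sinks/s3": [0.1, 0.9, 0.1],
        "transforms/remap": [0.0, 0.2, 0.9],
    }
    print(exact_top_k([1.0, 0.0, 0.0], toy_index, k=2))
```

With the toy query `[1.0, 0.0, 0.0]`, `sources/file` ranks first and `sinks/s3` second.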

I propose a lightweight playground UI: users provide their own OpenAI API key (which we don't store), and we provide the Lance file. Building this may cost $30-$50, and hosting the file in an S3 bucket would add a monthly cost.

Result:

Users can search the docs through a playground UI, which might even be embedded in the VRL playground.

If Lance doesn't fully work in WebAssembly (Wasm), a CLI tool for searching Vector's docs is still definitely possible.

This could live in an entirely separate project, perhaps vectordotdev/vector-docs-search.

Flow:

Web version (may be limited, since Lance would need to be compiled from Rust to Wasm):

[client goes to the vector-docs-search site]
[the vector-docs-search site serves the Lance file from an S3 bucket]
[client writes a prompt/search query]
[the prompt/search query is embedded via OpenAI using the client's own API key]
[lance.js searches the Lance file (~500 ms; the high latency is due to optimizations that still need to be done, particularly around loading from S3)]
[client gets a list of the most relevant documents]
[client can then chain those documents to OpenAI and get a natural-language response back]
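The web flow can be sketched as two plain functions. A minimal sketch, assuming the OpenAI embedding call and the Lance lookup are injected as callables; `EmbedFn`, `SearchFn`, `search_docs`, `build_answer_prompt`, and the prompt wording are all hypothetical. It stops at building the prompt, since the final chat-completion call would also go through the client's own key.

```python
from typing import Callable, Sequence

# Hypothetical aliases for the two steps that would call external services.
Vector = list[float]
EmbedFn = Callable[[str], Vector]                              # OpenAI embedding call (client's key)
SearchFn = Callable[[Vector, int], Sequence[tuple[str, str]]]  # Lance lookup -> (url, passage) pairs


def build_answer_prompt(question: str, hits: Sequence[tuple[str, str]]) -> str:
    """Final step: pack the retrieved passages into a prompt for a chat model."""
    context = "\n\n".join(
        f"[{i}] {url}\n{passage}" for i, (url, passage) in enumerate(hits, start=1)
    )
    return (
        "Answer the question about Vector using only the documentation excerpts below. "
        "Cite excerpts by their [number].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def search_docs(question: str, embed: EmbedFn, search: SearchFn, k: int = 5) -> tuple[list[str], str]:
    """Embed the query, search the Lance file, and prepare the follow-up LLM prompt."""
    query_vec = embed(question)        # the query is embedded with the client's own key
    hits = list(search(query_vec, k))  # the k most relevant passages
    urls = [url for url, _ in hits]    # shown to the user as the result list
    return urls, build_answer_prompt(question, hits)
```

Keeping both external calls injectable means the same orchestration could back the web playground and the desktop CLI.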

Desktop CLI

Same as the flow above, but instead of hosting the Lance file in S3, we have the user download it directly, and searching is done with a simple single-file Python script.
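A sketch of what that single-file script could look like, assuming the downloaded index is a JSON list of `{"url", "vector"}` records rather than a Lance file so it runs with only the standard library. The `embed` function is a loudly labeled stand-in (a hashed bag-of-words, which matches words, not meaning); a real version would call OpenAI's embeddings endpoint with the user's key and use LanceDB's Python package for the search. The file name, `DIM`, and the JSON layout are hypothetical.

```python
import argparse
import hashlib
import json
import math
import re
from typing import Optional

DIM = 256  # hypothetical size; text-embedding-ada-002 vectors have 1536 dimensions


def embed(text: str) -> list[float]:
    """STAND-IN for the OpenAI embedding call: a unit-length hashed bag-of-words vector.

    It only matches shared words, not meaning; it exists so the sketch runs offline.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9_]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero on empty text
    return [x / norm for x in vec]


def search(index: list[dict], query: str, k: int) -> list[tuple[float, str]]:
    """Exact linear scan; with unit-length stored vectors, the dot product is cosine similarity.

    A Lance file with an ANN index would replace this scan.
    """
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, entry["vector"])), entry["url"]) for entry in index]
    return sorted(scored, reverse=True)[:k]


def main(argv: Optional[list[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Search a downloaded Vector docs index.")
    parser.add_argument("query", help="natural-language search query")
    parser.add_argument("--index", default="vector-docs-index.json", help="hypothetical index file")
    parser.add_argument("-k", type=int, default=5, help="number of results to print")
    args = parser.parse_args(argv)
    with open(args.index, encoding="utf-8") as f:
        index = json.load(f)  # expected shape: [{"url": "...", "vector": [...]}, ...]
    for score, url in search(index, args.query, args.k):
        print(f"{score:.3f}  {url}")


if __name__ == "__main__":
    main()
```

Usage would be along the lines of `python search_vector_docs.py "route logs to s3" -k 3`, where the script name is hypothetical.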

References

https://www.youtube.com/watch?v=cfXpBA-7qIo&t=584s

Version

No response

jonathanpv added the type: feature label Oct 17, 2023
@jonathanpv
Contributor Author

jonathanpv commented Oct 17, 2023

I'm personally working with Lance to build my own product, so if that goes well and it becomes easy to replicate for vector.dev, I can contribute this myself. However, if someone else knows this space, I would love to collaborate.

neuronull added the website: search label Oct 18, 2023
Projects
None yet
Development

No branches or pull requests

2 participants