
Add Document To Vector Store #838

Merged

Conversation

khoangothe
Contributor

Documents, crawled URLs, and websites will be chunked and loaded into the provided vector store if vector_store is not None. Although adding the data in Document would be more efficient, I think this solution keeps the code decoupled and easier to maintain.

By default these changes won't add any new features, but new applications can be built on top of the vector store (like chatting with the sources).
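The chunk-and-load behavior described above can be sketched in plain Python. This is a minimal illustration, not the PR's actual implementation: chunk_text and load_into_store are hypothetical helper names, and the list-based store stands in for a real vector store's add_texts call.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping fixed-size chunks."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks


def load_into_store(vector_store, documents: list[str]) -> int:
    """Chunk each document and add it to the store; no-op if no store is given."""
    if vector_store is None:
        return 0
    chunks = [c for doc in documents for c in chunk_text(doc)]
    vector_store.extend(chunks)  # stand-in for vector_store.add_texts(chunks)
    return len(chunks)
```

The `if vector_store is None` guard mirrors the PR's design: when no store is passed, research behaves exactly as before.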

@khoangothe
Contributor Author

It would be cool to have this idea reviewed! I'll add a test script if I'm allowed to proceed.

@assafelovic
Owner

Hey @khoangothe this is a great direction! Can you share an example of how it can be used?

@khoangothe
Contributor Author

khoangothe commented Sep 13, 2024

Hi @assafelovic, thanks for the review! I just added a commit documenting how it should be used. Here's the code I used to test locally. Basically, the scraped data is stored in the vector store whenever one is defined and report_source is not langchain_vectorstore (in that case the vector_store is used as a knowledge source instead of being written to).

Will add a test script soon.

import asyncio

from gpt_researcher import GPTResearcher

from langchain_community.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

from dotenv import load_dotenv

load_dotenv()

async def main():
    vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())

    query = "Which one is the best LLM"

    # Create an instance of GPTResearcher
    researcher = GPTResearcher(
        query=query,
        report_type="research_report",
        report_source="web",
        vector_store=vector_store, 
    )

    # Conduct research and write the report
    await researcher.conduct_research()

    # Check if the vector_store contains information from the sources
    related_contexts = await vector_store.asimilarity_search("GPT-4", k=5)
    print(related_contexts)
    print(len(related_contexts))


asyncio.run(main())

@assafelovic
Owner

Thanks @khoangothe, excuse me if I might be missing something, but how is this different from this? https://docs.gptr.dev/docs/gpt-researcher/context/vector-stores

@khoangothe
Contributor Author

khoangothe commented Sep 14, 2024

@assafelovic Sorry if my examples weren't clear enough. The feature you linked lets you talk to your vector store, so the store must already contain information for gpt-researcher to research on (when report_source is set to langchain_vectorstore). My changes allow GPT-Researcher to add new content to the vector store: the scraped websites and documents get written into it, so you can later reuse the store for other purposes, like RAG.

In the example, we start with an empty InMemoryVectorStore; right after await researcher.conduct_research(), the vector store has everything stored, and related_contexts contains the scraped information most similar to the query.
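The distinction between the two modes can be summed up in a small decision sketch. This is illustrative only: the function name and the "unused"/"read"/"write" labels are my assumptions, not identifiers from the PR.

```python
def vector_store_mode(vector_store, report_source: str) -> str:
    """Decide how a research run uses the vector store."""
    if vector_store is None:
        # No store supplied: behavior is unchanged from before the PR.
        return "unused"
    if report_source == "langchain_vectorstore":
        # Existing feature: the store already holds the knowledge
        # and is only read from during research.
        return "read"
    # New behavior from this PR: scraped sources are chunked and
    # written into the store for later reuse (e.g. RAG).
    return "write"
```

So the linked docs cover the "read" path, while this PR adds the "write" path.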

@hslee16
Contributor

hslee16 commented Sep 14, 2024

Looks good 👍🏼

@ElishaKay
Collaborator

ElishaKay commented Sep 22, 2024

@assafelovic

This path of persistent, reusable vector storage that can be leveraged across reports and follow-up questions is very interesting to me.

I've merged this branch into #819 and am planning to test extensively with PGVector storage.

@ElishaKay ElishaKay mentioned this pull request Sep 22, 2024
@khoangothe
Contributor Author

khoangothe commented Oct 5, 2024

@ElishaKay @assafelovic Hi guys, I was able to resolve the merge conflict and provided test cases for the scenarios I implemented. For each type of knowledge source (URLs, hybrid, local, web, LangChain documents), data will be ingested into the vector_store the user provided; usage is shown in the tests (I added a PDF to test the local and hybrid functionality). I also raised an issue on Discord, so hopefully you can check it out.
It would be great to have this PR reviewed, tested, and hopefully merged! I also want to implement chatting with the data source, but that depends on this PR. Thanks for your help!

Owner

@assafelovic assafelovic left a comment


This is awesome @khoangothe kudos for the hard work and implementation. Looking forward to the next PRs that can empower this

@assafelovic assafelovic merged commit 9ed35db into assafelovic:master Oct 6, 2024
@danieldekay
Contributor

Love the idea, as you can build out a knowledge base through various queries this way. It adds a bit more human-in-the-loop for complex topics.

One use case could be if you already have a corpus of literature, but want to add more recent content via GPT-R's searches.
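That use case can be sketched as one persistent store accumulating chunks across several research runs. A minimal illustration, assuming a list-based stand-in store and a hypothetical run_query helper in place of a real GPTResearcher call:

```python
def run_query(store: list[str], query: str, scraped: list[str]) -> None:
    """Pretend research run: tag scraped snippets with the query and persist them."""
    store.extend(f"[{query}] {s}" for s in scraped)


# One knowledge base grows across multiple queries over time.
knowledge_base: list[str] = []
run_query(knowledge_base, "best LLM", ["GPT-4 ranks highly on benchmarks"])
run_query(knowledge_base, "open models", ["Llama 3 is a strong open model"])
# The store now holds context from both runs and can seed follow-up
# questions or a langchain_vectorstore-sourced report.
```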
