Web research retriever (#8102) · langchain-ai/langchain@907d6c1

Commit

Web research retriever (#8102)

Given a user question, this retriever will:
* Use an LLM to generate a set of search queries.
* Run a search for each query.
* Store the URLs from the search results in self.urls.
* Check for new URLs that have not been processed yet
(not in self.url_database).
* Load, transform, and add only these new URLs to the
vectorstore.
* Query the vectorstore for relevant documents based on the
questions generated by the LLM.
* Return only unique documents as the final result.

This avoids reprocessing URLs across multiple runs of similar
queries, which should improve the retriever's performance. It also
keeps track of all URLs that have been processed, which could be
useful for debugging or understanding the retriever's behavior.
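
The flow described above can be sketched in plain Python. This is a minimal illustration and not the code from this commit: the class name `WebResearchSketch`, the three injected callables (`generate_queries`, `search`, `load`), and the dictionary standing in for the vectorstore are all hypothetical. Only the attribute names `urls` and `url_database` come from the commit message, and a naive keyword match replaces the real similarity search.

```python
from typing import Callable, Dict, List, Set


class WebResearchSketch:
    """Toy stand-in for the described flow; NOT the LangChain class."""

    def __init__(
        self,
        generate_queries: Callable[[str], List[str]],  # hypothetical LLM wrapper
        search: Callable[[str], List[str]],  # hypothetical: query -> result URLs
        load: Callable[[str], str],  # hypothetical: URL -> page text
    ) -> None:
        self.generate_queries = generate_queries
        self.search = search
        self.load = load
        self.urls: List[str] = []  # URLs returned by the latest searches
        self.url_database: Set[str] = set()  # URLs already processed
        self.store: Dict[str, str] = {}  # assumption: dict in place of a vectorstore

    def get_relevant_documents(self, question: str) -> List[str]:
        queries = self.generate_queries(question)

        # Collect URLs from every search, keeping first-seen order.
        found = (url for q in queries for url in self.search(q))
        self.urls = list(dict.fromkeys(found))

        # Only URLs missing from url_database are loaded and indexed.
        new_urls = [u for u in self.urls if u not in self.url_database]
        for url in new_urls:
            self.store[url] = self.load(url)
            self.url_database.add(url)

        # Naive keyword match per query (assumption: replaces similarity
        # search), returning each matching document only once.
        unique: List[str] = []
        seen: Set[str] = set()
        for q in queries:
            words = [w.lower() for w in q.split()]
            for text in self.store.values():
                if text not in seen and any(w in text.lower() for w in words):
                    seen.add(text)
                    unique.append(text)
        return unique
```

Calling `get_relevant_documents` twice with similar questions should invoke `load` only for URLs that were not seen in the first call, since `url_database` persists on the instance between calls.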

---------

Co-authored-by: Harrison Chase <[email protected]>

2 people authored and hinthornw committed Jul 27, 2023

1 parent 01d7676 commit 907d6c1
