Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators #28207

Merged
merged 28 commits into from
Dec 11, 2024

Conversation

vincentzhang15
Copy link
Contributor

Description

We are submitting as a team of four for a project. Other team members are @RuofanChen03, @LikeWang10067, @TANYAL77.

This pull requests expands the filtering capabilities of the FAISS vectorstore by adding MongoDB-style query operators indicated as follows, while including comprehensive testing for the added functionality.

  • $eq (equals)
  • $neq (not equals)
  • $gt (greater than)
  • $lt (less than)
  • $gte (greater than or equal)
  • $lte (less than or equal)
  • $in (membership in list)
  • $nin (not in list)
  • $and (all conditions must match)
  • $or (any condition must match)
  • $not (negation of condition)

Issue

This closes #26379.

Sample Usage

import faiss
import asyncio
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
documents = [
    Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}),
    Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}),
    Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}),
    Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}),
    Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",})
]

async def search(vectorstore, query, schema_type, handler_type, k=2):
    schema_filter = {"schema_type": {"$eq": schema_type}}
    handler_filter = {"handler_type": {"$eq": handler_type}}
    combined_filter = {
        "$and": [
            schema_filter,
            handler_filter,
        ]
    }
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k":k, "filter":combined_filter}
    )
    return await base_retriever.ainvoke(query)

async def main():
    vectorstore = FAISS.from_texts(
        texts=[doc.page_content for doc in documents],
        embedding=embeddings,
        metadatas=[doc.metadata for doc in documents]
    )
    
    def printt(title, documents):
        print(title)
        if not documents:
            print("\tNo documents found.")
            return
        for doc in documents:
            print(f"\t{doc.page_content}. {doc.metadata}")

    printt("Documents:", documents)
    printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2))
    printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2))
    print()

if __name__ == "__main__":asyncio.run(main())

Output

Documents:
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="process payment", schema_type="financial", handler_type="payment":
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="customer update", schema_type="customer", handler_type="update":
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}

query="refund process", schema_type="financial", handler_type="refund":
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}

query="refund process", schema_type="financial", handler_type="foobar":
	No documents found.

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Nov 19, 2024
Copy link

vercel bot commented Nov 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchain ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 19, 2024 7:02am

@dosubot dosubot bot added community Related to langchain-community Ɑ: vector store Related to vector store module labels Nov 19, 2024
vincentzhang15 and others added 2 commits November 19, 2024 06:36
Co-Authored-By: RuofanChen03 <[email protected]>
Co-Authored-By: Like Wang <[email protected]>
Co-Authored-By: Shanni Li <[email protected]>
Co-Authored-By: RuofanChen03 <[email protected]>
Co-Authored-By: Like Wang <[email protected]>
Co-Authored-By: Shanni Li <[email protected]>
@eyurtsev eyurtsev changed the title Community: FAISS Filter Function Enhancement with Advanced Query Operators community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators Dec 11, 2024
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Dec 11, 2024
@eyurtsev
Copy link
Collaborator

@vincentzhang15 , @RuofanChen03, @LikeWang10067, @TANYAL77 -- thank you! Very nicely done PR!

@eyurtsev eyurtsev merged commit df5008f into langchain-ai:master Dec 11, 2024
17 checks passed
ccurme added a commit that referenced this pull request Jan 2, 2025
…Demonstration (#28938)

## Description
This pull request updates the documentation for FAISS regarding filter
construction, following the changes made in commit `df5008f`.

## Issue
None. This is a follow-up PR for documentation of
[#28207](#28207)

## Dependencies:
None.

---------

Co-authored-by: Chester Curme <[email protected]>
pprados pushed a commit to pprados/langchain that referenced this pull request Jan 3, 2025
…Demonstration (langchain-ai#28938)

## Description
This pull request updates the documentation for FAISS regarding filter
construction, following the changes made in commit `df5008f`.

## Issue
None. This is a follow-up PR for documentation of
[langchain-ai#28207](langchain-ai#28207)

## Dependencies:
None.

---------

Co-authored-by: Chester Curme <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Related to langchain-community lgtm PR looks good. Use to confirm that a PR is ready for merging. size:XL This PR changes 500-999 lines, ignoring generated files. Ɑ: vector store Related to vector store module
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

FAISS filter is not working
5 participants