Enhancement Request: Upgrade Filter Functionality in VectorStore Wrapping Elasticsearch #2699

Closed
shotaro427 opened this issue Sep 22, 2023 · 4 comments
Labels
auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features

Comments

@shotaro427

Issue Description:

I would like to propose an upgrade to the filter functionality in the VectorStore that wraps Elasticsearch. Specifically, I'd like to make it as flexible as the Python package langchain in terms of filtering capabilities.

Current State:

Currently, the filtering is limited to using only the "term" operator. This is handled in the function buildMetadataTerms located in src/vectorstores/elasticsearch.ts.

Requested Feature:

I'd like to request an enhancement that would allow for more flexible filtering options, similar to what is available in the langchain Python package.

Additional Context:

Is there a specific reason that we are limited to using only the "term" operator? Any context or historical reasons for this limitation would be appreciated as it could inform the development of this feature.

Next Steps:

If specific implementation strategies are required, please feel free to propose them in the discussion.

dosubot bot added the auto:enhancement label on Sep 22, 2023
@dosubot

dosubot bot commented Sep 22, 2023

🤖

Thank you for your enhancement request. I agree that having more flexible filtering options in the VectorStore that wraps Elasticsearch would be beneficial.

The current limitation to the "term" operator is due to the way the buildMetadataTerms method is implemented in the ElasticVectorSearch class. It's hard-coded to create filter objects with the "term" operator. Here's the current implementation:

private buildMetadataTerms(
  filter?: ElasticFilter
): { term: Record<string, unknown> }[] {
  if (filter == null) return [];
  const result = [];
  for (const [key, value] of Object.entries(filter)) {
    result.push({ term: { [`metadata.${key}`]: value } });
  }
  return result;
}
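
To make the behaviour concrete, here is a standalone copy of that logic with a sample call. It is only an illustrative sketch: the `ElasticFilter` alias below is an assumed stand-in (a plain key/value record), and the sample metadata keys `author` and `year` are made up.

```typescript
// Assumed stand-in for the library's ElasticFilter type.
type ElasticFilter = Record<string, unknown>;

// Standalone copy of the term-only logic: every key/value pair
// becomes one exact-match "term" clause on a metadata.* field.
function buildMetadataTerms(
  filter?: ElasticFilter
): { term: Record<string, unknown> }[] {
  if (filter == null) return [];
  const result: { term: Record<string, unknown> }[] = [];
  for (const [key, value] of Object.entries(filter)) {
    result.push({ term: { [`metadata.${key}`]: value } });
  }
  return result;
}

const clauses = buildMetadataTerms({ author: "alice", year: 2023 });
console.log(JSON.stringify(clauses));
// [{"term":{"metadata.author":"alice"}},{"term":{"metadata.year":2023}}]
```

Because each entry maps to an exact-match clause, a condition such as "year is at least 2020" cannot be expressed with this filter shape.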

To make the filter functionality more flexible, we could modify this method to accept a richer filter object that specifies the operator for each condition. For example, the filter could be an array of objects, each with "field", "operator", and "value" properties: "field" names the metadata field to filter on, "operator" names the Elasticsearch operator to use (e.g., "term" or "range"), and "value" holds the value or operator-specific body to match against.

Here's an example of how we could modify the buildMetadataTerms method to implement this:

private buildMetadataTerms(
  filter?: { field: string; operator: string; value: unknown }[]
): object[] {
  if (filter == null) return [];
  const result: object[] = [];
  for (const condition of filter) {
    // Emits { <operator>: { "metadata.<field>": <value> } } for each condition.
    result.push({
      [condition.operator]: { [`metadata.${condition.field}`]: condition.value },
    });
  }
  return result;
}

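With this modification, filters could use any Elasticsearch operator whose query body takes the form { field: value }, such as "term", "terms", "range", "prefix", or "wildcard", rather than only "term". Operators with a different body, such as "exists", would need a special case.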
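
A minimal sketch of that idea follows, with an allow-list of operators and a separate branch for "exists", whose query body names the field (`{ field: ... }`) instead of keying on it. The names `FilterOperator`, `FilterCondition`, and `buildMetadataFilters` are hypothetical and not part of langchainjs; the sample fields are made up.

```typescript
// Hypothetical types for illustration only; not part of langchainjs.
type FilterOperator =
  | "term"
  | "terms"
  | "range"
  | "prefix"
  | "wildcard"
  | "exists";

interface FilterCondition {
  field: string;
  operator: FilterOperator;
  value?: unknown; // ignored for "exists"
}

function buildMetadataFilters(
  filter?: FilterCondition[]
): Record<string, unknown>[] {
  if (filter == null) return [];
  return filter.map((condition): Record<string, unknown> => {
    const path = `metadata.${condition.field}`;
    if (condition.operator === "exists") {
      // "exists" takes the field name in its body rather than as a key.
      return { exists: { field: path } };
    }
    // Remaining operators share the { operator: { field: value } } shape.
    return { [condition.operator]: { [path]: condition.value } };
  });
}

const filters = buildMetadataFilters([
  { field: "author", operator: "term", value: "alice" },
  { field: "year", operator: "range", value: { gte: 2020, lt: 2024 } },
  { field: "summary", operator: "exists" },
]);
console.log(JSON.stringify(filters, null, 2));
```

Restricting `operator` to a union type, rather than accepting any string, lets the compiler reject unsupported operators and keeps arbitrary keys out of the generated query.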

As for the operators Elasticsearch supports, they include but are not limited to: "term", "terms", "range", "exists", "prefix", "wildcard", "regexp", "fuzzy", "ids", "geo_distance", "geo_bounding_box", "geo_shape", and "script". Some older operators, such as "missing", "type", and "geo_distance_range", have been removed in later Elasticsearch releases, and "geo_polygon" is deprecated, so check any list against the Elasticsearch version you target. You can find more information about these operators in the Elasticsearch documentation.

I hope this answers your question. If you have any further questions or need clarification, please let me know.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

@shotaro427
Author

When will this feature be released?
If you are expecting me to contribute it myself, could you point me to the contribution guidelines?

@dqbd
Collaborator

dqbd commented Sep 22, 2023

Hey @shotaro427!

Contributions are more than welcome here! Feel free to check out the contributing guidelines here: https://github.com/langchain-ai/langchainjs/blob/main/CONTRIBUTING.md

dosubot bot commented Dec 22, 2023

Hi, @shotaro427,

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you requested an upgrade to the filter functionality in the VectorStore wrapping Elasticsearch to match the flexibility of the langchain Python package. There was a discussion about the current limitation to the "term" operator and a suggestion to modify the buildMetadataTerms method to accept a more complex filter object. Additionally, you inquired about the release timeline and contributor rules, to which dqbd responded with a link to the contributing guidelines.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

dosubot bot added the stale label on Dec 22, 2023
dosubot bot closed this as not planned on Dec 30, 2023
dosubot bot removed the stale label on Dec 30, 2023