feat: Add Azure AI Search integration #1122

Merged
27 commits merged on Nov 7, 2024

Contributor

@Amnah199 Amnah199 commented Oct 2, 2024

Related Issues

Proposed Changes:

This PR includes the following features:

  • Implement a document store for Azure AI Search.
  • Implement an embedding retriever
  • Implement filters. Note that Azure Search supports additional functionality, such as the any and all operators. The current filter implementation covers metadata filtering using the operators in Haystack filters and content filtering using the search function in Azure, which is how it's done in other integrations. More filters can be added later.

How did you test it?

  • Added unit tests for components
  • Used haystack.testing.document_store for testing filters and CRUD operations
  • Manual testing by running examples

Notes for the reviewer

The config files still need to be fixed. The ReadMe.md and log files will also be added.

Checklist

@github-actions github-actions bot added the type:documentation label Oct 2, 2024
"The index '%s' does not exist. A new index will be created.",
self._index_name,
)
self.create_index(self._index_name)
Contributor

I noticed that the create_index function signature allows kwargs to be passed in. Should they be passed in here? I was thinking about the case where the user wants to enable semantic re-ranking/search in their index, which can be toggled in the SearchIndex() call below.
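
To make the question concrete, here is a minimal sketch of what forwarding user-supplied keyword arguments to `create_index` could look like. `SketchDocumentStore`, `ensure_index`, and the `semantic_search` keyword are hypothetical stand-ins for illustration, and the in-memory dictionary replaces the remote search service; none of this is the integration's actual code.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


class SketchDocumentStore:
    """Toy stand-in for a document store; not the integration's real class."""

    def __init__(self, index_name: str, **index_creation_kwargs: Any) -> None:
        self._index_name = index_name
        # Hypothetical: remember index options supplied by the user.
        self._index_creation_kwargs = index_creation_kwargs
        # In-memory replacement for the remote service: index name -> options.
        self._existing: Dict[str, Dict[str, Any]] = {}

    def create_index(self, index_name: str, **kwargs: Any) -> None:
        # A real store would pass kwargs on to the index definition it builds.
        self._existing[index_name] = dict(kwargs)

    def ensure_index(self) -> None:
        """Create the index if it is missing, forwarding the stored kwargs."""
        if self._index_name not in self._existing:
            logger.info(
                "The index '%s' does not exist. A new index will be created.",
                self._index_name,
            )
            self.create_index(self._index_name, **self._index_creation_kwargs)


store = SketchDocumentStore("docs", semantic_search="placeholder-config")
store.ensure_index()
print(store._existing["docs"])  # {'semantic_search': 'placeholder-config'}
```

With this shape, options given at construction time reach the index definition without the caller having to invoke `create_index` directly.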

Contributor

@ttmenezes ttmenezes Oct 17, 2024

Spoke with my team, and we want to prioritize allowing users to toggle on semantic re-ranking. This is a setting configured at index creation time, so adding a toggle for this setting in the AzureAISearchDocumentStore seems like the way forward. At retrieval time, we can allow the user to toggle semantic ranking, which will only work if their index was initialized with it. It is also worth noting that on the free tier of the semantic ranker, the user has a quota of 1,000 semantic requests per month, and then they can upgrade beyond that.
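
A minimal sketch of the two-level toggle described here, assuming a hypothetical `SketchIndex` record that remembers whether semantic ranking was configured at creation time and a hypothetical `build_query` helper that checks it at retrieval time. The names and the returned dictionary are illustrative only and do not reflect the Azure SDK's or this integration's interfaces.

```python
from dataclasses import dataclass
from typing import Dict


class SemanticRankingNotEnabledError(ValueError):
    """Raised when semantic ranking is requested on an index created without it."""


@dataclass(frozen=True)
class SketchIndex:
    """Hypothetical record of how an index was created."""

    name: str
    semantic_ranking: bool = False  # fixed at index creation time


def build_query(index: SketchIndex, query: str, use_semantic_ranking: bool = False) -> Dict[str, str]:
    """Build illustrative query options, rejecting unsupported combinations early."""
    if use_semantic_ranking and not index.semantic_ranking:
        raise SemanticRankingNotEnabledError(
            f"Index '{index.name}' was not created with semantic ranking enabled."
        )
    query_type = "semantic" if use_semantic_ranking else "simple"
    return {"search_text": query, "query_type": query_type}


semantic_index = SketchIndex("docs", semantic_ranking=True)
print(build_query(semantic_index, "azure search", use_semantic_ranking=True))
```

Failing fast on the client side also avoids sending a request that would count against the monthly semantic-request quota mentioned above without any chance of succeeding.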

Thoughts on implementing support for this?

Contributor Author

That's a great suggestion. I believe we can add support for semantic re-ranking during index creation. We usually allow passing kwargs specific to the database, but I'll need to check the documentation to understand how this will affect index creation and the remaining methods. Regardless, I'll add support for semantic re-ranking after referring to the API.

Contributor Author

Decided to introduce this change as a separate PR to avoid extending the current PR further.

@Amnah199 Amnah199 marked this pull request as ready for review October 23, 2024 09:51
@Amnah199 Amnah199 requested a review from a team as a code owner October 23, 2024 09:51
@Amnah199 Amnah199 requested review from silvanocerza and vblagoje and removed request for a team and silvanocerza October 23, 2024 09:51
Member

@vblagoje vblagoje left a comment

Left some rough first-pass feedback, will revisit.

Member

@anakin87 anakin87 left a comment

I felt free to take a look and noticed some possible improvements.

Contributor

@ttmenezes ttmenezes left a comment

Core Azure AI Search functionality looks good, and I have tested it locally after pulling the code. Semantic reranking, hybrid retrieval, and BM25 retrieval will be added in future PRs.

Member

@vblagoje vblagoje left a comment

Cursor review found two more issues to address.

@vblagoje vblagoje self-requested a review November 7, 2024 09:00
@Amnah199 Amnah199 merged commit 06d77cc into main Nov 7, 2024
10 of 11 checks passed
@Amnah199 Amnah199 deleted the feat-azure-ai-search-integration branch November 7, 2024 10:05
Labels
topic:CI type:documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

Integration for Azure AI Search
4 participants