-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add Azure AI Search integration #1122
Conversation
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
...earch/src/haystack_integrations/components/azure_ai_search/retrievers/embedding_retriever.py
Outdated
Show resolved
Hide resolved
"The index '%s' does not exist. A new index will be created.", | ||
self._index_name, | ||
) | ||
self.create_index(self._index_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the create_index
function signature allows kwargs to be passed in. Should they be passed in here? I was thinking about the case where the user wants to enable semantic re-ranking/search in their index, which can be toggled in the SearchIndex()
call below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoke with my team, and we want to prioritize allowing users to toggle on semantic re-ranking. This is a setting configured at index creation time, so adding a toggle for this setting in the AzureAISearchDocumentStore
seems like the way forward. At retrieval time, we can allow the user to toggle semantic ranking, which will only work if their index was initialized with it. It is also worth noting that on the free tier of the semantic ranker, the user has a quota of 1,000 semantic requests per month, and then they can upgrade beyond that.
Thoughts on implementing support for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great suggestion. I believe we can add support for semantic re-ranking during index creation. We usually allow passing kwargs
specific to the database, but I'll need to check the documentation to understand how this will affect index creation and the remaining methods. Regardless, I'll add support for semantic re-ranking after referring to API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to introduce this change as a separate PR to avoid extending the current PR further.
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some rough first pass feedback, will revisit
...earch/src/haystack_integrations/components/retrievers/azure_ai_search/embedding_retriever.py
Show resolved
Hide resolved
...rations/azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/filters.py
Show resolved
Hide resolved
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt free to take a look and noticed some possible improvements.
...earch/src/haystack_integrations/components/retrievers/azure_ai_search/embedding_retriever.py
Outdated
Show resolved
Hide resolved
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
.../azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core Azure AI Search functionality looks good, and I have tested it locally after pulling the code. Semantic reranking, hybrid retrieval, and BM25 retrieval will be added in future PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor review found two more issues to address.
Related Issues
Proposed Changes:
This PR includes the following features:
any
andall
. The current filter implementation includes metadata filtering using operators in Haystack filters and content filtering usingsearch
function in Azure. This is how its done in other integrations. More filters can be added later.How did you test it?
haystack.testing.document_store
for testing filters and CRUD operationsNotes for the reviewer
Config files need to be fixed. Also the
ReadMe.md
and log files will be added.Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.