feat: Add DocumentLanguageClassifier 2.0 #6037
Pull Request Test Coverage Report for Build 6508357776 - Coveralls
In the end-to-end test I came across a problem with the connection named "text/plain" and opened an issue in canals: deepset-ai/canals#130
haystack/preview/components/preprocessors/document_language_classifier.py
```python
document_store = MemoryDocumentStore()
p = Pipeline()
p.add_component(instance=TextFileToDocument(), name="text_file_converter")
```
Not strictly related to this PR, but why invert the order of the original signature?
def add_component(self, name: str, instance: Component) -> None:
If passing the instance first is more intuitive, we still have time to change it!
I think it's more intuitive, yes. I created an issue here: https://github.com/deepset-ai/canals/issues/137
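To make the point in this thread concrete, here is a minimal sketch of why the docstring call works even though it lists `instance` before `name`: keyword arguments are bound by name, so only positional callers depend on the declared order. `ToyPipeline` and the empty `TextFileToDocument` class below are hypothetical stand-ins written for this illustration, not the real Haystack or canals classes; only the `add_component(self, name, instance)` signature is taken from the comment above.

```python
class TextFileToDocument:
    """Hypothetical placeholder component; it does nothing."""


class ToyPipeline:
    """Toy stand-in that only records components by name (an assumption)."""

    def __init__(self) -> None:
        self.components: dict = {}

    # Same parameter order as the signature quoted in the review comment.
    def add_component(self, name: str, instance: object) -> None:
        if name in self.components:
            raise ValueError(f"A component named '{name}' already exists.")
        self.components[name] = instance


p = ToyPipeline()
converter = TextFileToDocument()

# Keyword call, written in the "inverted" order used in the docstring.
p.add_component(instance=converter, name="text_file_converter")

# Positional call: this one must follow the declared order (name, instance).
p.add_component("second_converter", TextFileToDocument())
```

Swapping the parameter order in the signature would therefore leave keyword calls like the first one untouched and break only positional calls like the second.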
Related Issues
LanguageClassifier #5677 (1st part is feat: Add TextLanguageClassifier 2.0 #6026)
Proposed Changes:
How did you test it?
New unit tests. The e2e test is still pending.
Notes for the reviewer
Storing the detected language in the Document's metadata could still be done. Not convinced that we need it at this point. Happy to discuss.
Regarding the e2e test, I am working on getting the other PR #5976 merged first. Then I'll add more assertions.
The component is in the preprocessing module right now but could also fit in the routers. I'd keep it in preprocessors because it not only routes but also detects the language.
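The two design questions in these notes (detect and route in one component, and optionally record the detected language in the metadata) can be sketched roughly as follows. This is not the code from the PR: `Doc`, `toy_detect`, `route_by_language`, the `store_in_metadata` flag, and the `"unmatched"` bucket are all hypothetical names chosen for illustration, and the stopword lookup is a deliberately crude stand-in for a real language detector.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical minimal document: text plus free-form metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Tiny stopword lists; an assumption made only to keep the sketch runnable.
_STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "de": {"der", "und", "ist", "das"},
}


def toy_detect(text: str) -> str:
    """Return the language whose stopwords overlap most, or 'unknown'."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in _STOPWORDS.items()}
    best = max(scores, key=lambda lang: scores[lang])
    return best if scores[best] > 0 else "unknown"


def route_by_language(
    docs: List[Doc],
    languages: List[str],
    detect: Callable[[str], str] = toy_detect,
    store_in_metadata: bool = False,
) -> Dict[str, List[Doc]]:
    """Detect each document's language, then group documents by it.

    Documents whose language is not in `languages` go to 'unmatched'.
    """
    routed: Dict[str, List[Doc]] = {lang: [] for lang in languages}
    routed["unmatched"] = []
    for doc in docs:
        lang = detect(doc.text)
        if store_in_metadata:
            # The optional behaviour discussed in the notes above.
            doc.metadata["language"] = lang
        routed[lang if lang in languages else "unmatched"].append(doc)
    return routed


docs = [
    Doc("the cat is on the mat"),
    Doc("der Hund ist im Haus und schläft"),
    Doc("hola amigo"),
]
routed = route_by_language(docs, languages=["en"], store_in_metadata=True)
```

In this sketch the detection step runs regardless of routing, which is the argument made above for keeping the component with the preprocessors; writing to the metadata is a one-line, opt-in addition.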
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.