Multilingual search #651
if set to current, the search is requested with the current language of the application (UI language)
if current is set and UI lang is 'eng', then the requested fields are .langeng ones
Hi, here are my 2 cents. I'm wondering if it would be useful to implement some kind of fallback mechanism. I can imagine the use case where a user with, e.g., an English-configured browser lands on the catalogue and tries to search for something. If the catalogue contains very few English records, they will find almost nothing, even if the words used for the search are technical terms (often the same in all languages) or proper nouns. Can't we imagine something like:
{
  "query_string": {
    "query": "carte établie",
    "default_operator": "AND",
    "fields": [
      "resourceTitleObject.langeng^6",
      "tag.langeng^5",
      "resourceAbstractObject.langeng^4",
      "lineageObject.langeng^3",
      "any.langeng^2",
      "any.*",
      "uuid"
    ]
  }
}
I agree with @benoitregamey here. Also, does the request fail if, for instance, the field .langesp is used but there are absolutely no Spanish translations in the catalog?
Yes, we actually discussed that internally; we should have a fallback. I'll adapt the query.
I've adapted the query:
[
  'resourceTitleObject.langeng^5',
  'tag.langeng^4',
  'resourceAbstractObject.langeng^3',
  'lineageObject.langeng^2',
  'any.langeng',
  'uuid',
  'resourceTitleObject.*',
  'tag.*',
  'resourceAbstractObject.*',
  'lineageObject.*',
  'any.*',
]
@jahow suggested falling back on all fields. I am wondering if it's useful though, as every field falls into
if the lang is 'current', then multilingual fields priority are increased by 10, with a fallback on the * wildcard
OK, I've made another adaptation where the priorities are kept in the fallback as well:
[
  'resourceTitleObject.langeng^15',
  'resourceTitleObject.*^5',
  'tag.langeng^14',
  'tag.*^4',
  'resourceAbstractObject.langeng^13',
  'resourceAbstractObject.*^3',
  'lineageObject.langeng^12',
  'lineageObject.*^2',
  'any.langeng^11',
  'any.*',
  'uuid',
]
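The list with kept priorities follows a regular pattern: each multilingual field appears once with the language-specific suffix and a boost raised by 10, then once with the wildcard and its base boost. A minimal TypeScript sketch of that pattern is given here for illustration; the names `BASE_BOOSTS`, `LANG_BOOST`, `withBoost` and `buildSearchFields` are hypothetical and are not taken from the PR's code.

```typescript
// Hypothetical sketch, not the PR's actual implementation.
// Base boosts per multilingual field, as used for the wildcard fallback.
const BASE_BOOSTS: ReadonlyArray<[string, number]> = [
  ['resourceTitleObject', 5],
  ['tag', 4],
  ['resourceAbstractObject', 3],
  ['lineageObject', 2],
  ['any', 1],
]

// Taken from the discussion: language-specific fields are boosted by 10 more.
const LANG_BOOST = 10

// A boost of 1 is the default, so the "^1" suffix is left out.
function withBoost(field: string, boost: number): string {
  return boost === 1 ? field : `${field}^${boost}`
}

// lang3 is a 3-letter language code such as 'eng' or 'fre'.
function buildSearchFields(lang3: string): string[] {
  const fields: string[] = []
  for (const [field, boost] of BASE_BOOSTS) {
    // Language-specific field first, then its wildcard fallback.
    fields.push(withBoost(`${field}.lang${lang3}`, boost + LANG_BOOST))
    fields.push(withBoost(`${field}.*`, boost))
  }
  fields.push('uuid')
  return fields
}
```

Calling `buildSearchFields('eng')` yields the same eleven entries as the list above, in the same order. Keeping the base boosts in one table means the ranking between title, tag, abstract and lineage stays the same whether a hit comes from the requested language or from the fallback.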
Thanks! I've tested it with a multilingual config and it seems to be producing good results. I made a couple of comments about readability, feel free to merge when you think this is ready!
? this.isCurrentSearchLang()
  ? `lang${this.lang3}`
  : `lang${this.metadataLang}`
: `*`
nested ternary operators are hard to read IMO
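One way to address this would be to move the decision into a small helper with early returns. The sketch below is an assumption based on the rules stated in the PR description (no configured language means the wildcard, current means the UI language, any other value is used as the search language); the function name `resolveLangSuffix` and its parameters are hypothetical and stand in for the service's own properties.

```typescript
// Hypothetical helper; the parameters stand in for the service's properties.
function resolveLangSuffix(
  metadataLang: string | undefined, // value of metadata_language, if any
  uiLang3: string // 3-letter code of the current UI language, e.g. 'eng'
): string {
  // No language configured: search across all languages.
  if (!metadataLang) return '*'
  // 'current': follow the UI language of the application.
  if (metadataLang === 'current') return `lang${uiLang3}`
  // Otherwise: use the configured language.
  return `lang${metadataLang}`
}
```

For example, `resolveLangSuffix('current', 'eng')` returns `'langeng'`, while `resolveLangSuffix(undefined, 'eng')` returns `'*'`. Each branch reads as one rule, which is easier to review than two ternaries nested in a template string.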
describe('ElasticsearchService', () => {
  let service: ElasticsearchService
  let searchFilters

  beforeEach(() => {
-   service = new ElasticsearchService('fre')
+   service = new ElasticsearchService(langServiceMock, 'fre')
Using a TestBed here would make the test more readable, I think, as it would be clear what the injected dependencies are (currently the test manipulates the service properties directly, including private ones).
Introduces a new option for the metadata_language configuration property: current.
This is mostly useful for multilingual catalogs.
If metadata_language == current, then the Elasticsearch search is queried in the current language of the application (the UI language).
The other rules remain:
If metadata_language is undefined: we use the .* wildcard.
If metadata_language is set to one language: we use that language for the search, e.g. .langeng.
Here is what the search looks like with the UI in English and the property set to current