transliterate (and query) non-latin text with the elasticsearch ICU plugin #508

Open
harrisonpim opened this issue Jun 24, 2022 · 0 comments
Labels
search relevance Tuning and improving ranking and relevance

harrisonpim commented Jun 24, 2022

Until now, we've been able to search for analysed forms of non-Latin text (e.g. stemming in Arabic), but we haven't been able to search for transliterations of that text unless a human cataloguer provided them. The ICU transform token filter from the Elasticsearch ICU analysis plugin looks like it could make that possible.

For example, from this post

curl -XGET 'localhost:9200/icu_transform_analyzer/_analyze?pretty' -H 'Content-Type: application/json' -d '{
  "analyzer": "latin",
  "text": "キャンパス"
}'
Returns kyanpasu


curl -XGET 'localhost:9200/icu_transform_analyzer/_analyze?pretty' -H 'Content-Type: application/json' -d '{
  "analyzer": "latin",
  "text": "Αλφαβητικός Κατάλογος"
}'
Returns Alphabetikos Katalogo


curl -XGET 'localhost:9200/icu_transform_analyzer/_analyze?pretty' -H 'Content-Type: application/json' -d '{
  "analyzer": "latin",
  "text": "биологическом"
}'
Returns biologiceskom
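
The examples above query an analyzer called `latin`, but its definition isn't included in this issue. As a rough sketch only, index settings along these lines could define one. The filter name `latin_transform`, the choice of `icu_tokenizer`, and the transform ID (taken from the pattern shown in the Elasticsearch `icu_transform` docs) are all assumptions, not necessarily what the linked post uses.

```python
import json

# Assumption: this transform ID follows the pattern in the Elasticsearch
# icu_transform documentation; the post quoted above may use a different one.
TRANSFORM_ID = "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"


def build_index_settings() -> dict:
    """Return index settings defining a hypothetical `latin` analyzer.

    Requires the analysis-icu plugin to be installed on the cluster.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name, chosen for illustration.
                    "latin_transform": {
                        "type": "icu_transform",
                        "id": TRANSFORM_ID,
                    }
                },
                "analyzer": {
                    "latin": {
                        "type": "custom",
                        "tokenizer": "icu_tokenizer",
                        "filter": ["latin_transform"],
                    }
                },
            }
        }
    }


def build_analyze_body(text: str, analyzer: str = "latin") -> dict:
    """Return the JSON body for a request to the index's _analyze endpoint."""
    return {"analyzer": analyzer, "text": text}


if __name__ == "__main__":
    # Body for: PUT /icu_transform_analyzer
    print(json.dumps(build_index_settings(), indent=2, ensure_ascii=False))
    # Body for: GET /icu_transform_analyzer/_analyze
    print(json.dumps(build_analyze_body("キャンパス"), ensure_ascii=False))
```

For search we'd probably also want a `lowercase` filter after the transform, and to apply the analyzer to a subfield rather than replacing the existing analysis, but both of those are open questions for this issue.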

See also

harrisonpim changed the title from "transliterate non-latin text with the elasticsearch ICU plugin" to "transliterate (and query) non-latin text with the elasticsearch ICU plugin" on Jun 24, 2022
harrisonpim added the "search relevance" label on Jun 28, 2022
kenoir removed the status in Digital platform on Nov 5, 2024
kenoir removed this from Digital platform on Nov 5, 2024