No results on searches for languages with codes including a dash #21

decodekult · 2023-06-16T08:13:16Z

The mechanism used to return correct results for searches in non-default languages compares against a document field, post-lang, which stores a comma-separated list of the languages in which each post should appear.

For posts (translations) in a given secondary language, it stores that translation language code.
For posts in the primary language, it also includes the codes of every secondary language for which the relevant post type is set to display as translated but no translation into that language exists. See fix: post types set to display as translated #14
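The two rules above can be sketched as a small function. This is only an illustration, not the plugin's actual code: the function name, its parameters, and the assumption that a primary-language post also lists its own language code first are all hypothetical.

```python
from typing import Iterable


def build_post_lang(
    post_language: str,
    primary_language: str,
    display_as_translated: bool,
    secondary_languages: Iterable[str],
    existing_translations: Iterable[str],
) -> str:
    """Return a comma-separated post-lang value (hypothetical helper)."""
    # Every post lists its own language (assumption for primary-language posts).
    langs = [post_language]

    # Primary-language posts also stand in for secondary languages that have
    # no translation, when the post type is set to display as translated.
    if post_language == primary_language and display_as_translated:
        translated = set(existing_translations)
        for lang in secondary_languages:
            if lang not in translated and lang not in langs:
                langs.append(lang)

    return ",".join(langs)
```

With this sketch, a primary-language post whose only existing translation is pt-pt would carry its own code plus zh-hans, while the zh-hans translation of another post would carry only zh-hans.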

Consider languages like zh-hans or pt-pt. Posts (translations) in those languages are not returned by searches run in their own language.

Regression of #13


decodekult · 2023-06-16T08:19:32Z

The analyzer for this post-lang field was set to a custom one, with the default tokenizer and no filters. In theory, this should have removed the filters that were splitting language codes such as zh-hans into zh and hans tokens, but that change was not enough. See 5156190

Turning this field into a keyword will not work either, since it can contain multiple comma-separated language codes and we need to tokenize each of them.

The most reliable solution is to use a char_group tokenizer that splits values on commas. See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-chargroup-tokenizer.html
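A minimal sketch of what such index settings could look like, written as a Python dict. The names post_lang_comma and post_lang_analyzer are made up for illustration and are not taken from the plugin. The two helper functions only approximate the tokenizers in plain Python to show the difference in output; they are not Elasticsearch's actual algorithms, and they assume the field value contains no spaces around the commas.

```python
import re
from typing import List

# Analysis settings using a char_group tokenizer that breaks only on commas.
# "post_lang_comma" and "post_lang_analyzer" are hypothetical names.
POST_LANG_ANALYSIS = {
    "analysis": {
        "tokenizer": {
            "post_lang_comma": {
                "type": "char_group",
                "tokenize_on_chars": [","],
            }
        },
        "analyzer": {
            "post_lang_analyzer": {
                "type": "custom",
                "tokenizer": "post_lang_comma",
            }
        },
    }
}


def split_on_commas(value: str) -> List[str]:
    """Rough stand-in for comma-only tokenization: dashes are preserved."""
    return [token for token in value.split(",") if token]


def split_on_non_word(value: str) -> List[str]:
    """Rough stand-in for word-based tokenization: dashes break tokens."""
    return [token for token in re.split(r"[^A-Za-z0-9]+", value) if token]
```

For a value such as "en,zh-hans", the comma-only split keeps zh-hans as a single token, while the word-based split yields zh and hans separately, which is why a filter on zh-hans finds nothing.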

decodekult · 2023-06-16T09:57:45Z

Apparently, I used an Elasticsearch feature introduced in a version newer than the minimum supported by ElasticPress: #20 (comment)

Reopening to adjust the tokenizer based on the Elasticsearch version, so we can use the faster tokenizer when it is available.
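The version-based choice could be sketched as follows. This is an illustration under stated assumptions, not the merged fix: the function names are hypothetical, the fallback is assumed to be a pattern tokenizer splitting on commas, and the 6.4.0 threshold for char_group availability is an assumption that should be checked against the Elasticsearch release notes.

```python
from typing import Dict, Tuple

# Assumed first Elasticsearch version shipping the char_group tokenizer.
# Verify this value against the official release notes before relying on it.
ASSUMED_CHAR_GROUP_MIN_VERSION: Tuple[int, int, int] = (6, 4, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch' into a tuple, padding missing parts with 0."""
    parts = [int(part) for part in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def lang_tokenizer(
    es_version: str,
    min_char_group: Tuple[int, int, int] = ASSUMED_CHAR_GROUP_MIN_VERSION,
) -> Dict[str, object]:
    """Pick a comma-splitting tokenizer definition for the given ES version."""
    if parse_version(es_version) >= min_char_group:
        # Preferred option when the running server supports it.
        return {"type": "char_group", "tokenize_on_chars": [","]}
    # Fallback for older servers: a regex pattern tokenizer on commas.
    return {"type": "pattern", "pattern": ","}
```

Both definitions produce one token per language code, so the rest of the mapping would not need to change with the server version.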

decodekult mentioned this issue Jun 16, 2023

Fix post filter for lang codes with a dash (zh-hans) #20

Closed

decodekult linked a pull request Jun 16, 2023 that will close this issue

fix: sync languages including a dash #22

Merged

decodekult self-assigned this Jun 16, 2023

decodekult closed this as completed in #22 Jun 16, 2023

decodekult reopened this Jun 16, 2023

decodekult linked a pull request Jun 22, 2023 that will close this issue

fix: use pattern analizer for lang fields #25

Merged

decodekult closed this as completed in #25 Jun 22, 2023
