Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transliteration doesn't respect site default language #2637

Closed
olafgrabienski opened this issue Apr 12, 2017 · 43 comments · Fixed by backdrop/backdrop#3176
Closed

Transliteration doesn't respect site default language #2637

olafgrabienski opened this issue Apr 12, 2017 · 43 comments · Fixed by backdrop/backdrop#3176

Comments

@olafgrabienski
Copy link

olafgrabienski commented Apr 12, 2017

In German, certain characters are supposed to be transliterated language-specific. Example: The letter ä should be transliterated to a for English but to ae for German. Backdrop doesn't deliver the language-specific transliterations unless you add Multilingual support to a content type and select the relevant language on the content items.

Steps to reproduce:

  • Enable Language and Locale modules.
  • Don't enable Content Translation module.
  • Add Language: German
  • Make German default.
  • Disable the Detection Method "URL" to make sure that the default language is used in any case.
  • Enable "Transliterate prior to creating URL aliases" (admin/config/urls/path/patterns/settings)
  • Enable "Reduce URL alias characters to letters and numbers".
  • Don't enable Multilingual Support for Posts.
  • Add a Post with Title: Ärger im Büro

Resulting URL alias: posts/arger-im-buro

Expected alias: posts/aerger-im-buero


Current PR by @indigoxela: backdrop/backdrop#3176 (transliterate neutral paths in current lang, like D7)


Outdated PRs:

PR by @olafgrabienski: backdrop/backdrop#1849 (outdated)
PR by @quicksketch: backdrop/backdrop#1850 (outdated, conflicts)
PR by @indigoxela: backdrop/backdrop#3175 (set node langcode, closed in favor of different approach)


Original issue description:

On a Backdrop 1.6.3 site which uses German as default language, transliteration for URL aliases and files are enabled and work generally, but they do not respect language-specific variations by default.

Certain characters are supposed to be transliterated language-specific. Example:

  • ä => a (English)
  • ä => ae (German)

At the moment, Backdrop doesn't deliver language-specific transliterations of URL aliases, unless you add Multilingual support to a content type and select the relevant language on the content items.

(Note: For localized sites which use only one language there's usually no need to add multilingual support to content types, and it's not an acceptable workaround to do so.)

At the top of core/includes/transliteration.inc I see the following comment which makes me think that Backdrop should use the site's default language for language-specific transliterations:

 * @param $source_langcode
 *   Optional ISO 639 language code that denotes the language of the input and
 *   is used to apply language-specific variations. If the source language is
 *   not known at the time of transliteration, it is recommended to set this
 *   argument to the site default language to produce consistent results.
 *   Otherwise the current display language will be used.

Unfortunately I don't have the skills to check $source_langcode in the code itself. Could someone have a look at it and give feedback how it's supposed to work and/or why it doesn't?

@olafgrabienski
Copy link
Author

Related issue, regarding taxonomy terms: #1995

@olafgrabienski
Copy link
Author

Reminder: Have a look at English as well. There's no en.php in core/includes/transliteration.

@serundeputy
Copy link
Member

serundeputy commented Apr 26, 2017

@olafgrabienski I've assigned this to you feel free to un-assign if you don't have the time or whatever.


NOTES:

@olafgrabienski and I did some prelim research on this; here is the rough draft code we have so far:

file core/modules/path/path.inc in function path_clean_string

  $site_langcode = config_get('system.core', 'language_default');
  if (!empty($site_langcode)) {
    $langcode = $site_langcode;
  }
  elseif (!empty($options['language']->langcode)) {
    $langcode = $options['language']->langcode;
  }
  elseif (!empty($options['langcode'])) {
    $langcode = $options['langcode'];
  }

This works in the case described in the OP, but we need to assure it works in multi-language scenarios as well. Like if the site has English, French and German and certain nodes should work in certain languages etc.

@olafgrabienski
Copy link
Author

@serundeputy I made a PR with your suggestion, putting the code just below $langcode = LANGUAGE_NONE; (line 161; not sure if it's exactly the correct place).

After configuring Localization for the sandbox site, the change fixes the issue with German transliteration if German is set as site default language. That's great! But as assumed, transliteration doesn't work anymore for content of other languages, when the respective language is choosen on the single node (for content types with enabled "Multilingual support"). So, for the time being the situation is contrary to the original issue description:

From the issue description:

Backdrop doesn't deliver language-specific transliterations of URL aliases, unless you add Multilingual support to a content type and select the relevant language on the content items.

Current situation:
Backdrop doesn't deliver language-specific transliterations of URL aliases, when you add Multilingual support to a content type and select the relevant language on the content items. (It does however deliver the expected result for the site's default language which wasn't the case before.)

@olafgrabienski
Copy link
Author

olafgrabienski commented Apr 28, 2017

Writing up the steps to configure the sandbox and to test the PR, just in case we have to start over:

Configure Localization

admin/modules

  • Enable Language and Locale

admin/config/regional/language

  • Add Language: German
  • Add Language: Spanish
  • Make German default

admin/config/regional/language/detection

  • Disable the Detection Method "URL" to make sure that the default site language is used in any case.

admin/config/urls/path/patterns/settings

  • Enable "Transliterate prior to creating URL aliases"
  • Enable "Reduce URL alias characters to letters and numbers"

Add Content

node/add/post

  • Title: Ärger im Büro ist schließlich immer möglich
  • Resulting URL alias:
    • posts/aerger-im-buero-ist-schliesslich-immer-moeglich (ok)

Change post type settings

admin/structure/types/manage/post/configure

  • Publishing settings: Enable Multilingual support

Edit content

node/3/edit

  • Language: Spanish
    • Resulting URL alias: node/3 (strange but let's have a look at it later)
    • Expected result: posts/arger-im-buro-ist-schliesslich-immer-moglich
  • Language: German
    • Resulting alias: posts/aerger-im-buero-ist-schliesslich-immer-moeglich (ok)
  • Language: English
    • Resulting alias: posts/aerger-im-buero-ist-schliesslich-immer-moeglich (not ok)
  • Language: Spanish (again)
    • Resulting alias: posts/aerger-im-buero-ist-schliesslich-immer-moeglich (not ok)

admin/config/regional/language

  • Make English default

node/3/edit

  • Language: none
    • Resulting alias: posts/arger-im-buro-ist-schliesslich-immer-moglich (ok)
  • Language: German
    • Resulting alias: posts/arger-im-buro-ist-schliesslich-immer-moglich (not ok)

@olafgrabienski
Copy link
Author

So, as a next step, we have to look at these conditionals:

$site_langcode = config_get('system.core', 'language_default');
  if (!empty($site_langcode)) {
    $langcode = $site_langcode;
  }
  elseif (!empty($options['language']->langcode)) {
    $langcode = $options['language']->langcode;
  }
  elseif (!empty($options['langcode'])) {
    $langcode = $options['langcode'];
  }

The goal is to let it work as following: To transliterate a URL alias, look first if there is a language setting on the node. If yes, use the node language. If not, use the site's default language.

Any idea?

@Dinornis
Copy link

Hi Olaf, please forgive my ignorance on this topic.
Why certain characters are supposed to be transliterated language-specific?

Shouldn't the german sentence: Ärger im Büro ist schließlich immer möglich be always transliterated as aerger-im-buero-ist-schliesslich-immer-moeglich in all languages? Changing the diphthongs from ae to a, ue to u etc. changes the meaning of the title.

@olafgrabienski
Copy link
Author

@Dinornis Thanks for your question. I'm not expert, but I know that in English you can transliterate German Umlauts without the e. "Müller" can become "Mueller" or "Muller", and so on. Another example: there is the ö which is a German Umlaut and a Swedish letter at the same time. While the German ö can be transliterated to oe, the Swedish letter cannot. So, it's complicated.

Apart from that, I guess my testing examples were misleading. I should have created different posts with different titles for the different examples to make it more clear. In my opinion, the question is however not how a German title is transliterated in different other languages. This can't be the question because the content / the node is not aware of the language you're typing in the title field. Can't explain it better at the moment, hoping to find time in a couple of days.

@quicksketch
Copy link
Member

I looked at the PR and reproduced the problem at hand. Thanks @olafgrabienski for the great instructions.

I found it is possible to get Transliteration working as expected on content, but it's dependent upon the Content Translation module:

  • Enable transliteration of URLs at /admin/config/urls/path/patterns/settings
  • Enable Content Translation module.
  • Enable Multilingual Support (Enabled) under "Publishing settings" at /admin/structure/types/manage/post
  • Create a Post and select "German" as the language
  • Set title to "Post In German Chär"
  • Resulting URL is /de/posts/post-german-chaer (as expected, chaer rather than char)

This works because the Node now has a langcode of de in the node table. If no language is specified, Backdrop assigns content the language und (No language or Undefined). The alias is then based off the language of the content.

Overall, this presents problems for single-language, non-English sites. They shouldn't need to turn on the Content Translation module if all their content is only in a single language.

So I think the correct fix is to make it so that if Language module is enabled and a default language is set, that language should be used when creating content.

@quicksketch
Copy link
Member

I filed a PR at backdrop/backdrop#1850 that changes the default language used by nodes, both with or without Content Translation module. The default of new content is now the site language if another language has been added, or the site language is changed from English.

I think this fixes the primary problem, but the solution may be incomplete. What about other types of entities? Do we need an upgrade path for this (if that's even possible)?

@olafgrabienski
Copy link
Author

@quicksketch Thanks for your PR! I like the approach to use the site's default language as node language. (It may even solve other issues with the language undefined which I remember only vaguely from D7.)

I've had also a quick look at the PR sandbox, configured it to use transliteration, added some languages, made one default and created some nodes to test it. Works as expected, great!

I tested also what happens to transliteration when you enable Multilingual support (Content translation module is not required to do so, btw) and change the language of a node, which dependent of the languages in question should result in a different URL alias. Looking at the browser address bar, in some cases the URL alias doesn't seem to reflect the changed language. But when you edit the node again and have a look at the vertical tab "URL settings", the correctly changed alias is visible. So, the changed URL alias is there, but it's not used in display mode. Strange!

Example:

  • Danish as site's default language
  • Created a post. Result: correct URL alias with Danish transliteration
  • Enabled Multilingual support for posts.
  • Edited the node, and changed the language to German.
  • Browser address bar:
    posts/oeresund... (not ok, still the danish transliteration)
  • Node edit form > URL settings:
    posts/oresund... (ok, not danish)

Can someone try to reproduce the issue?

Finally: I can't estimate the questions regarding other entities or an upgrade path. Reminds me though that we have also transliteration for files. I try to have look at it next week.

If we can't cover other entities with @quicksketch's PR , should we try to improve the first PR which may be less node specific?

@olafgrabienski
Copy link
Author

Just another idea as I had never issues with transliteration for URL aliases in Drupal. Should we have a look at the implementation in Drupal's Transliteration and/or Pathauto modules?

@olafgrabienski
Copy link
Author

So I think the correct fix is to make it so that if Language module is enabled and a default language is set, that language should be used when creating content. (quicksketch)

I like the approach to use the site's default language as node language. (olaf)

Hm, I still like the approach but I'm not sure if it's the correct way for a bugfix. As reported in #2637 (comment), the PR of @quicksketch works for single-language, non-English sites, but it remains unclear if it works on multilingual sites (Feedback welcome). Additionally, there are open questions, e.g. how to handle other entities.

As mentioned in #2637 (comment), 'my' PR works also for single-language, non-English sites, and it does not work on multilingual sites. Apparently even worse. I think however that it could be the right direction for a bugfix, if we just want to make sure transliteration works as it used to work in Drupal (and the Transliteration module).

Comparing the respective files in Backdrop (core/includes/transliteration.inc) and Drupal's transliteration module (transliteration/transliteration.inc), I see a lot of code which handles language related questions, and I've the impression there are some more than formal Backdrop/Drupal differences. Unfortunately, I don't understand the code differences well enough for debugging. @quicksketch and/or @serundeputy : Wanna have a look at them to find what's going wrong in Backdrop?

@olafgrabienski
Copy link
Author

we have functional tests.

Great! The updated PR is still working for me.

Regarding the different approaches (default language vs. interface language) it sounds reasonable to take the one which works fine in D7 and the Transliteration module.

@stpaultim
Copy link
Member

stpaultim commented Aug 25, 2020

I added this to the "Help wanted" blog post as requested by @olafgrabienski. Can anyone provide a quick summary of where this issue is at and which Pull Requests are still being looked at or have we settled on one. It might be helpful to update the original post to clarify this. The PR listed in the original post is not consistent with the one linked in the right column.

EDIT: I looked a little closer and the original post does seem to be up to date regarding PR assuming that backdrop/backdrop#3176 is the correct PR to be looking at.

@olafgrabienski
Copy link
Author

olafgrabienski commented Aug 25, 2020

@stpaultim You're right, backdrop/backdrop#3176 is the correct PR. I've however updated the issue description to clarify the topic of the issue and which one is the current PR.

@quicksketch
Copy link
Member

I think the approach in the latest PR is fine. I read over backdrop/backdrop#3176 and the one thing I think is missing is that we have test coverage specifically for German now, but it's not clear how that transliteration is different than the English one. Could we save the same node title in both languages and then validate the differences between them?

@indigoxela
Copy link
Member

Could we save the same node title in both languages and then validate the differences between them?

I enhanced the test as requested: two nodes, (almost) same title, different default languages - different path aliases.

@indigoxela
Copy link
Member

Friendly reminder: I updated the PR (test) as requested, so it's ready for another review.

@ghost ghost modified the milestones: 1.17.1, 1.17.2 Oct 13, 2020
@klonos
Copy link
Member

klonos commented Oct 31, 2020

Thanks @indigoxela and sorry for taking that long to re-review. This is RTBC now 👍

@klonos
Copy link
Member

klonos commented Oct 31, 2020

...just to be sure, I've also re-tested in the fresh PR sandbox, following the steps from the issue summary. Everything still WFM 👍

@herbdool
Copy link

Code and test looks good to me.

@ghost
Copy link

ghost commented Nov 6, 2020

Thanks @indigoxela, @quicksketch, @klonos, @olafgrabienski, @serundeputy & @herbdool! I've merged backdrop/backdrop#3176 into 1.x and 1.17.x.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment