How to order items in a multilingual list of languages? #20

Open
r12a opened this issue Jan 12, 2022 · 9 comments

Comments

@r12a
Contributor

r12a commented Jan 12, 2022

When a page has multiple translations available it is not clear in which order the various languages should be listed in the list of links pointing to the translated pages. That list will contain link text for each language that is in the language and script used for that language.

A recent suggestion was to start with the languages that are most used on the site, and then order the rest by the English name and sort order. This is a long-standing question, and i'm not sure there is any perfect answer, but i don't think that's it. I also think the answer depends a little on the size and visibility of the list. Anyway, here are some thoughts.

Sorting by most-often-used language is mostly basing the decision on our view of the world, rather than the user's needs. The issue at hand is rather how to help the individual user locate their language as painlessly as possible, and with the minimal amount of implied bias. I don't think it should be an exercise in classification. We should look for a way of ordering items that implies no bias, and is predictable.

Let me make suggestions separately about general ordering and about raising things to the top of the list. We'll start with the former, because the general ordering is needed anyway whether or not the raising occurs.

Let me also preface what i say with the thought that there are different types of use case. Mostly, it seems to me that we'll be dealing with a smallish number of languages which will all be visible to the user, rather than dealing with a very long list of languages in a selection control that requires scrolling. (This may actually make it less important to raise certain languages to the top, but see below.)

Ordering by Latin name is problematic, mainly because it is highly biased to one culture and smacks of either cultural imperialism, or lack of concern. But it also has practical ramifications: a person looking up their own language would have to know how it is written in English, eg. the endonym Surayt is Turoyo in English, Farsi is Persian or Dari depending on the region, Nasa Yuwe is Páez, etc. It may also mean deciding between two or more alternative names that change the order – should a user expect to find their language under Jula or Djoula, Burmese or Myanmar, Swahili or Kiswahili, or Tamazight or Berber (which, although the more common name in English, is a non-preferred name for its speakers because it means 'barbarian')? Again, it doesn't seek to help the user quickly locate their language, but is a method that is simply convenient for the content creators.

Another possibility is to sort the items using the Unicode Collation Algorithm. This produces a fixed and predictable order for any sequence of items, but in this case all items using the same script are presented together – so the user looks for the script first, and then for the language. The appropriate order of languages within a script group is a little odd for the average user, since it won't follow the tailored collation algorithms for their particular language (not least because those alphabetic rules won't address all the characters needed for all the languages). This may not be an issue for typical lists of non-Latin script, since the number of items is likely to remain small, but for Latin-script (and Cyrillic or Arabic script languages) where the number of items might be larger, then it won't correspond to the alphabetic ordering for each language (eg. ä comes after z in Swedish, ch comes after h in Slovak, mb comes before ɓ and then c in Fula, etc.)
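To illustrate the script-grouping effect described above, here is a rough Python sketch. Python's standard library does not implement the Unicode Collation Algorithm, so this uses plain code point order as a stand-in; the real algorithm differs in details such as the handling of case and accents and the relative placement of some scripts. The sample endonyms are an assumption chosen for illustration.

```python
import unicodedata

# Hypothetical sample of endonyms in several scripts.
endonyms = ["日本語", "Svenska", "Русский", "Ελληνικά", "Deutsch", "한국어", "Български"]

# Plain code point order: a stand-in for, not an implementation of, the UCA.
by_code_point = sorted(endonyms)

# Rough script label taken from the Unicode name of each first character.
scripts = [unicodedata.name(name[0]).split()[0] for name in by_code_point]

for script, name in zip(scripts, by_code_point):
    print(f"{script:10} {name}")
```

Items in the same script end up adjacent, so a reader has to find the script first and then the language within it.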

The way we order the 22 languages in the selector at https://www.w3.org/International/articlelist is to go by the English alphabetic order of the BCP 47 language subtags for each language. It's not a perfect solution, but at least it produces a predictable order, and with slightly less apparent bias, since it's based on a global standard, rather than on English. For example, Greek is sorted under e for el, and German is under d for de. It also avoids the need to worry about language-specific tailoring of collation.
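A minimal sketch of the subtag-based ordering, in Python. The list of translations is a hypothetical sample, not the actual set used on the W3C site; the link text shown to the user is the endonym, while the sort key is the language tag only.

```python
# Hypothetical sample of available translations: (language tag, endonym).
TRANSLATIONS = [
    ("sv", "Svenska"),
    ("el", "Ελληνικά"),
    ("en", "English"),
    ("de", "Deutsch"),
    ("ja", "日本語"),
    ("fr", "Français"),
]


def order_by_subtag(translations):
    """Sort by language tag, case-insensitively, ignoring the endonym."""
    return sorted(translations, key=lambda item: item[0].lower())


subtag_ordered = order_by_subtag(TRANSLATIONS)
print([name for _tag, name in subtag_ordered])
```

Because the tags are plain ASCII, no language-specific collation is involved, and the order is the same on every page.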

Now for raising certain languages to the top.

I always find it annoying when a pull-down list puts USA or US-English at the top, and i have to scroll forever to find UK or UK-English. In those situations, i can't help feeling a little as if the content developers thought i was less important than our American cousins. Sometimes, in a long list, if UK-English isn't at the top, i'll waste time scrolling down to find that it isn't there anyway, and i have to waste more time going back to the beginning.

Note that this is not so much of an issue if you're only dealing with up to 10 or 15 languages that are simultaneously visible, however if done well it could still be nice for the user. The question is how to do it well.

Any kind of ordering based on page usage rates sounds either like a non-user-centric view of the world, or implies a ranking of importance. It may also produce different orders from page to page or from time to time, which is also problematic.

I think that raising items to the top of the list needs to be done in a way that is clearly aimed at helping each user access their own language quickly, taking into account their individual point of view on the world.

So here's a suggestion. I think that a whizz bang implementation could look at the browser language preferences of the user and pull those items, in their already ranked order, to the top of the list. This would be very user-centric – adapting the list to reflect who is looking at it. Then the remaining languages would be ordered per one of the default orderings described above (i favour the language subtag approach). (Yes, sometimes, the user's language preferences won't be set in a way that reflects their actual language preferences, but actually much of the time it will, since those preferences tend to be set when the user installs a browser, and can also be changed by the user.)

To make it clear to the user what's going on, it would probably be best to visually show a clear division between the items that are raised to the top, and those that follow.
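The suggestion above could be sketched server-side as follows, assuming the browser preferences arrive as an `Accept-Language` header. The function names and the fallback from a regional tag such as `en-GB` to a plain `en` translation are assumptions made for this illustration; a production implementation would more likely use a full language-matching library.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        tag = tag.strip().lower()
        if tag and tag != "*" and quality > 0:
            # Negative quality sorts best first; position breaks ties.
            ranked.append((-quality, position, tag))
    return [tag for _q, _pos, tag in sorted(ranked)]


def raise_preferred(available, header):
    """Split available tags into (raised, rest).

    raised follows the user's preference order; rest is sorted by tag.
    """
    by_lower = {tag.lower(): tag for tag in available}
    raised = []
    for wanted in parse_accept_language(header):
        # Try the full tag first, then its primary language subtag.
        for candidate in (wanted, wanted.split("-")[0]):
            match = by_lower.get(candidate)
            if match and match not in raised:
                raised.append(match)
                break
    rest = sorted((t for t in available if t not in raised), key=str.lower)
    return raised, rest


raised, rest = raise_preferred(
    ["de", "el", "en", "fr", "ja", "sv"],
    "fr-CH, sv;q=0.8, en-GB;q=0.9, *;q=0.1",
)
print(raised, rest)
```

Returning the two groups separately leaves it to the page template to draw the visual division between the raised items and the default-ordered remainder.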

More thoughts?

@aphillips
Contributor

I think this is a good summary. A couple of additional comments.

Implicit but unstated is that endonyms are preferred to displaying names in the current language, i.e. Deutsch, not German.

The list should be in a predictable order and mostly stay in the same order (unless one alters the list, say by adding a language). This promotes familiarity. On the other hand, pulling languages such as those in Accept-Language (or defaulting selection so the list starts at the best match item in that list), as Richard suggests, is probably a good idea. Having "pull to front" items also appear in their natural location in the list is a good idea if the list is long (2 or more "pages").

Pulling the "most important" or most common item to the top is reasonably common. If the list of languages is strictly one of language, pulling e.g. English to the front of the list because that is the default language makes sense. If the languages are distinguished more granularly (such as Richard's example of US vs. UK English) this can produce a more annoying experience.

Ordering by Unicode's default ordering gives advantages to certain scripts (Latin in particular) and disadvantages others. If the list is not particularly long, perhaps featuring only one or two items in each script, the resulting list looks as if it is unordered and is both navigationally more difficult and can provoke negative reaction ("why is XXX considered last?")

It is often a good idea to provide affordances, such as tooltips, showing the names in the currently selected language and/or the language tag. Here's an example from our internal developer site:

image

Richard also brings up that long lists have different requirements or considerations than short ones. A lot of my I18N demos feature a locale chooser that can pick from the full list of CLDR supported locales. This list has hundreds of entries. I built a custom control for this (bear in mind that this is an I18N demo and not meant for real customers to use), which is ordered by language tag because that is how I usually access the list; real users don't necessarily know their language tags. I provide affordances for finding items in the list, particularly type-ahead matching/scrolling, tooltips, etc. This list can still be really hard to work with in spite of that, and a real user interface would probably use other means to help organize the list for users.

image

@duerst

duerst commented Jan 13, 2022

For the specific question (W3C translations), can we get some actual numbers, in particular the max number of languages?

For a smallish number (~20), the top right of https://www.w3.org/International/articlelist looks good. No need to pull down something or scroll, and users are good at recognizing their own language in such a list of languages, even more so if the language uses a rarer script.

For longer lists, I'd look at what Wikipedia does. Most people use Wikipedia more than the W3C site, so anything close to Wikipedia will help. Not because it's the ultimately best solution (Wikipedia has a lot of experience, so it may be close to that, too), but just because people are already used to it.

As far as I understand, Wikipedia lists languages in native script by approximate order of Latin transcription. So Ελληνικά is sorted with Ell..., 한국어 with Han..., 日本語 with Nih... and so on. This is definitely better than using language tags, which can be way off. Wikipedia also does some collapsing. The full list is only visible very briefly, if at all. I guess to see the full list in an article with many translations (e.g. https://en.wikipedia.org/wiki/Mathematics, which claims 235), one has to dig into the page source.
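A small Python sketch of ordering native-script names by a Latin transcription. The transcriptions in the table are illustrative assumptions and are not taken from Wikipedia's data; they are used only as sort keys and are never shown to the user.

```python
# Endonym -> Latin transcription used only as a sort key.
# The transcriptions are illustrative assumptions, not Wikipedia's data.
TRANSCRIPTION_KEYS = {
    "Deutsch": "Deutsch",
    "Ελληνικά": "Ellinika",
    "English": "English",
    "한국어": "Hangugeo",
    "日本語": "Nihongo",
    "Svenska": "Svenska",
}


def order_by_transcription(endonyms):
    """Sort endonyms by their Latin transcription, ignoring case."""
    return sorted(endonyms, key=lambda name: TRANSCRIPTION_KEYS[name].casefold())


transcription_ordered = order_by_transcription(
    ["日本語", "Svenska", "한국어", "English", "Ελληνικά", "Deutsch"]
)
print(transcription_ordered)
```

The approach needs a maintained transcription for every language, and the choice of transcription scheme affects where a language lands.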

After collapsing, Wikipedia then only shows 'frequent' languages, where 'frequent' is determined by both the current user and the overall usership. For W3C, I think that's overkill. For the remaining languages, there's then a popup organized by region (Asia, Africa,...).

One more point: For accessibility, a structured list (with whatever criterion that creates reasonably balanced parts) may be preferred over a very long linear list.

@xfq
Member

xfq commented Jan 13, 2022

Personally, I think it's better to put the user's preferred languages first, and put other languages at the back in some predictable order. That's what Wikipedia uses.

There are two problems to solve: 1) how to find the user's preferred languages, and 2) how to order the languages that follow.

About the first problem, one way is to select the languages according to the browser language preferences of the user, and another way is to let the user configure the preferred languages by themselves.

As for sorting the languages that follow, I think one way is to use the way the language is called in the currently displayed language. If the current page is in English, the list of languages should be sorted alphabetically. If the current page is in Japanese, the list of languages should be sorted according to the order of the Japanese syllabary. Another way is to use some fixed order (such as the BCP 47 language subtags) in all language versions.
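To make the first option concrete, here is a Python sketch that sorts the same set of languages by their names in the page's language. The name table is a hypothetical sample. The Japanese sort keys are katakana readings compared by code point, which only approximates syllabary order; a real implementation would use a proper Japanese collator.

```python
# page language -> {language tag: (display name, sort key)}.
# All entries are illustrative assumptions for this sketch.
NAMES = {
    "en": {
        "de": ("German", "german"),
        "en": ("English", "english"),
        "fr": ("French", "french"),
        "ja": ("Japanese", "japanese"),
        "nl": ("Dutch", "dutch"),
    },
    "ja": {
        "de": ("ドイツ語", "ドイツゴ"),
        "en": ("英語", "エイゴ"),
        "fr": ("フランス語", "フランスゴ"),
        "ja": ("日本語", "ニホンゴ"),
        "nl": ("オランダ語", "オランダゴ"),
    },
}


def order_for_page(page_language):
    """Return language tags ordered by their name in the page's language."""
    names = NAMES[page_language]
    return sorted(names, key=lambda tag: names[tag][1])


print(order_for_page("en"))
print(order_for_page("ja"))
```

The same language moves to a different position depending on the page language, which is the trade-off against a fixed order such as the language subtags.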

@xfq
Member

xfq commented Jan 13, 2022

Here's an example from Wikipedia:

wikipedia

@xfq
Member

xfq commented Jan 13, 2022

@xfq
Member

xfq commented Jan 13, 2022

> For the remaining languages, there's then a popup organized by region (Asia, Africa,...).

For this approach, we also need to consider how the regions should be sorted.

@andjc

andjc commented Jan 13, 2022 via email

@andjc

andjc commented Jan 13, 2022 via email

@himorin

himorin commented Jan 13, 2022

I agree with listing languages in native script. I suppose one of the use cases for a language list (not sure whether it could be called 'major') is jumping (back?) to a language the user understands, so it may be important to list each language in its own native presentation (script). If all the languages in the list are written in the script of the page content, they could be hard for users to find...

As for their order, I am not sure whether it is a good choice to use the way the language is called in the currently displayed language, since one language is called by different names in different languages, which leaves users confused about where their preferred language is listed. E.g. Dutch (Nederlands in native?) is called /orandago/ in Japanese (coming from Holland)...


6 participants