Analysing support for text layout on the Web

The primary mission of the W3C Internationalization work, and of the W3C itself, is to create a Web for All. The W3C works with and relies on other organizations and initiatives to complement this work: the Unicode Consortium creates the characters needed for language support, and algorithms for their use; font designers and developers around the world provide fonts to support the world's languages; ICANN and the IETF are leading work on Internationalized Domain Names and Universal Access to local user names.

A particular area of interest and focus for the W3C is styling the layout of content, in web pages and in digital publishing. Much of this can be addressed in CSS, but other technologies also need to take such factors into account, such as Timed Text, WebVTT, SVG, XSL, and to some extent markup models such as HTML. The W3C is particularly concerned with the mechanics of text, such as rules for line-breaking and justification, local approaches to expressing emphasis or decorating text, localising counter styles, supporting bidirectional text in markup, initial-letter styling, hyphenation, and page layout.

This area is one where it is typically difficult to find information, especially in English, about user expectations. There are few experts actively involved in ensuring that these typographic mechanisms are well supported on the Web. It is also an area where there may be some degree of fluidity, as people in many cultures are still trying to establish for themselves how their previous traditions translate into the world of Web-based content.

Recently the W3C has been making additional efforts to better understand the needs of the various writing systems and cultures around the world, and communicate those to specification and browser developers. Let's look at some of the things that are currently in progress or beginning, as well as possible future directions.

Lreq documents

A few years ago, a group of experts from the Japanese publishing industry came together with others at the W3C to produce a set of requirements for support of Japanese layout. The resulting document, Requirements for Japanese Text Layout, known as 'jlreq' and published in both English and Japanese, was a resounding success, and was even published in book form in Japan.

Figure 1: Pages from the Japanese layout requirements

Following the publication of jlreq, groups of other experts came together to work on similar documents for Hangul (klreq), Simplified & Traditional Chinese (clreq), Ethiopic (elreq), Devanagari (ilreq), Arabic (alreq), and Hebrew (hlreq). There are also embryonic developments related to Tibetan (tlreq) and Mongolian, and the Digital Publishing WG at the W3C is working on a 'Latin req' document.

It is important to note that an lreq document only describes what you are expected to see when reading text, and not what applications or authors are expected to do. By describing only how a writing system appears to users, the document avoids becoming out of date. Information that describes specific technologies (such as CSS) related to a writing system, or that describes how current browsers or e-readers handle the writing system, belongs in a separate place. Which brings us to...

Gap analysis

These efforts rely on voluntary contributions, and are all currently still in development. It doesn't help that the task of describing all the various aspects of layout for a writing system is a rather daunting prospect. Not only that, but even with jlreq we haven't really seen much more than ad hoc activity when it comes to the logical next steps of comparing the requirements to the current state of the specifications at the W3C or to what browsers support. (Although the recently established Advanced Publishing Lab in Tokyo intends to address this more directly for Japanese over the coming two years.) This extra effort needs to clearly identify gaps, and then prioritise the features that need support, based on the pain that each gap creates for the local community.

Recently, we have been considering approaching the task of documenting the requirements somewhat differently. The idea is that it would be best to start with the gap analysis, rather than the open-ended requirements document. The rationale is that this will focus work on the key areas of need, and provide a better sense of progress to move the work on. The lreq document would still exist, but it would be added to as gaps are identified, in order to communicate what is needed to bridge that gap. Gap analysis work should also be supported by small tests or screenshots – to show where features are broken, and help implementers check that fixes are producing what is needed. (See Setting up a Gap Analysis Project for more information.)

The document containing the gap analysis information is expected to prompt changes to W3C technologies or to browsers and e-readers. It therefore needs to contain quite specific information, and parts of that information will hopefully be obsoleted and updated as time goes by and issues are addressed.

Another recent development, which is still in the prototype stage, is a matrix that shows where we stand on support for layout features for a range of languages (which we hope will expand as experts contribute data). The matrix uses colour-coding to show areas where work is needed, and to what extent unsupported features affect the user's experience with the Web.

Figure 2: Partial view of the language matrix.
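
As a rough illustration of the idea behind the matrix, the sketch below models a grid of languages and feature types in which each cell records how badly a gap affects users, and maps that level to a colour. Everything here is an assumption made for the example: the impact levels, the colours, the language placeholders, and the feature names are hypothetical and are not taken from the actual matrix.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Hypothetical impact levels; the real matrix may define others."""
    OK = 0      # feature works as users expect
    MINOR = 1   # gap is noticeable but content remains usable
    MAJOR = 2   # gap seriously affects the reading experience


# Hypothetical colour scheme for the cells.
COLOURS = {Impact.OK: "green", Impact.MINOR: "yellow", Impact.MAJOR: "red"}

# Placeholder data: language -> feature type -> assessed impact.
matrix = {
    "lang-a": {"line-breaking": Impact.OK, "justification": Impact.MAJOR},
    "lang-b": {"line-breaking": Impact.MINOR},
}


def cell_colour(language, feature):
    """Return the colour for one cell, or grey if it has not been assessed."""
    impact = matrix.get(language, {}).get(feature)
    return COLOURS.get(impact, "grey")


def needs_attention(language):
    """List the features with a gap for a language, worst impact first."""
    gaps = {f: i for f, i in matrix.get(language, {}).items() if i > Impact.OK}
    return sorted(gaps, key=gaps.get, reverse=True)


print(cell_colour("lang-a", "justification"))
print(needs_attention("lang-a"))
```

Keeping unassessed cells distinct from cells marked as working matters in a structure like this, since missing expert data is not the same thing as good support.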

For a number of languages we have piloted a simple format for describing what the gaps are, and their impact on users. This is still very much an early attempt, and the data provided is not validated or even debated by local experts at this point. You can find these gap analysis reports by following the link on the language name, or by clicking on a cell that shows a feature needing attention.

Figure 3: Part of a gap-analysis report for Amharic & Tigriña languages (Ethiopic script).

Text layout index

Across the top of the matrix is a set of feature types associated with text layout. This classification of features has been harmonised with other resources on the W3C International site. The feature names on the matrix link to sections in the International text layout and typography index. This document provides links to requirements information, related specification fragments, and tests related to each feature. It is intended to help developers of specifications and implementers of browsers or e-readers to find information. The CSS specification will point to this document for information that is too detailed for the spec.

Figure 4: A small section in the International text layout and typography index.

Issue tracking

In addition to the resources just mentioned, the layout index has links to GitHub issues in W3C repositories. There are separate links for (a) requests for information, (b) issues raised against a spec, and (c) browser bugs. For example, if you wanted to find out what information developers are currently seeking with regard to text decoration (underlines, overlines, etc.), you simply click on the link in that section. This takes you to a list of relevant GitHub issues in the I18n Working Group's tracker. That list can be refined by language, if needed.

Figure 5: Some of the requests for information about how scripts work that can be found in the issue tracker.

Those issues can also be found from the Text Layout Issue Tracker, which allows you to filter the results by type and by language. For example, you can see what information is currently needed for Chinese by filtering on that language. The information is grouped according to the same feature types.
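
Filtering of this kind can be pictured as a GitHub issue search that combines labels. The following is a minimal sketch of how such a search URL might be built; the repository name and the label names are hypothetical placeholders, and the labels actually used in the W3C trackers may differ.

```python
from urllib.parse import urlencode


def issue_search_url(repo, labels, state="open"):
    """Build a GitHub issue-search URL filtered by state and labels."""
    terms = ["is:issue", f"is:{state}"]
    # Quote each label so that labels containing spaces still match.
    terms += [f'label:"{label}"' for label in labels]
    return f"https://github.com/{repo}/issues?" + urlencode({"q": " ".join(terms)})


# Hypothetical repository and labels, for illustration only:
# one label for the feature type and one for the language.
url = issue_search_url("example-org/example-tracker", ["text-decoration", "zh"])
print(url)
```

Adding or removing a label in the list narrows or widens the results, which is the same effect as refining a list of issues by feature type and then by language.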

Type samples

Another recently added resource, the Type Samples Repo, provides a place to store pictures or scans of text layout features in the wild. Again, you can filter the results by the standard feature types, by language, and by medium.

Figure 6: An example of results from the type samples repository search.

International issue repository

As an experiment, we set up a repository on GitHub where people can log problems they are encountering when deploying or using the Web globally. This helps the W3C be more aware of the issues people are experiencing. For example, someone could tell us that vertical text isn't working quite as expected, or that webfonts take up too much bandwidth for mobile users in developing countries, or that browsers need to recognise native calendars or time and date formats for a particular community.

Help us do more

The W3C has recently been looking for partners to help develop the work described above. We need experts to come forward from around the world to contribute their knowledge so that we can better understand, quantify and address the remaining work to support a Web for All. We are also looking for organizations that can contribute funding and resources to support an expansion of the internationalisation work.
