Analysing support for text layout on the Web

The primary mission of the W3C Internationalization work, and of the W3C itself, is to create a Web for All. The W3C works with and relies on other organizations and initiatives to complement this work: the Unicode Consortium creates the characters needed for language support, and algorithms for their use; font designers and developers around the world provide fonts to support the world's languages; ICANN and the IETF are leading work on International Domain Names and Universal Access to local user names.

A particular area of interest and focus for the W3C is styling the layout of content, in web pages and in digital publishing. Much of this can be addressed in CSS, but other technologies also need to take such factors into account, such as Timed Text, WebVTT, SVG, XSL, and to some extent markup languages such as HTML. The work is particularly concerned with the mechanics of text, such as rules for line-breaking and justification, local approaches to expressing emphasis or decorating text, localising counter styles, supporting bidirectional text in markup, initial-letter styling, hyphenation, and page layout.

This area is one where it is typically difficult to find information, especially in English, about user expectations. There are few experts actively involved in ensuring that these typographic mechanisms are well supported on the Web. It is also an area where there may be some degree of fluidity, as people in many cultures are still trying to establish for themselves how their previous traditions translate into the world of Web-based content.

Recently the W3C has been making additional efforts to better understand the needs of the various writing systems and cultures around the world, and communicate those to specification and browser developers. Let's look at some of the things that are currently in progress or beginning, as well as possible future directions.

The Language Matrix

The language matrix is a recent innovation. We plan to use it as a heat map to show how well languages are supported on the Web. We started by tracking around 80 languages, but are open to adding others if experts are available to provide the necessary information about them.

The columns of the matrix represent various typographic features that need to be supported by Web technologies in order for people to use the Web. The colours of the matrix show whether, for a given language, those features are well supported, need additional work for advanced publishing, need additional work for basic web use, or are problematic enough to make it difficult to use the Web in that language.

At the time of writing, 33 languages need work for advanced publishing, 27 need work for basic features, and 1 doesn't work well on the Web. However, 41% of the cells in the matrix carry question marks, indicating that we need to do some research in order to know the status for that feature.
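To make the tallying concrete, here is a minimal Python sketch of how such a matrix could be summarised. It is an illustration only, not the W3C's implementation: the `Status` names, the sample languages and features, and the rule that a language's overall status is the worst of its known cells are all assumptions made for this example.

```python
from collections import Counter
from enum import IntEnum


class Status(IntEnum):
    """Support levels, ordered from best to worst (assumed ordering)."""
    OK = 0        # feature is well supported
    ADVANCED = 1  # needs work for advanced publishing
    BASIC = 2     # needs work for basic web use
    BROKEN = 3    # problematic enough to make the Web hard to use


# Hypothetical sample data; None stands for a question-mark cell.
matrix = {
    "lang-a": {"line-breaking": Status.OK, "justification": Status.ADVANCED, "hyphenation": None},
    "lang-b": {"line-breaking": Status.BASIC, "justification": None, "hyphenation": None},
    "lang-c": {"line-breaking": Status.OK, "justification": Status.OK, "hyphenation": Status.OK},
}


def overall_status(row):
    """Return the worst known status in a row, or None if every cell is unknown."""
    known = [s for s in row.values() if s is not None]
    return max(known) if known else None


def summarise(matrix):
    """Count languages per overall status and compute the share of unknown cells."""
    counts = Counter(overall_status(row) for row in matrix.values())
    cells = [s for row in matrix.values() for s in row.values()]
    unknown_share = sum(s is None for s in cells) / len(cells)
    return counts, unknown_share


counts, unknown_share = summarise(matrix)
```

With this sample data, one language falls into each of the OK, ADVANCED, and BASIC groups, and a third of the cells are still unknown. Taking the worst known cell as the row summary is only one possible rule; a real report might weight features differently.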

The matrix should give us an overall idea of how well the Web supports local users around the world, and help identify and prioritise areas where work is needed.

Figure 2: Partial view of the language matrix.

Lreq documents

A few years ago, a group of experts from the Japanese publishing industry came together with others at the W3C to produce a set of requirements for support of Japanese layout. The resulting document, Requirements for Japanese Text Layout, known as 'jlreq' and published in both English and Japanese, was a resounding success, and was even published in book form in Japan.

Figure 1: Pages from the Japanese layout requirements

Following the publication of jlreq, groups of other experts came together to work on similar documents for Hangul (klreq), Simplified & Traditional Chinese (clreq), Ethiopic (elreq), Devanagari (ilreq), Arabic (alreq), and Mongolian (hlreq). There are also embryonic developments related to Tibetan (tlreq) and Hebrew, and the Digital Publishing WG at the W3C is working on a 'Latin req' document.

It is important to note that an lreq document only describes what you are expected to see when reading text, and not what applications or authors are expected to do. By describing only how a writing system appears to users, the document avoids becoming out of date. Information that describes specific technologies (such as CSS) related to a writing system, or that describes how current browsers or e-readers handle the writing system, belongs in a separate place. Which brings us to...

Gap analysis

These efforts rely on voluntary contributions, and are all currently still in development. It doesn't help that describing all the various aspects of layout for a writing system is a rather daunting prospect. Not only that, but even with jlreq we haven't really seen much more than ad hoc activity when it comes to the logical next steps of comparing the requirements to the current state of the specifications at the W3C or to what browsers support. (Although the recently established Advanced Publishing Lab in Tokyo intends to address this more directly for Japanese over the coming two years.) This extra effort needs to clearly identify gaps, and then prioritise features that need support, based on the pain that the lack of support creates for the local community.

Recently, we have been considering approaching the task of documenting the requirements somewhat differently. The idea is that it would be best to start with the gap analysis, rather than the open-ended requirements document. The rationale is that this will focus work on the key areas of need, and provide a better sense of progress to move the work on. The lreq document would still exist, but it would be added to as gaps are identified, in order to communicate what is needed to bridge that gap. Gap analysis work should also be supported by small tests or screenshots – to show where features are broken, and help implementers check that fixes are producing what is needed. (See Setting up a Gap Analysis Project for more information.)

The document containing the gap analysis information is expected to prompt changes to W3C technologies or to browsers and e-readers. It therefore needs to contain quite specific information, and parts of that information will hopefully be obsoleted and updated as time goes by and issues are addressed.

Another recent development, which is still in the prototype stage, is a matrix that shows where we stand on support for layout features for a range of languages (which we hope will expand as experts contribute data). The matrix uses colour-coding to show areas where work is needed, and to what extent unsupported features affect the user's experience with the Web.

Figure 2: Partial view of the language matrix.

For a number of languages we have piloted a simple format for describing what the gaps are, and their impact on users. This is still very much an early attempt, and the data provided is not validated or even debated by local experts at this point. You can find these gap analysis reports by following the link on the language name, or by clicking on a cell that shows a feature needing attention.

Figure 3: An example of what part of a gap-analysis report for Japanese might look like.

Text layout index

Across the top of the matrix is a set of feature types associated with text layout. This classification of features has been harmonised with other resources on the W3C International site. The feature names on the matrix link to sections in the International text layout and typography index. This document provides links to requirements information, related specification fragments, and tests related to that feature. It is intended to help specification developers and browser or e-reader implementers find information. The CSS specification will point to this document for information that is too detailed for the spec.

Figure 4: A small section in the International text layout and typography index.

Issue tracking

In addition to the resources just mentioned, the layout index has links to GitHub issues in W3C repositories. There are separate links for (a) requests for information, (b) issues raised against a spec, and (c) browser bugs. For example, if you wanted to find out what information developers are currently seeking with regard to text decoration (underlines, overlines, etc.) you simply click on the link in that section. This takes you to a list of relevant GitHub issues in the I18n Working Group's tracker. That list can be refined by language, if needed.

Figure 5: Some of the requests for information about how scripts work that can be found in the issue tracker.

Those issues can also be found from the Text Layout Issue Tracker, which also allows you to filter the results by type and by language. For example, to see what information is currently needed for Chinese, click on this link. The information is grouped according to the same feature types.
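As a rough illustration of this kind of filtering, the Python sketch below builds a GitHub issue-search URL that narrows results by labels. The `is:` and `label:` qualifiers are standard GitHub search syntax, but the repository name and the label names used here are hypothetical placeholders, not the labels actually used by the tracker.

```python
from urllib.parse import urlencode


def issue_search_url(repo, issue_type=None, language=None, state="open"):
    """Build a GitHub issue-search URL narrowed by optional type and language labels."""
    terms = ["is:issue", f"is:{state}"]
    for label in (issue_type, language):
        if label:
            # Quote the label so that names containing spaces still work.
            terms.append(f'label:"{label}"')
    return f"https://github.com/{repo}/issues?" + urlencode({"q": " ".join(terms)})


# Hypothetical repository and label names, for illustration only.
url = issue_search_url(
    "example-org/example-tracker",
    issue_type="type-info-request",
    language="zh",
)
```

Leaving out `language` would list every open request for information regardless of language, which mirrors refining the list only when needed.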

Various mailing lists belonging to the W3C Internationalization Interest Group receive notifications about GitHub activity. For example, whenever a GitHub issue is created, added to, or closed in the Southeast Asian Layout repository, the public-i18n-sealreq mailing list is notified via daily digests. The digest notifications not only capture activity in that repository, but also report changes to issues in the CSS, HTML, and other repositories that have a sealreq label.

Figure 6: An example of how a daily notification digest might look. (The issue titles are links.)

Type samples

Another recently added resource, the Type Samples Repo, provides a place to store pictures or scans of text layout features in the wild. Again, you can filter the results by the standard feature types, by language, and by medium.

Figure 7: An example of results from the type samples repository search.

International issue repository

As an experiment, we set up a repository on GitHub where people can log problems they are encountering when deploying or using the Web globally. This helps the W3C be more aware of the issues people are experiencing. For example, someone could tell us that vertical text isn't working quite as expected, or that webfonts take up too much bandwidth for mobile users in developing countries, or that browsers need to recognise native calendars or time and date formats for a particular community.

Layout requirements groups

Work on defining gap-analysis and requirements is carried out in a number of 'lreq' groups. See the current list of groups. A typical group would have one or more chairs and meet regularly for teleconferences. Key contributors to the group's deliverables form a core. They include document editors, but also those who regularly contribute time for review and discussion.

Beyond that group lie any number of people who follow the work by subscribing to the notification emails and occasionally contributing comments to issues. The W3C has made participation in these groups as easy as possible in an attempt to build networks of experts. These people can be consulted when questions arise, or can comment on and track developments related to a particular script or group of scripts.

Help us do more

The W3C has recently been looking for partners to help develop the work described above. We need experts to come forward from around the world to contribute their knowledge so that we can better understand, quantify and address the remaining work to support a Web for All. We are also looking for organizations that can contribute funding and resources to support an expansion of the internationalisation work.
