Unicode style guide

r12a edited this page Apr 7, 2022 · 13 revisions

This constitutes a set of initial proposals for how to mark up The Unicode Standard in HTML. The content of the wiki is a work in progress. It will eventually be published as an HTML document, but we're using a wiki to get started.

The aim is to create HTML markup that is as simple and semantic as possible. All presentational information will be assigned by the style sheet as far as possible.

This style guide will initially focus on providing target markup for data extracted into HTML from FrameMaker. There will be a companion wiki page that provides a script for converting the extracted markup to a form that conforms with the style guide. This guide won't initially address boilerplate text such as headers and footers, and the finalisation of figures and tables will be a separate project. The style guide will also be complemented by a style sheet and some small JavaScript functions (for things like generating a TOC).

Structural markup

Chapter start

Each page of the main text begins with the following markup (actual text to be replaced as appropriate).

<div class="chapterNumber" id="id_here">Chapter 13</div>

<h1 class="title" id="id_here">South and Central Asia-II</h1>

<div class="chapterSubtitle" id="id_here">Other Modern Scripts</div>

<p id="id_here">This chapter describes the following other modern scripts in South and Central Asia:</p>

<ul id="chapterTOC"></ul>

The #chapterTOC list is automatically filled by a JavaScript function that harvests the h2 headings and lists links to them in three columns. The number of columns is set in the CSS.
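The harvesting step could look roughly like the following. This is a minimal sketch, not the project's actual script: the function names `buildTocItems`, `escapeHtml` and `fillChapterToc` are hypothetical, and it assumes that each h2 sits directly inside a `<section>` element that carries the id to link to.

```javascript
// Hypothetical helper: escape text before placing it inside markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: turn {id, text} heading records into <li> markup.
// Kept free of DOM access so the logic can be checked on its own.
function buildTocItems(headings) {
  return headings
    .filter((h) => h.id) // skip headings whose section has no id
    .map((h) => `<li><a href="#${h.id}">${escapeHtml(h.text)}</a></li>`)
    .join("\n");
}

// Browser glue. Assumed structure: <section id="..."><h2>...</h2> ... </section>
function fillChapterToc(doc) {
  const toc = doc.getElementById("chapterTOC");
  if (!toc) return; // page has no chapter TOC placeholder
  const headings = Array.from(doc.querySelectorAll("section > h2")).map((h2) => ({
    id: h2.parentElement.id,
    text: h2.textContent.trim(),
  }));
  toc.innerHTML = buildTocItems(headings);
}

// Only run automatically when a DOM is available.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => fillChapterToc(document));
}
```

The column layout is left out of the script on purpose: it would come from the style sheet, for example via the CSS `column-count` property on `#chapterTOC`.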

Sections & headings

The main body is divided into sections, each with a heading. Each <section> tag has an id. An id is not needed for the heading. The id can be anything unique, but it is helpful to choose one that briefly indicates the content of the section, since the id will be used elsewhere in the text to point back to the section: informative ids make it easier to check that the links are correct. Ids should use underscores or camelCase rather than hyphens to separate words: this makes it easier to pick up the id when copying it to a link elsewhere.

The heading tag reflects the level of the section (i.e. use h2, h3, etc., which show an explicit level). This helps for autogenerating the TOC, for checking structure, and for display in some editing environments.

All headings should be surrounded by a <section> tag, and all <section> tags should have a heading. This avoids warnings in the HTML5 validator.

Do not supply numbers for section headings. These will be provided automatically by scripting, and will therefore always be in sync with changes to the page.
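As an illustration of how scripted numbering might work, the sketch below assigns numbers of the form chapter.section (for example 13.1, 13.1.2) by walking nested sections in document order. It is not the project's script: `numberSections`, `toTree` and `applySectionNumbers` are hypothetical names, and the browser part assumes that top-level sections are direct children of `<body>` and that the chapter number is supplied by the caller.

```javascript
// Hypothetical helper: assign hierarchical numbers to a nested list of
// sections. Each node is an object with an optional `children` array.
function numberSections(prefix, sections, out = []) {
  sections.forEach((section, index) => {
    const number = `${prefix}.${index + 1}`;
    out.push({ number, section });
    numberSections(number, section.children || [], out);
  });
  return out;
}

// Browser glue: build the nested list from <section> elements.
// Assumes each section's heading is a direct child of that section.
function toTree(parent) {
  return Array.from(parent.children)
    .filter((el) => el.tagName === "SECTION")
    .map((el) => ({
      heading: el.querySelector(
        ":scope > h2, :scope > h3, :scope > h4, :scope > h5, :scope > h6"
      ),
      children: toTree(el),
    }));
}

// Prefix each heading with its computed number.
function applySectionNumbers(doc, chapterNumber) {
  const numbered = numberSections(String(chapterNumber), toTree(doc.body));
  for (const { number, section } of numbered) {
    if (section.heading) section.heading.prepend(`${number} `);
  }
}
```

Because the numbers are derived from the nesting at load time, adding, removing or reordering a section needs no manual renumbering.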

Example of level 2 and level 3 headings:

<section id="my_section_id">
<h2>My_header_goes_here</h2>

<section id="id_here">
<h3 id="id_here">Myanmar: U+1000–U+109F</h3>
...
</section>

<section id="id_here">
<h3 id="id_here">Myanmar Extended-A: U+AA60–U+AA7F</h3>
...
</section>

<section id="id_here">
<h3 id="id_here">Khamti Shan</h3>
...
</section>
</section>

Suggestion: It's a little easier to find your way around the source text if you leave 5-6 blank lines before the start of each section.

Figures

Figures are wrapped in <figure> tags and each tag has an id which begins with "fig_". Ids should use underscores or camelCase after the initial part – this is to make it easy to pick up an id when you want to copy it to a place that points back to the figure.

Figures usually have a caption, contained in <figcaption> tags. The caption appears before the body of the figure (although the displayed location can be changed by the CSS). No id is necessary. No figure numbering appears in the caption – numbering will be applied (and updated) automatically by the CSS.

Example:

<figure id="fig_my_figure">
<figcaption>Caption_goes_here</figcaption>
Body_of_the_figure_here
</figure>

Inline markup

Pointing to sections

The following is an example of markup that points to a section elsewhere in the document. The page name is not needed if the section is on the same page.

... this custom is not very prevalent today. (See <a class="secref">page_name.html#h_arabic</a>.)

JavaScript scripting automatically finds all these items and fills them with link text, such as the following:

... this custom is not very prevalent today. (See Section 9.2, Arabic.)
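A rough sketch of that substitution follows. It assumes a lookup table that maps each target to a section number and title; how such a table would be produced is not covered here, and the names `sectionIndex`, `resolveSecref` and `fillSecrefs` are hypothetical.

```javascript
// Hypothetical lookup table, keyed by "page#id". In practice it would have
// to be generated from the chapters themselves.
const sectionIndex = {
  "page_name.html#h_arabic": { number: "9.2", title: "Arabic" },
};

// Resolve the placeholder text of a secref link to an href and link text.
// Same-page targets ("#id") are looked up under the current page name.
function resolveSecref(target, index, currentPage) {
  const key = target.startsWith("#") ? currentPage + target : target;
  const entry = index[key];
  if (!entry) return null;
  return { href: target, text: `Section ${entry.number}, ${entry.title}` };
}

// Browser glue: fill every <a class="secref"> that can be resolved.
function fillSecrefs(doc, index, currentPage) {
  for (const a of doc.querySelectorAll("a.secref")) {
    const resolved = resolveSecref(a.textContent.trim(), index, currentPage);
    if (!resolved) continue; // leave the raw target visible so it is easy to spot
    a.href = resolved.href;
    a.textContent = resolved.text;
  }
}
```

Leaving unresolved links untouched is a deliberate choice in this sketch: a raw `page_name.html#...` string in the rendered page flags a broken or mistyped id.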

Pointing to figures

The following is an example of markup that points to a figure elsewhere in the document. The page name is not needed if the figure is on the same page.

<a class="figref">page_name.html#fig_arabic</a> shows the characters in the order in which ...

JavaScript scripting automatically finds all these items and fills them with link text, such as the following:

Figure 13-1 shows the characters in the order in which ...
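For figures, the link text can be derived from the order of the figures on the page. The sketch below is an illustration only: it handles same-page references (targets of the form `#fig_...`), assumes the chapter number is passed in by the caller, and uses the hypothetical names `figureLabels` and `fillFigrefs`. Cross-page references would need a lookup table and are skipped.

```javascript
// Hypothetical helper: map figure ids, in document order, to labels such
// as "Figure 13-1", "Figure 13-2", ...
function figureLabels(chapterNumber, figureIds) {
  const labels = new Map();
  figureIds.forEach((id, i) => {
    labels.set(id, `Figure ${chapterNumber}-${i + 1}`);
  });
  return labels;
}

// Browser glue: fill every same-page <a class="figref">.
function fillFigrefs(doc, chapterNumber) {
  const ids = Array.from(doc.querySelectorAll('figure[id^="fig_"]')).map((f) => f.id);
  const labels = figureLabels(chapterNumber, ids);
  for (const a of doc.querySelectorAll("a.figref")) {
    const target = a.textContent.trim();
    if (!target.startsWith("#")) continue; // cross-page reference: not handled here
    const label = labels.get(target.slice(1));
    if (!label) continue; // unknown id: leave the raw target visible
    a.href = target;
    a.textContent = label;
  }
}
```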

Abbreviations & acronyms

Use the abbr element for both of these. Spell out the full form in the title attribute.

Marking changes

To show changes to the text use ins and del. However, remember that if you make commits carefully, such markup is not needed. People will be able to see the changes by looking at the diff for the commit on the GitHub site.

If you are making a large number of formatting changes to a page (such as removing end-of-line spaces) you should commit those changes separately from any substantive changes. Otherwise they will obscure the important changes in the diff.

Other inline markup

No presentational elements (such as <font> tags or tables used for layout) should be used. No presentational attributes should be used either (e.g. border, align, etc.). Keep the markup as simple and semantic as possible, and do all the presentational work in the CSS style sheet as far as possible. If styling is needed that is absolutely idiosyncratic to a particular figure, table, etc., then use a style attribute on the relevant tag.

The following markup is recommended for English pages.

Emphasis (general)

<em> Example: In keyboard input it is not always the case that...

Emphasis (stronger)

<strong> Example: You must absolutely not do that!

New terms

<dfn id="def_[termName]" title="[termName]"> Example: Such sets of characters are also called repertoires.

Document titles

<cite> Example: ...see Requirements for String Identity Matching.

Key word for technical use

<code class="kw" translate="no"> Example: The IANA charset value

(There are more like this at https://www.w3.org/International/i18n-activity/guidelines/editing#inline, including some suggestions for CSS styling.)