Dutch-specific replacement of IJ #437
Comments
Thanks for the report. This looks like something that shouldn't be too hard to fix, but no promises when I might get around to it. A couple questions for when I do (or somebody else does):
|
Note that any borrowed word that has the sequence would also get the ligature: Fiji, Tijuana, etc. It's not clear how to best handle these. Regarding the accents, emphasis on a word where the stressed syllable is written with ij would be written with íj́ or íj for lack of j́. |
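To make the remark about the missing j́ concrete, here is a minimal Python sketch (the helper name `accent_ij` is made up for illustration). It builds an accented ij from plain letters plus U+0301 COMBINING ACUTE ACCENT and normalizes to NFC. Unicode has a precomposed í (U+00ED) but no precomposed j with acute, so the accent on j always stays a separate combining character.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def accent_ij(both=True):
    """Return lowercase ij with an acute on i, and optionally on j, in NFC."""
    seq = "i" + ACUTE + "j" + (ACUTE if both else "")
    return unicodedata.normalize("NFC", seq)


for s in (accent_ij(True), accent_ij(False)):
    # Show the code points that survive normalization.
    print([f"U+{ord(c):04X}" for c in s])
# ['U+00ED', 'U+006A', 'U+0301']
# ['U+00ED', 'U+006A']
```

A font therefore needs working mark positioning on j for the first form to render well; the second form only needs the ordinary í glyph.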
In addition to above remarks, I’d probably not substitute i j by U+0133 but rather create an additional i_j glyph for that purpose. |
I don't speak Dutch, so take my answer with a grain of salt. There is no one-size-fits-all solution, but IJ is common in Dutch, whereas words in which I is followed by an unrelated J are truly rare. Acute accents on IJ could be considered common and should ideally be supported. Any others can be ignored, I guess. For ij with an acute accent, there are various probable encodings:
Here, 1 and 2 should both be supported. They're the Correct Way. 3 can probably be dismissed, but I wanted to mention it. 4 and especially 5 are typical replacements that can be entered by a keyboard with dead keys. 6 uses the IJ Unicode glyph that, while being discouraged in use, probably also should work with an accent. Capital IJ with acute accent is far rarer than lower case one. In German, my native language, I need to use the LaTeX commands Fixing U+0132/-3 shouldn't be too hard if everything else can be done. |
@laszlonemeth has already implemented the IJ ligature in his fonts. |
One simple way to eliminate the ligature in such words is to insert a ZWNJ between i and j. OpenType also has the 'NLD ' and 'FLM ' language tags for the Dutch/Flemish locales, so even if such features are enabled by default, they shouldn’t have much effect on other languages. |
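The ZWNJ workaround could be automated in a preprocessing step. The following is only a sketch: the function name `protect_ij` and the exception list are hypothetical, and the list contains just the words mentioned in this thread. It inserts U+200C ZERO WIDTH NON-JOINER between i and j in listed words and leaves every other word alone.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical exception list (lowercase): words whose i and j do not form the digraph.
EXCEPTIONS = frozenset({"bijectie", "fiji", "tijuana"})


def protect_ij(text, exceptions=EXCEPTIONS):
    """Insert ZWNJ between i and j in words found in the exception list."""

    def fix(match):
        word = match.group(0)
        if word.lower() not in exceptions:
            return word  # a genuine ij digraph, or no ij at all
        return re.sub(r"([iI])([jJ])", lambda m: m.group(1) + ZWNJ + m.group(2), word)

    # [^\W\d_]+ matches runs of letters, i.e. one word at a time.
    return re.sub(r"[^\W\d_]+", fix, text)


print(protect_ij("bijectie en ijs in Fiji").encode("unicode_escape").decode())
```

Inflected forms are not covered by such a plain list and would need their own entries.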
Could you please develop this? The U+0133 glyph is already present in all Libertinus fonts, and it has a convenient name ij/IJ. As a layman, it seems to me that making another i_j glyph outside of the Unicode range (and most likely just referencing U+0133 there) is redundant. It would seem that this digraph is often treated as a single letter in the Netherlands, sometimes taking the form of a |
There can be a vowel letter like e before the i, as in Heijn, but can there also be one after the j? – except in non-applicable words like the mentioned bijectie, of course. |
@khaledhosny is explaining it here a bit: Substituting encoded glyphs for other encoded glyphs is a potential source for unexpected and buggy behaviour. It might not be as obvious as in other cases but this is a well known principle to avoid bugs which is quite simple to follow so I'd not risk anything here. |
FontForge has a Copy Reference feature: after you press Ctrl+V to paste, you get a direct clone of the glyph. It’s a good way to avoid substituting encoded glyphs for other encoded glyphs, and it makes maintenance easier. |
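As a rough illustration of the approach discussed above (unencoded ligature glyphs, substituted only under a specific language system), the Python sketch below generates OpenType feature-file text. Everything in it is an assumption rather than what Libertinus actually ships: the helper name `ij_feature`, the choice of the `locl` feature, and the glyph name `I_J` (only `i_j` is mentioned in this thread).

```python
def ij_feature(languages=("NLD",), feature="locl"):
    """Build feature-file text that swaps i+j for unencoded ligature glyphs.

    The glyph names I_J and i_j are assumed to exist in the font as
    references to the outlines of U+0132 and U+0133.
    """
    lines = [f"feature {feature} {{", "    script latn;"]
    for tag in languages:
        # The rules after each language statement apply to that language system only.
        lines.append(f"    language {tag};")
        lines.append("    sub I J by I_J;")
        lines.append("    sub i j by i_j;")
    lines.append(f"}} {feature};")
    return "\n".join(lines)


print(ij_feature())
```

Passing `languages=("NLD", "FLM")` would repeat the rules for the Flemish tag, should that turn out to be wanted.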
Should there also be an opt-in stylistic character variant ( |
I see; Unicode glyphs should be reserved for the intended user input, and a font should not be swapping between them by itself. Any substitutions should be unencoded glyphs. It is much more obvious for the cases in #455, that the font features should not be treated as an autocorrect. Apparently this is a generally recognized good practice, so I will make the appropriate changes right away.
According to the wiki, ÿ is actually a separate letter that is different from ij, even though they look the same in handwriting. Afrikaans uses y, but that is a separate language. In any case, as discussed above, it should be up to the user what they write. |
I meant that |
The problem is not substituting Unicode, but the general idea that font features are intended for stylistic options, not substituting typable characters with other typable characters. Simply put, if the user wants to use y, they will type y. Features like "lowercase to uppercase" or "turn ö into oe" should be left to word processors. In theory, a valid alternate style would be the "broken U" glyph, but I assume that would not fit in this typeface, even in Libertinus Sans. |
I don’t fully agree here. When you replace IJ by the encoded IJ digraph you are substituting with a (theoretically) typable character. The difference is much more between typographic variation on the glyph level and orthographic choice. The former is well targeted by font features; doing the latter in font features is doubtful.
These two could fall into the category of typographic variation. I know that Dutch y and ij are pronounced the same, but I can’t say anything about their orthography, so that question should be answered by somebody else. The case of Ö vs. Oe in German, however, is no question of orthography. They are explicitly considered equivalent. So, I would accept a stylistic font feature replacing Odieresis with a glyph Odieresis.alt which looks like 'Oe' or like 'Œ' (as a ligature of Oe) or like 'O ͤ' (this should be an uppercase O with a combining small e above) or some variation on this. |
OK, IJ is theoretically typable, but I see the matter of ij/y as an orthographical choice, as you put it. I draw my information from the Internet, so I'm not completely sure, but I understand ij to be the correct spelling, even though historically it might have evolved from y. Substituting ij with y is a good English transcription, because the pronunciation is clearer. However, there is no y in the Dutch alphabet. A quick Google reveals:
I stand corrected. |
I don't think so, especially for names. However, placing a small e on the letter qualifies as an (archaic) stylistic choice for Ä, Ö, and Ü. There is no clear-cut real-world example that comes to my mind where using ö and oe makes a difference, but that's probably due to the fact that it's not particularly easy to search for something like that. One could make up words like Koerbe from the English word co-heir (syn. parcener) which, in my opinion, isn't too far fetched. Körbe is the plural of Korb meaning basket. |
In #460, Flemish is not included, which I think is a mistake. What matters isn't what school teachers say, but how the digraph is perceived. I have a custom stylesheet for Wikipedia and, well, have a look: Still, thank you for the NLD fix. Hope to see it soon in the release. |
Is this just a matter of adding FLM anywhere we have special handling for NLD? Or are there other pitfalls to be aware of? |
West Flemish does not use ij as standard Dutch does; it uses y instead, so having the same lookups for ij with the language system FLM may not make sense. Note that the OT 1.5 language system tags did not specify that "Flemish" FLM corresponds to vls; that started in OT 1.6, and it has only been called "Dutch (Flemish)" since OT 1.7. |
Thanks for the background @moyogo! From that info it sounds like doing nothing special for |
@moyogo I didn't really get the
I've looked into the topic some more, and it seems to me that it's not really clear-cut, since West Flemish does not have a standardized spelling. Apparently, speakers use Y, but consider it a variant or replacement of Dutch IJ. The claim that IJ is absolutely nothing special in West Flemish is nonsense. As far as I can tell, IJ might be used by a minority of West Flemish speakers where others would use Y. I guess adding IJ treatment does no harm for people using Y, while leaving it out would shortchange those using IJ. |
@Bolpat The paragraph before your sample sentence says "The following differences are listed by their Dutch spelling as some different letters have merged their sounds in Standard Dutch but remained separate sounds in West Flemish. " |
Language-specific OTF features are active in Libertinus and work as far as I tested*.
However, in Dutch, there is a digraph/letter/ligature consisting of I and J. There are Unicode points defined for both the upper and lower case form (U+0132/-3, Ĳ and ĳ) that look (as far as I can tell) identical in the upright form in Libertinus, i.e. it makes no visual difference whether one enters Ĳ or IJ (ĳ or ij) in XeLaTeX.
Problem: The italic Libertinus fonts display U+0132/-3 differently than separate I and J, as can be seen in the image above.
Expected: When the text is marked Dutch, the italic font should look like the second forms regardless of whether Ĳ or IJ (ĳ or ij) is used (i.e. one should not have to resort to entering U+0132/-3 in one's text to get a pleasing output). As IJ is common in Dutch, it really should look perfect out of the box.
It might be worth mentioning that there is no combined version (like Ij) since at the beginning of a word, both letters are set upper case: IJssel.
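The capitalization rule just mentioned can be sketched in a few lines of Python. The function name `capitalize_dutch` is hypothetical, and the sketch assumes every word-initial i+j is the digraph, which the thread notes is not always true for loanwords.

```python
def capitalize_dutch(word):
    """Capitalize a Dutch word, raising both letters of an initial ij."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    # str.upper() already maps U+0133 to U+0132, so the encoded digraph needs no special case.
    return word[:1].upper() + word[1:]


print(capitalize_dutch("ijssel"))     # IJssel
print(capitalize_dutch("amsterdam"))  # Amsterdam
```

This is why no mixed-case ligature glyph is needed: only the all-uppercase and all-lowercase pairs occur.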
There are (rare) words that have a coincidental i+j clash such as bijectie (from bi- and ject, Latin for "throw") that one would have to handle manually, but that's a general issue with ligatures, end forms and similar features.
Furthermore, IJ/ij can carry an accent (mostly the acute) and when it does, ideally, both letters should carry it.
Upright IJ/ij with any variant of placing an accent on the letters I and J separately looks as expected. Using U+0132/-3, however, looks terrible. (This state is acceptable; virtually no one uses U+0132/-3 when typing text.)
Italic IJ/ij/U+0132/-3 all look terrible with an accent on them. (The fact that separate I and J with an accent look bad is purely due to J/j not working with accents; that italic U+0132/-3 don't work with accents is acceptable.)
*) I tested: Turkish small caps (keep the dot on a small i) and Serbian italic Cyrillic (б г д п т shaped as expected).