Some more characters missing for codepage 437 and 850 console compatibility #205

Closed
PhMajerus opened this issue Nov 23, 2019 · 7 comments · Fixed by #213

@PhMajerus
Contributor

This version does not yet support codepages 437 and 850 completely.

I guess I should have included more details in #142, but the missing characters are hard to produce with easily reproducible steps in cmd.exe.

The last missing glyphs are in the characters for values 0x01 to 0x1F, which are typically used as C0 control characters, but can be output as characters by CUI apps that directly control the screen buffer.

[Screenshot: CP437-850 missing glyphs]

The same characters with Consolas font:
[Screenshot: CP437-850 Consolas]

So for complete compatibility with older CUI apps, the following characters are also required:

Hex  0123456789ABCDEF
0_    ☺☻♥♦♣♠•◘○◙♂♀♪♫☼
1_   ►◄↕‼¶§▬↨↑↓→←∟↔▲▼

Their Unicode codepoints are as follows:
0_ : (NULL), U+263A, U+263B, U+2665, U+2666, U+2663, U+2660, U+2022, U+25D8, U+25CB, U+25D9, U+2642, U+2640, U+266A, U+266B, U+263C
1_ : U+25BA, U+25C4, U+2195, U+203C, U+00B6, U+00A7, U+25AC, U+21A8, U+2191, U+2193, U+2192, U+2190, U+221F, U+2194, U+25B2, U+25BC
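The list above can be written down as a small lookup table. This is only a sketch: the names `CP437_LOW_GLYPHS` and `glyph_for_byte` are made up for illustration, and the code points are copied from the two rows above.

```python
# Code points displayed for byte values 0x01..0x1F when they are written
# straight into the text screen buffer (copied from the list above).
# Index 0 corresponds to byte 0x01; 0x00 (NULL) has no glyph of its own.
CP437_LOW_GLYPHS = [
    0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25D8,
    0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C,
    0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8,
    0x2191, 0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
]


def glyph_for_byte(value: int) -> str:
    """Return the character shown for a byte in the 0x01..0x1F range."""
    if not 0x01 <= value <= 0x1F:
        raise ValueError(f"0x{value:02X} is outside the 0x01..0x1F range")
    return chr(CP437_LOW_GLYPHS[value - 1])


if __name__ == "__main__":
    # Print byte value, code point and glyph for the whole range.
    for byte in range(0x01, 0x20):
        ch = glyph_for_byte(byte)
        print(f"0x{byte:02X}  U+{ord(ch):04X}  {ch}")
```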

As far as I can tell, the characters at 0x01, 0x02, 0x0E and 0x0F are still missing: codepoints U+263A, U+263B, U+266B and U+263C, i.e. the two smiley faces, the double music note and the sun.
It would be great to have these included for full console compatibility.
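For anyone who wants to repeat this check against another font, here is a rough sketch. It assumes you already have the set of code points the font covers (for example read from its character map with a font tool). The `supported` argument and the function name `missing_low_glyphs` are hypothetical, not part of any existing tool.

```python
# Code points expected at byte values 0x01..0x1F (index 0 is byte 0x01).
CP437_LOW_GLYPHS = [
    0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25D8,
    0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C,
    0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8,
    0x2191, 0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
]


def missing_low_glyphs(supported):
    """Return (byte value, code point) pairs the font does not cover.

    `supported` is any collection of integer code points; how it is
    obtained from a real font is outside the scope of this sketch.
    """
    supported = set(supported)
    return [
        (index + 1, codepoint)
        for index, codepoint in enumerate(CP437_LOW_GLYPHS)
        if codepoint not in supported
    ]


if __name__ == "__main__":
    # Hypothetical coverage: everything except the four glyphs named above.
    coverage = set(CP437_LOW_GLYPHS) - {0x263A, 0x263B, 0x266B, 0x263C}
    for byte, codepoint in missing_low_glyphs(coverage):
        print(f"0x{byte:02X} -> U+{codepoint:04X} is missing")
```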

@aaronbell
Collaborator

I asked this question on the other thread, but probably better to continue discussion here:

For the control characters—I've never actually included those in a font—and am not sure I fully understand their purpose. Can you provide more info? Thanks!

@ExE-Boss

CP437 and CP850 map U+263A, U+263B, U+2665, U+2666, U+2663, U+2660, U+2022, U+25D8, U+25CB, U+25D9, U+2642, U+2640, U+266A, U+266B, U+263C, U+25BA, U+25C4, U+2195, U+203C, U+00B6, U+00A7, U+25AC, U+21A8, U+2191, U+2193, U+2192, U+2190, U+221F, U+2194, U+25B2 and U+25BC to 0x01‑0x1F when written directly to the output buffer (rather than being processed by the console and terminal as control characters).
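To illustrate the difference, here is a small Python sketch. Python's built-in `cp437` codec decodes bytes below 0x20 as the control characters themselves, so getting the graphical interpretation needs an extra overlay. The function name `decode_cp437_screen` is made up for this example, and rendering 0x00 as a blank is an assumption about how a console shows that cell.

```python
# Code points shown for bytes 0x01..0x1F in the screen buffer (index 0 is 0x01).
CP437_LOW_GLYPHS = [
    0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25D8,
    0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C,
    0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8,
    0x2191, 0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
]


def decode_cp437_screen(data: bytes) -> str:
    """Decode bytes the way a text screen buffer would display them."""
    cells = []
    for value in data:
        if 0x01 <= value <= 0x1F:
            # Graphical interpretation instead of the C0 control character.
            cells.append(chr(CP437_LOW_GLYPHS[value - 1]))
        elif value == 0x00:
            # Assumption: NULL is shown as a blank cell.
            cells.append(" ")
        else:
            # Everything else is left to the standard codec
            # (0x7F is not covered by this thread and is not special-cased).
            cells.append(bytes([value]).decode("cp437"))
    return "".join(cells)


if __name__ == "__main__":
    raw = b"\x01 Hi \x0e"
    print(repr(raw.decode("cp437")))       # control characters survive as-is
    print(repr(decode_cp437_screen(raw)))  # smiley and music note instead
```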

@PhMajerus
Contributor Author

(Copy of the answer in the other thread, just in case someone else is reading)

@aaronbell You can leave #142 closed as I didn't include many details about the 0x01 to 0x1F range besides "providing all 256 characters from CP437 would highly improve compatibility with original PC console/terminal outputs" in that bug report.
I have now moved the details for the missing characters to #205 to avoid the long post here.

About the C0 control characters: these are not glyphs in a font, and not printable characters in the ASCII standard, but in-band controls for CUI apps.
When a CUI app outputs text, the 0x01 to 0x1F characters are typically not displayed, but are instead interpreted by the console as being control commands such as bell (make the terminal make a sound), backspace, tab, return, ... as well as a bunch of commands or data separators meant for serial communication.
This means they do not display anything, and instead make the console or the serial device change some state.
However, MS-DOS apps could also write values directly into the text screen buffer memory, so it was technically possible to use these values for characters that more advanced apps could display.

All this means that these characters are not usable by CUI apps that simply write to the console as a stream. They are even less common than high-ASCII, but can be used by apps that control the whole screen buffer, for example text-based GUI apps. These glyphs are not part of ASCII, but have been usable since the early IBM PC days.

From a font point of view, you don't need to care about the C0 control characters, but only to know that these values have been reused by the original MDA and CGA adapters to provide a few more characters to apps that handle the text screen buffer directly, and therefore these glyphs need to be included in your set if you want your font to be compatible with the original console.

@aaronbell
Collaborator

Thanks @PhMajerus. I'm a bit confused though since you say both "these are not glyphs in a font" and "therefore these glyphs need to be included in your set". Unless you mean that they don't need to have any content, and can just be blank zero-width glyphs?

Also, per @ExE-Boss' comment, it sounds like the specific codepages already map to those control characters—so this is more of general console support than CP support?

@PhMajerus
Contributor Author

PhMajerus commented Nov 23, 2019

@aaronbell
The values 0x01 to 0x1F have two different uses:

If sent to a terminal over a serial link or through the WriteConsoleA API function, they are in-band controls that are processed and do not render anything. Your font therefore needs nothing for the C0 control characters, and you don't need to understand how they work any more than you'd need to understand how VT control sequences set colors and such.

However, when writing directly to the text screen memory of a display adapter, every value can be a character, so the original IBM PC MDA reused these values to provide 31 more characters. It is 31 rather than 32 because 0x00 is a special one, rendered as a space because it is used as a string delimiter in the C language, even if technically they could have used it as well. Every graphics adapter and text-screen emulator since then, including the NT console, has inherited that behavior.

So you don't need to worry about C0 control characters as these are not characters from a font point of view, but you do need to care about the MDA extra characters reusing the same values, as apps that access video memory directly can take advantage of them since the very first IBM PC.
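The two uses can be contrasted with a toy model in Python. This is not how a real console or display adapter is implemented: `render_as_stream` only understands tab and line feed and silently drops every other control value, and both function names are invented for this sketch.

```python
# Code points shown for bytes 0x01..0x1F in the screen buffer (index 0 is 0x01).
CP437_LOW_GLYPHS = [
    0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25D8,
    0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C,
    0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8,
    0x2191, 0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
]


def render_as_stream(data: bytes) -> str:
    """Toy stream output: control values act as commands, not characters."""
    out = []
    for value in data:
        if value in (0x09, 0x0A):
            out.append(chr(value))  # tab and line feed move the cursor
        elif value < 0x20:
            continue                # e.g. 0x07 rings the bell, shows nothing
        else:
            out.append(bytes([value]).decode("cp437"))
    return "".join(out)


def render_as_cells(data: bytes) -> str:
    """Toy screen buffer: every byte occupies one visible character cell."""
    out = []
    for value in data:
        if value == 0x00:
            out.append(" ")         # assumption: NULL shows as a blank cell
        elif value < 0x20:
            out.append(chr(CP437_LOW_GLYPHS[value - 1]))
        else:
            out.append(bytes([value]).decode("cp437"))
    return "".join(out)


if __name__ == "__main__":
    payload = b"A\x07B"
    print(render_as_stream(payload))  # the bell value leaves no glyph
    print(render_as_cells(payload))   # the same value shows as a bullet
```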

As far as I know, localized codepages only change the high-ASCII 0x80 to 0xFF characters, so the 31 characters added in the C0 area (0x01 to 0x1F) should be the same for all codepages.
To be clear, you need to add the 4 missing characters for complete console support.

Did I manage to make it a bit clearer? Don't hesitate to let me know. It's a tricky thing to understand even for software developers, since it is legacy stuff based on physical serial links and video memory.

@aaronbell
Collaborator

I think I understand 😅. Anyway, sounds like everything will be good once the last remaining symbol codepoints are added!

@PhMajerus
Contributor Author

@aaronbell
Yeah, just 4 more glyphs to go 😄
Thanks for your work on supporting those codepages by the way, I really like being able to switch from Lucida Console and Consolas to Cascadia Mono.

Also, remember that these are just to complete the support for CUI apps that use 8-bit characters for the codepages you choose to support.
Since the beginning of Windows NT, CUI apps are not limited to 8-bit codepages, and can use the whole set of Unicode characters by using the Unicode (UCS-2 that became UTF-16) API instead.
This means CUI apps for Windows NT have been using all characters that are available in Lucida Console, as it was the default (and for a long time, only) font used by the Windows console, and software has simply been written to work in that setup.

If you want to make sure all CUI apps work as intended with Cascadia, you probably should try to include all the characters found in Lucida Console at some point.
Since one of the goals of Lucida Console was also to support a few more codepages than you currently do, it might be an interesting goal to achieve. Once you have these few more codepages, I don't think there will be a lot of extra characters to include.

See https://docs.microsoft.com/en-us/typography/font-list/lucida-console for the details and list of codepages supported by Lucida Console.
