-echo option can print correct looking characters in terminal #112

Closed · mobluse wants to merge 28 commits

mobluse
Contributor

@mobluse mobluse commented Sep 22, 2019

I thought it would be good if what is echoed doesn't contain inverted question marks (unknown character). The goal is that this would print correct looking characters for every mode and in every terminal. Now it only works if the terminal has the same encoding as the system this was compiled on.

@mist64
Collaborator

mist64 commented Sep 22, 2019

I like the feature, but:

  • I find using literal UTF-8 characters in C files too fragile. These should be byte sequences instead; the C file should be plain ASCII.
  • That said, I think this can be done more simply: Here is code to encode a Unicode uint32_t into a UTF-8 byte stream. (There are many other implementations as well.) To get from ISO-8859-15 to Unicode, all you need to do is replace the yellow entries in this table with their Unicode code points; otherwise just take the byte value, as it already is Unicode. So no big switch table: just special-case 8 characters and feed the rest into a UTF-8 encoder.
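
A minimal sketch of the approach described above, assuming nothing about the PR's actual code: the function names `iso8859_15_to_unicode` and `encode_utf8` are hypothetical, and the encoder is a generic one, not the implementation linked in the comment. The eight special cases are the byte values where ISO-8859-15 differs from ISO-8859-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one ISO-8859-15 byte to a Unicode code point. Only eight byte
 * values differ from ISO-8859-1; every other byte equals its code point. */
uint32_t iso8859_15_to_unicode(uint8_t c)
{
    switch (c) {
    case 0xA4: return 0x20AC; /* EURO SIGN */
    case 0xA6: return 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
    case 0xA8: return 0x0161; /* LATIN SMALL LETTER S WITH CARON */
    case 0xB4: return 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0xB8: return 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
    case 0xBC: return 0x0152; /* LATIN CAPITAL LIGATURE OE */
    case 0xBD: return 0x0153; /* LATIN SMALL LIGATURE OE */
    case 0xBE: return 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    default:   return c;
    }
}

/* Encode one code point as UTF-8 into out (room for 4 bytes required).
 * Returns the number of bytes written, or 0 for an invalid code point. */
size_t encode_utf8(uint8_t *out, uint32_t cp)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With these two pieces, echoing a byte `c` in ISO mode would amount to `encode_utf8(buf, iso8859_15_to_unicode(c))` followed by writing the returned number of bytes.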

@mobluse
Contributor Author

mobluse commented Sep 23, 2019

I will remove UTF8 characters from the C code when I get farther, but it's useful to see the real characters when you develop. I now also support some PETSCII (both unshifted and shifted) and some control codes that map to VT100/xterm.
https://style64.org/petscii/
https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences

My idea was to adapt to the user's locale and use the iconv library, but I might only support UTF-8 terminals in the beginning.

I also think one should be able to use the original system by having different suboptions to -echo, e.g.
-echo [raw]
-echo [auto]

raw would be the original system.

@mist64
Collaborator

mist64 commented Sep 23, 2019

I'm not sure it is useful to convert PETSCII control codes to ANSI control codes. What is the use case?

I would say the main use case for -echo is to convert between BAS and PRG. And I think the emulator should

  • convert ISO 8859-15 <-> Unicode
  • maybe PETSCII <-> Unicode. There is a good mapping here: https://dflund.se/~triad/krad/recode/petscii.html
  • convert control codes (00-1F, 80-9F) <-> \xnn, i.e. the CHR$($93) character will be \x93 in the ASCII file. So the BAS file can say PRINT"\x93", for example.
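
The last bullet can be sketched as follows. This is only an illustration of the proposed \xnn escaping, not code from the PR: the function names are hypothetical, and the choice of two uppercase hex digits is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if c lies in one of the two control-code ranges
 * named in the comment above (00-1F and 80-9F). */
int is_control_code(uint8_t c)
{
    return c <= 0x1F || (c >= 0x80 && c <= 0x9F);
}

/* Write the text form of c into buf as a NUL-terminated string:
 * "\xNN" for a control code, the byte itself otherwise.
 * buf must hold at least 5 bytes. Returns the string length. */
int escape_control_code(uint8_t c, char *buf, size_t size)
{
    assert(buf != NULL && size >= 5);
    if (is_control_code(c))
        return snprintf(buf, size, "\\x%02X", (unsigned)c);
    buf[0] = (char)c;
    buf[1] = '\0';
    return 1;
}
```

The reverse direction (BAS to PRG) would parse `\x` followed by two hex digits back into a single byte.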

@mist64
Collaborator

mist64 commented Sep 23, 2019

With this, we could add an argument (-convert) to convert between BAS and PRG. We would either inject the PRG into RAM, put LIST into the keyboard buffer, and save the output into a file, or put the BAS into the keyboard buffer, followed by a "SAVE" command. That would remove the need for the external script.

@mobluse
Contributor Author

mobluse commented Sep 23, 2019

The use case for mapping the control codes to VT100/xterm would be to have the same output in the terminal as on the X16 screen. I believe it could be a suboption whether to convert PETSCII control codes to VT100/xterm escape codes, to hex (e.g. \x13, \x93, or {$13}, {$93}), or to petcat (VICE 3.3) control codes (e.g. {home}, {clr}, etc.). I think it might be better to use the petcat control codes because they could be printed in type-in programs and found on the keyboard (or using a keyboard map).

The reason I put this in the emulator and not in an external filter program is that I could not get filter programs to work with x16emu -echo. That is, the following do not work in real time, but they do work once you close the emulator:
x16emu -echo | iconv --from-code=ISO8859-15 --to-code=UTF-8
x16emu -echo 2>/dev/null | petcat -c -nh
If we could get filter programs to work in real time this system could be external to the emulator.

I believe there are more PETSCII characters in Unicode now than when ~triad wrote those documents in 2002.

@mist64
Collaborator

mist64 commented Sep 23, 2019

Piping the output of x16emu should work if you add fflush(stdout); after every printf(). I would not be opposed to such a change.
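
A minimal sketch of that suggestion, with a hypothetical helper name `echo_byte` (the emulator's real output path is not shown in this thread). The background is that C's stdout is normally fully buffered when it is connected to a pipe, so a downstream filter sees nothing until the buffer fills or the program exits.

```c
#include <assert.h>
#include <stdio.h>

/* Write one byte and flush at once so a downstream filter in a pipe
 * receives it immediately. Returns 0 on success, EOF on error. */
int echo_byte(FILE *out, unsigned char c)
{
    assert(out != NULL);
    if (fputc(c, out) == EOF)
        return EOF;
    return fflush(out);
}
```

An alternative with the same effect would be a single `setvbuf(stdout, NULL, _IONBF, 0);` at startup, which makes stdout unbuffered instead of flushing after each write.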

echochar.c Outdated
20 PRINT "case";I;CHR$($9D)": printf(";CHR$(34);CHR$(I);CHR$(34);"); break;"
30 NEXT
*/
case 160: printf(" "); break;
Collaborator

For 88 of these 96 values, the code on the left is the code on the right anyway. You have just made your text editor pre-convert the output into UTF-8. As I have stated before, for the codes $A0-$FF (except for 8 values), Unicode is the same as 8859-15, so you can just pass the values into a (quite simple) UTF-8 encoder, instead of having a huge table of pre-encoded UTF-8 values.

@mobluse
Contributor Author

mobluse commented Sep 23, 2019

I replaced the huge switch table with a small function. I still should remove Unicode characters. Now the start up screen looks OK in the terminal, but I discovered that the logo code doesn't always use reverse off, but still it works in the SDL window, and that is strange. If I do extra reverse off at line feed, as I do now, it looks OK.

It works using filters now. E.g.
./x16emu -echo | sed 's/BASIC/FORTH/g;s/READY\./OK/g'
[Screenshot: 2019-09-24-001400_657x547_scrot]

@mobluse
Contributor Author

mobluse commented Sep 24, 2019

I removed the Unicode characters from the code, but they remain in end-of-line comments. I used programs to generate the code from the triad files.

echochar.c Outdated
case 0xDF: prtflush("?"); break;
case 255: prtflush("▒"); break;
default: prtnumflush("%c", c);
case 0x5C: prtflush("\xC2\xA3"); break; // 0xA3, £
Collaborator

This would be much better if it were a table that maps 8 bit PETSCII to 16 bit Unicode numbers, and you do the UTF-8 encoding separately.

(The UTF-8 in the comments is fine, and useful.)
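
A sketch of what such a table could look like, under stated assumptions: the names `petscii_unicode` and `petscii_to_unicode` are hypothetical, only three illustrative entries are filled in (the first two appear in this review thread), and falling back to the byte value for unmapped codes is an assumption, not something the PR specifies. The result would then go through a separate UTF-8 encoder.

```c
#include <assert.h>
#include <stdint.h>

/* Sparse 256-entry table from PETSCII byte to 16-bit Unicode code point.
 * A 0 entry means "no special mapping". Entries are illustrative only. */
static const uint16_t petscii_unicode[256] = {
    [0x5C] = 0x00A3, /* POUND SIGN */
    [0x5E] = 0x2191, /* UPWARDS ARROW */
    [0x5F] = 0x2190, /* LEFTWARDS ARROW */
};

/* Look up one PETSCII byte, passed as an int in 0..255 (as returned by
 * fgetc). Unmapped bytes fall back to their own value (assumption). */
uint16_t petscii_to_unicode(int c)
{
    assert(c >= 0 && c <= 0xFF);
    uint16_t cp = petscii_unicode[c];
    return cp != 0 ? cp : (uint16_t)c;
}
```

Keeping the data in a table separates the mapping from the output logic, so the same table could serve a reverse (Unicode to PETSCII) lookup as well.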

Contributor Author

@mobluse mobluse Sep 24, 2019

I don't know exactly what the advantage of a table over a switch is. A switch is more flexible and easier to maintain. I can shrink the switches further since there are still repetitions in the PETSCII. An array would contain a lot of elements with some special value, because not every PETSCII code has a translation. I could change the cases in the switch so that most of them don't contain strings:
From:
case 0x5E: prtflush("\xE2\x86\x91"); break; // 0x2191, ↑
To:
case 0x5E: prtuptflush(0x2191); break; // ↑ upt = Unicode point

I did use int utf8_encode(char *out, uint32_t utf) to convert from Unicode point to UTF-8. I thought it would make the emulator faster if I did the conversion outside the emulator.

Maybe there are PETSCII characters outside 16 bit Unicode. A switch can handle these characters better.

@greg-king5

> I discovered that the logo code doesn't always use reverse off; but still, it works in the SDL window, and that is strange. If I do extra reverse off at line feed, as I do now, it looks OK.

That is exactly what Commodore 8-bit computers do! Their screens have three special modes: reverse color mode, quote mode (control characters are displayed instead of obeyed), and insert mode (similar to quote mode). Printing a carriage return turns off all those modes.

@mobluse
Contributor Author

mobluse commented Sep 25, 2019

Fascinating! I will remove the comments in my code that say I compensate for bugs.

@mist64
Collaborator

mist64 commented Sep 26, 2019

The following compile fix is necessary for macOS:

diff --git a/echochar.c b/echochar.c
index 71177f3..9759e0f 100644
--- a/echochar.c
+++ b/echochar.c
@@ -57,9 +57,9 @@ echochar(uint8_t c)
        } else {
                switch (mode) {
                        case 0: /* PETSCII - when no PETSCII character exists a ISO8859-15 character is shown */
-                           if ('\xC0' <= c && c <= '\xDF') {
+                         if (0xC0 <= c && c <= 0xDF) {
                                        c -= '\xC0' - '\x60';
-                           } else if ('\xE0' <= c && c <= '\xFE') {
+                         } else if (0xE0 <= c && c <= 0xFE) {
                                        c -= '\xE0' - '\xA0';
                                }
                                switch (c) {
@@ -186,7 +186,7 @@ prtuptflush(uint32_t utf)
 void
 prtflush(const char *s)
 {
-   printf(s);
+ printf("%s", s);
        fflush(stdout);
 }
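
For context, a standalone sketch of the range folding that the first hunk touches, written with integer constants as in the fix. The function name `fold_petscii` is hypothetical. The fix is presumably needed because, on platforms where plain `char` is signed, a character constant such as '\xC0' has a negative value, so comparing it against a `uint8_t` can never match the intended range.

```c
#include <assert.h>
#include <stdint.h>

/* Fold the PETSCII ranges that repeat earlier codes onto those codes. */
uint8_t fold_petscii(uint8_t c)
{
    if (0xC0 <= c && c <= 0xDF)
        c -= 0xC0 - 0x60;   /* $C0-$DF repeat $60-$7F */
    else if (0xE0 <= c && c <= 0xFE)
        c -= 0xE0 - 0xA0;   /* $E0-$FE repeat $A0-$BE */
    return c;
}

/* Example checks of the folding. */
void fold_petscii_example(void)
{
    assert(fold_petscii(0xC1) == 0x61);
    assert(fold_petscii(0xE0) == 0xA0);
    assert(fold_petscii(0x41) == 0x41); /* codes outside the ranges pass through */
}
```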
 

@mist64
Collaborator

mist64 commented Sep 26, 2019

I tried it, and I see some problems:

  • On macOS, the default Terminal background color is white, so I see white-on-white text.
  • At least on macOS, there is something wrong with the encoding. There is a "�" character before and after every READY, and in ISO mode, characters on the keys right of P and L with the SV-SE keyboard don't print right either.
  • ISO/PETSCII mode is auto-detected, but a RESET switches back to PETSCII, which is not accounted for.

Unfortunately I am unconvinced about this whole approach. :/

At least by default, I think -echo should print the characters verbatim. The user can then pipe the data into an external tool to convert it. iconv already does this for ISO-8859-15 to UTF-8 or whatever, and I would not be opposed to adding a C tool to the repo to do the PETSCII->UTF-8/ANSI conversion.

@mobluse
Contributor Author

mobluse commented Sep 26, 2019

I did fixes for macOS, but I don't have it so I couldn't test them.

  • I set background color to blue now.
  • I printed the �, but I changed it to Ⓐ now, since the ROM should not be able to print control characters that don't exist on the C64 or X16; see x16-rom#37 ("ROM seems to send \x0A to screen even though it's not a valid control character"). Normally, when an unknown control character arrives, I print the hex code.
  • This reset problem could be solved in the ROM by printing $8F (i.e. \x8F) as the first character after reset.

I think -echo should have suboptions such as raw, light, and auto. light would be the current behavior, where CR is changed to LF. raw would not change anything (changing CR to LF hides errors in Asm or BASIC programs). auto would be this system in the current or a future echochar().
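
One way the proposed suboptions might be parsed, purely as a sketch: none of these names exist in x16emu, the enum and both functions are hypothetical, and treating a missing suboption as light (the current behavior) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { ECHO_INVALID = -1, ECHO_RAW, ECHO_LIGHT, ECHO_AUTO } echo_mode;

/* Parse the optional suboption following -echo. A missing or empty
 * suboption selects light, i.e. the current behavior (assumption). */
echo_mode parse_echo_mode(const char *arg)
{
    if (arg == NULL || arg[0] == '\0')
        return ECHO_LIGHT;
    if (strcmp(arg, "raw") == 0)
        return ECHO_RAW;
    if (strcmp(arg, "light") == 0)
        return ECHO_LIGHT;
    if (strcmp(arg, "auto") == 0)
        return ECHO_AUTO;
    return ECHO_INVALID;
}

/* Apply a mode to one output byte. Only raw and light are modeled;
 * auto would hand the byte to a converter instead, so it passes
 * through unchanged in this sketch. */
int apply_echo_mode(echo_mode mode, int c)
{
    assert(mode != ECHO_INVALID);
    if (mode == ECHO_LIGHT && c == 0x0D)
        return 0x0A; /* CR becomes LF */
    return c;
}
```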

I could make this as a filter program instead, but then it would be best if raw mode of -echo existed.

@mobluse
Contributor Author

mobluse commented Sep 27, 2019

I have now made a filter program, petscii2utf8, but it would benefit if #135 (or similar) were accepted:
https://github.com/mobluse/x16-petscii2utf8

@mobluse
Contributor Author

mobluse commented Sep 27, 2019

I am closing this, as a simpler solution is possible.
