How to _not_ use UTF-8 (connection to server with LANG="en_US.iso885915")? #1855

Open
GitMensch opened this issue Nov 10, 2021 · 13 comments

@GitMensch

"OpenSSH for Windows" version
OpenSSH_for_Windows_8.6p1, LibreSSL 3.3.3

Server OperatingSystem
RHEL7, CentOS8 (likely doesn't matter)

Client OperatingSystem
Windows Server 2019 Datacenter

What is failing
The server and a number of tools installed on it must use the iso885915 encoding, so it is set globally via LANG="en_US.iso885915".
When connecting to the server with PuTTY, with "Remote character set" set to ISO-8859-15: 1999 (Latin-9, "euro") (found under Configuration->Window->Translation), everything works fine: tools generate output in that character set, the client displays it properly, and entered data arrives as expected.

But that isn't the case with OpenSSH :-(

Expected output
With PuTTY

> echo ä| hexdump
0000000 0ae4
0000002
> printf \\xe4\\n
ä
>

Actual output
With OpenSSH

> echo ä | hexdump
0000000 a4c3 000a
0000003
> printf \\xe4\\n

>

Is there any way with this OpenSSH version to not use UTF-8?
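
For reference, the two hexdumps above differ because ä (U+00E4) is the single byte e4 in ISO 8859-15 but the two bytes c3 a4 in UTF-8. A minimal sketch in C of that UTF-8 encoding step follows; the function name `utf8_encode` is made up for illustration and is not part of OpenSSH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
 * Returns the number of bytes written to out (1 to 4), or 0 if cp is
 * not a valid Unicode scalar value.
 * Hypothetical helper, for illustration only. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates cannot be encoded */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+00E4 this yields c3 a4, which hexdump prints as the little-endian word a4c3, whereas the ISO 8859-15 byte is simply the code point value e4.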

@mgkuhn

mgkuhn commented Dec 7, 2021

@GitMensch
Author

😕 ... and how do you apply that to an ssh connection to a Linux host?

  • local client with chcp not tinkered with, so it matches the system default -> everything fine
  • session via PuTTY configured for the same encoding, to a shell environment that has LANG set appropriately -> everything fine
  • open the local client from the first bullet point and connect to the server from the second one -> all input is sent as UTF-8 and all non-US-ASCII output is garbage

... @mgkuhn and your solution is?

@mgkuhn

mgkuhn commented Dec 8, 2021

Ah sorry, I had misunderstood your scenario (too quick reading).

OpenSSH for Windows only supports UTF-8, so you need to use an encoding translation tool on Linux. There were some developed about 15-20 years ago, when Linux migrated from ISO 8859 and other 8-bit encodings to UTF-8. The main one I recall from that time is luit (Juliusz Chroboczek, ~2001). I haven't used that in more than a decade, but it still seems to work fine on Ubuntu:

$ luit -encoding ISO8859-15 bash
$ echo £ | hd
00000000  a3 0a                                             |..|
00000002

(Instead of using the -encoding option, better install a proper ISO 8859-15 locale and set e.g. LANG to that.)

So I don't think OpenSSH for Windows needs to implement anything here.
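
To make the translation step concrete: in the terminal-to-application direction, a tool like luit receives UTF-8 from the terminal and hands ISO 8859-15 bytes to the program. The following is a rough sketch of that one direction in portable C, not luit's actual code. The function names are hypothetical, and overlong or otherwise unusual UTF-8 input is not rejected strictly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a Unicode code point to its ISO 8859-15 byte, or return -1 if the
 * character does not exist in ISO 8859-15. Hypothetical helper. */
static int latin9_from_codepoint(uint32_t cp)
{
    switch (cp) {
    /* The eight positions where ISO 8859-15 differs from ISO 8859-1. */
    case 0x20AC: return 0xA4; /* euro sign */
    case 0x0160: return 0xA6; /* S with caron */
    case 0x0161: return 0xA8; /* s with caron */
    case 0x017D: return 0xB4; /* Z with caron */
    case 0x017E: return 0xB8; /* z with caron */
    case 0x0152: return 0xBC; /* OE ligature */
    case 0x0153: return 0xBD; /* oe ligature */
    case 0x0178: return 0xBE; /* Y with diaeresis */
    /* The Latin-1 characters that those positions replaced. */
    case 0xA4: case 0xA6: case 0xA8: case 0xB4:
    case 0xB8: case 0xBC: case 0xBD: case 0xBE:
        return -1;
    default:
        return cp <= 0xFF ? (int)cp : -1;
    }
}

/* Convert UTF-8 input to ISO 8859-15. Only 1- to 3-byte sequences are
 * decoded, which covers every character in ISO 8859-15; anything
 * malformed, unsupported, or unmappable is written as '?'.
 * Returns the number of bytes written to out. */
size_t utf8_to_latin9(const unsigned char *in, size_t n,
                      unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    assert(in != NULL && out != NULL);
    while (i < n && o < cap) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t len, k;
        int ok = 1, byte;

        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else { out[o++] = '?'; i++; continue; }

        if (n - i < len) { out[o++] = '?'; break; } /* truncated sequence */

        for (k = 1; k < len; k++) {
            if ((in[i + k] & 0xC0) != 0x80) { ok = 0; break; }
            cp = (cp << 6) | (uint32_t)(in[i + k] & 0x3F);
        }
        if (!ok) { out[o++] = '?'; i++; continue; }

        byte = latin9_from_codepoint(cp);
        out[o++] = byte < 0 ? (unsigned char)'?' : (unsigned char)byte;
        i += len;
    }
    return o;
}
```

Fed the UTF-8 bytes c2 a3 0a (£ plus newline), the sketch produces a3 0a, matching the hd output in the luit session above.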

@GitMensch
Author

> OpenSSH for Windows only supports UTF-8

I've guessed so - and the point was how to let OpenSSH for Windows correctly recognize the encoding/LANG from the server it connects to or the client it connects from (so just set it, then run ssh.exe).

Would it be possible for OpenSSH to check LANG and use for example luit itself?

> So I don't think OpenSSH for Windows needs to implement anything here.

As you seem to have experience with that: how should I tell ssh.exe to use luit to convert its "only supports UTF-8" encoding, so that the client gets ISO-8859-15 when OpenSSH for Windows sends a UTF-8 character?

I currently use it as follows:

  1. actually use a terminal on the server (which wants ISO-8859-15 input and also sends this to the terminal):
    ssh.exe -K %USERNAME%@%SERVER% "variant=%variant% bash -l"
    --> changing that to ssh.exe -K %USERNAME%@%SERVER% "variant=%variant% luit bash -l" works like a charm; there is no need to specify an encoding, as LANG seems to be used

  2. executing a command on the server (which wants ISO-8859-15 input and also sends this to the terminal):
    ssh.exe -K %USERNAME%@%SERVER% "/my/worker.sh %work1% %work2%"
    --> I could not get that working; none of the following worked:

    • ssh.exe -K %USERNAME%@%SERVER% "luit /my/worker.sh %work1% %work2%"
    • ssh.exe -K %USERNAME%@%SERVER% bash -l -c "luit /my/worker.sh %work1% %work2%"
    • ssh.exe -K %USERNAME%@%SERVER% luit bash -l -c '/my/worker.sh %work1% %work2%'

Any ideas / insights?
Could luit handling be "integrated" into OpenSSH for Windows?

@mgkuhn

mgkuhn commented Dec 8, 2021

I would check (with ssh.exe -K %USERNAME%@%SERVER% locale) if in your last three examples the LANG environment variable is actually set to an ISO 8859-15 locale. If you set it only in .profile or .bashrc, that may not get executed when you provide a command that starts with luit, and you may have to set LANG in ~/.ssh/environment as well for that.

Also, you may want to play with adding ssh.exe option -t in the last three examples, to force allocation of a pseudo-TTY device (pty) on the server, such that luit still thinks it runs inside a real terminal.
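
Both checks (is the locale really an ISO 8859-15 one, and is there a pty) can also be made from inside the program that gets started on the server. A small sketch using standard POSIX calls; the helper name `describe_session` is hypothetical.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <unistd.h>

/* Write a one-line summary of what a remotely started command sees:
 * whether fd is a terminal, and which character set the locale from
 * the environment (LANG / LC_*) selects.
 * Returns the snprintf() result. Hypothetical helper. */
int describe_session(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    setlocale(LC_CTYPE, ""); /* adopt the locale named in the environment */
    return snprintf(buf, cap, "tty=%s codeset=%s",
                    isatty(fd) ? "yes" : "no",
                    nl_langinfo(CODESET));
}
```

Called with fd 0 at the start of a program launched through ssh, this shows whether -t made a difference and whether LANG reached the command at all.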

@mgkuhn

mgkuhn commented Dec 8, 2021

> Could luit handling be "integrated" into OpenSSH for Windows?

Not a good idea for at least two reasons:

a) This forum is just about the Win32 port of OpenSSH. It is not good practice for the Windows porters to add new functionality here that goes beyond what is specifically only required for Windows. Adding anything else here to the port would just create an ongoing maintenance overhead, i.e. more work when merging in each new upstream release. So if you wanted to make that suggestion, you should take it upstream. I would give the suggestion little chance though, as "luit ssh" probably does already what is needed on Unix-style operating systems.

b) Good software architecture keeps things modular and avoids cramming all possible functionality into every single tool. For example, it would seem to me far neater to port luit to Windows (or write an equivalent tool), and then you can call luit.exe ssh.exe ... to do the conversion client side. You may have to look deeper into ConPTY to understand what is involved.

Or you could talk to the authors of Windows Terminal or other Windows terminal emulators and ask if one of them might be interested in adding ISO 8859 support (for "retro computing" ;-).

Keep in mind that PuTTY is both a terminal emulator and an SSH client in one program, and the multiple-character-encoding support is really more part of the terminal emulator part of PuTTY. The SSH part just transports bytes across the wire and therefore cares little about which flavour of 8-bit ASCII extension you might be using.

@szaszg

szaszg commented Oct 14, 2023

> Or you could talk to the authors of Windows Terminal or other Windows terminal emulators and ask if one of them might be interested in adding ISO 8859 support (for "retro computing" ;-).

Recent Windows Terminal versions now support many 8-bit character encodings, including all ISO 8859 code pages.

Even if we change to e.g. ISO 8859-2 (chcp 28592), OpenSSH switches back to UTF-8 :-(

@GitMensch
Author

GitMensch commented Jan 9, 2024

@tgauth Can you please check if this can make it to a backlog entry for further integration, especially as @zadjii-msft closed other entries in favor of this one?

@tgauth
Collaborator

tgauth commented Jan 9, 2024

> Recent Windows Terminal versions now support many 8-bit character encodings, including all ISO 8859 code pages.
>
> Even if we change to e.g. ISO 8859-2 (chcp 28592), OpenSSH switches back to UTF-8 :-(

Was chcp 28592 run before and after running the ssh command?

@GitMensch
Author

GitMensch commented Jan 9, 2024

> Was chcp 28592 run before and after running the ssh command?

That question is to be answered by @szaszg.

The original issue was connecting from Windows Server (tested with several chcp values) to a server that has LANG="en_US.iso885915" set (also before and after the ssh connection). Redirecting an extended character to a file and running hexdump on that file shows that the characters entered are always sent as UTF-8.

@mgkuhn

mgkuhn commented Jan 10, 2024

The OpenSSH for Windows tools currently always set their console output encoding to UTF-8, by calling SetConsoleOutputCP(CP_UTF8) in contrib/win32/win32compat/win32-utf8.c:msetlocale, which gets called unconditionally early on in ssh.c:main. As a result, any alternative code page that you had set with chcp prior to starting ssh.exe will be overridden during the SSH session.

This currently makes a lot of sense: the vast majority of SSH communication today uses UTF-8, however the two terminal emulators that Microsoft provides (cmd.exe and Windows Terminal) both currently default to CP437, for historic (MS-DOS) backwards-compatibility reasons, and since that encoding has never been used by any other platform, it is not at all a useful default for SSH users.

Once some future Windows version changes its terminal emulators to run in UTF-8 by default, the above call can hopefully be removed again. At that point, SSH could again remain agnostic of which ASCII extension is used over its connection, and it would just pass on bytes transparently between application and terminal emulator, like it always has on Unix-like systems.

There are of course always more HACKs possible that could be added to msetlocale() in the meantime. For example, one could change

	// save previous codepage
	g_previous_codepage = GetConsoleOutputCP();

	// allow console output of unicode characters
	SetConsoleOutputCP(CP_UTF8);

to something like

	// save previous codepage
	g_previous_codepage = GetConsoleOutputCP();

	// allow console output of unicode characters, unless the user
	// has already chosen (e.g. via chcp) another code page than
	// the historic default of CP437
	if (g_previous_codepage == CP_437) {
		SetConsoleOutputCP(CP_UTF8);
	}

One would, of course, also have to look at how input is dealt with. And I don't know if there are any other code pages than CP_437 used by default in some localized versions of the platform, in which case they should be added as well. I'm not suggesting actually doing this.

@GitMensch
Author

Thank you for the time to analyze and share this issue.
I personally think "hard coded unconditional changes" are about as bad as "hard coded changes on some magic values".

Wouldn't it be possible to change the unconditional call so that it first checks an environment variable and/or a win32-specific OpenSSH setting (I don't know if there are others already), and only calls SetConsoleOutputCP() if nothing is set?

This would allow one to:

  • let the user set console encoding via chcp to whatever matches the setting one has on the server (easy when connecting to a Windows Box, but also not that hard when connecting to another OS)
  • set the new environment variable before starting openssh / include the setting in the start or in the configuration file

and it should "just work", no?
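
The decision logic proposed here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the helper name `choose_console_cp`, the idea that the setting's value is either the word "keep" or a decimal code page number, and the fallback to UTF-8 on unparsable input. No such setting exists in OpenSSH for Windows today. The logic is kept as a pure function so it compiles without the Win32 headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Numeric value of the Win32 CP_UTF8 constant. */
#define SKETCH_CP_UTF8 65001u

/* Decide which console output code page ssh.exe should set.
 *   override    - value of a HYPOTHETICAL environment variable or
 *                 config setting; NULL or "" means "not set"
 *   previous_cp - code page that was active before ssh.exe started
 * Returns the code page to use. */
unsigned int choose_console_cp(const char *override, unsigned int previous_cp)
{
    char *end;
    unsigned long value;

    assert(previous_cp != 0);

    if (override == NULL || *override == '\0')
        return SKETCH_CP_UTF8;       /* nothing set: today's behaviour */

    if (strcmp(override, "keep") == 0)
        return previous_cp;          /* respect whatever chcp selected */

    value = strtoul(override, &end, 10);
    if (*end != '\0' || value == 0 || value > 65535)
        return SKETCH_CP_UTF8;       /* unparsable: fall back to UTF-8 */

    return (unsigned int)value;      /* explicit code page, e.g. 28605 */
}
```

In msetlocale() the result would be passed to SetConsoleOutputCP() in place of the fixed CP_UTF8; how input is handled would still need to be looked at separately.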

> One would, of course, also have to look at how input is dealt with.

Yes, that's an open point - but with the change suggested above someone can actually test how this works (it is not unlikely that this is already "enough").

@mgkuhn

mgkuhn commented Jan 10, 2024

If you look through the code in contrib/win32/win32compat yourself, there are quite a lot (>60) of calls to utf8_to_utf16() and utf16_to_utf8(), for interaction with the many Win32 UTF-16 wide-character APIs. So I somewhat doubt that simply suppressing the SetConsoleOutputCP(CP_UTF8) call will magically do what you want, as conversion to and from UTF-8 is currently hardwired in many places.

Many of these UTF-16 wide-character APIs will also have 8-bit equivalents, but (not being a seasoned Win32 developer myself) I have no idea what fraction of these could deal with a multi-byte encoding like UTF-8. So I assume there may be good reasons for why OpenSSH for Windows currently does a lot of character encoding conversion itself, as opposed to just passing on UTF-8 (or whatever other 8-bit encoding you want) to a multi-byte API.
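
One way to see why the hardwired conversions matter: a lone ISO 8859-15 byte such as e4 is not well-formed UTF-8, so any code path that assumes UTF-8 input cannot decode it. The validity check below is a sketch in portable C with a hypothetical function name. It makes no claim about how utf8_to_utf16() itself treats invalid input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the n bytes at s are well-formed UTF-8, else 0.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * forms, surrogates and values above U+10FFFF. Hypothetical helper. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    assert(s != NULL);
    while (i < n) {
        unsigned char b = s[i];
        uint32_t cp, min;
        size_t len, k;

        if (b < 0x80) { i++; continue; }              /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return 0;                                /* invalid lead byte */

        if (n - i < len)
            return 0;                                 /* truncated sequence */

        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                             /* not a continuation */
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                                 /* overlong or out of range */
        i += len;
    }
    return 1;
}
```

The bytes e4 0a (ä and newline in ISO 8859-15) fail this check, while c3 a4 0a (the same text in UTF-8) pass it.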


4 participants