How to _not_ use UTF-8 (connection to server with LANG="en_US.iso885915")? #1855

GitMensch · 2021-11-10T19:53:30Z

"OpenSSH for Windows" version
OpenSSH_for_Windows_8.6p1, LibreSSL 3.3.3

Server OperatingSystem
RHEL7, CentOS8 (likely doesn't matter)

Client OperatingSystem
Windows Server 2019 Datacenter

What is failing
The server and a number of tools installed on it must use the ISO 8859-15 encoding, so it is set globally via LANG="en_US.iso885915".
When connecting to the server with PuTTY, with "Remote character set" set to ISO-8859-15: 1999 (Latin-9, "euro") (found under Configuration->Window->Translation), everything works fine: tools generate output in that character set, the client displays it properly, and entered data arrives as expected.

But that isn't the case with OpenSSH :-(

Expected output
With PuTTY

> echo ä| hexdump
0000000 0ae4
0000002
> printf \\xe4\\n
ä
>

Actual output
With OpenSSH

> echo ä | hexdump
0000000 a4c3 000a
0000003
> printf \\xe4\\n

>

Is there any way with this OpenSSH version to not use UTF-8?
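
For readers comparing the two dumps: hexdump without options prints its input as 16-bit words in host byte order. On a little-endian machine the bytes e4 0a (ISO 8859-15 "ä" plus newline) therefore appear as 0ae4, and the bytes c3 a4 0a (UTF-8 "ä" plus newline) appear as a4c3 000a. The following C sketch reproduces that grouping. The function name pack_words_le is made up for this illustration and is not part of any tool mentioned here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: group bytes into 16-bit little-endian words,
 * the way hexdump's default output shows them on x86. A trailing odd
 * byte is padded with zero. Returns the number of words written.
 */
size_t pack_words_le(const unsigned char *buf, size_t len, uint16_t *out)
{
    size_t words = 0;

    for (size_t i = 0; i < len; i += 2) {
        uint16_t lo = buf[i];
        uint16_t hi = (i + 1 < len) ? buf[i + 1] : 0;
        out[words++] = (uint16_t)(lo | (hi << 8));
    }
    return words;
}
```

Feeding it the bytes e4 0a yields the single word 0x0ae4, while c3 a4 0a yields 0xa4c3 and 0x000a, matching the two outputs above.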

mgkuhn · 2021-12-07T19:51:39Z

https://ss64.com/nt/chcp.html

GitMensch · 2021-12-07T21:01:39Z

😕 ... and how do you apply that to an ssh connection to a Linux host?

- local client with chcp not tinkered with (so it matches the system default) -> works fine
- session via PuTTY configured for the same encoding, to a shell environment that has LANG set appropriately -> everything fine
- opening the local client from the first bullet point and connecting to the server from the second one -> all input is UTF-8 and all non-US-ASCII output is garbage

... @mgkuhn and your solution is?

mgkuhn · 2021-12-08T16:38:03Z

Ah sorry, I had misunderstood your scenario (too quick reading).

OpenSSH for Windows only supports UTF-8, so you need to use an encoding translation tool on Linux. There were some developed about 15-20 years ago, when Linux migrated from ISO 8859 and other 8-bit encodings to UTF-8. The main one I recall from that time is luit (Juliusz Chroboczek, ~2001). I haven't used that in more than a decade, but it still seems to work fine on Ubuntu:

$ luit -encoding ISO8859-15 bash
$ echo £ | hd
00000000  a3 0a                                             |..|
00000002

(Instead of using the -encoding option, better install a proper ISO 8859-15 locale and set e.g. LANG to that.)

So I don't think OpenSSH for Windows needs to implement anything here.
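
Conceptually, a translator such as luit sits between the application and the terminal and recodes each byte. As a minimal sketch of the output direction only (this is not luit's code, and the function names are invented for illustration): ISO 8859-15 matches Unicode code points 0x00 to 0xFF except for eight positions, and each resulting code point is then written as UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Map one ISO 8859-15 byte to its Unicode code point. */
uint32_t latin9_to_unicode(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC; /* euro sign */
    case 0xA6: return 0x0160; /* S with caron */
    case 0xA8: return 0x0161; /* s with caron */
    case 0xB4: return 0x017D; /* Z with caron */
    case 0xB8: return 0x017E; /* z with caron */
    case 0xBC: return 0x0152; /* OE ligature */
    case 0xBD: return 0x0153; /* oe ligature */
    case 0xBE: return 0x0178; /* Y with diaeresis */
    default:   return b;      /* identical to ISO 8859-1 and Unicode */
    }
}

/*
 * Write the UTF-8 form of one ISO 8859-15 byte into out (room for at
 * least 3 bytes) and return the number of bytes written.
 */
size_t latin9_byte_to_utf8(unsigned char b, unsigned char *out)
{
    uint32_t cp = latin9_to_unicode(b);

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* All remaining Latin-9 code points are below U+10000. */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

The byte a3 from the luit example above becomes c2 a3 on the terminal side, and e4 becomes c3 a4. A real translator also handles the reverse direction for keyboard input.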

GitMensch · 2021-12-08T20:05:41Z

> OpenSSH for Windows only supports UTF-8

I guessed as much. The point was how to make OpenSSH for Windows correctly recognize the encoding/LANG of the server it connects to, or of the client it connects from (so you just set it, then run ssh.exe).

Would it be possible for OpenSSH to check LANG and, for example, use luit itself?

> So I don't think OpenSSH for Windows needs to implement anything here.

As you seem to have experience with that: how should I tell ssh.exe to use luit to convert its "only supports UTF-8" encoding, so that the server gets ISO-8859-15 when OpenSSH for Windows sends a UTF-8 character?

I currently use it as follows:

- Opening an actual terminal session on the server (which wants ISO-8859-15 input and also sends ISO-8859-15 to the terminal):
  ssh.exe -K %USERNAME%@%SERVER% "variant=%variant% bash -l"
  --> Changing that to ssh.exe -K %USERNAME%@%SERVER% "variant=%variant% luit bash -l" works like a charm. There is no need to specify an encoding; LANG seems to be used.
- Executing a command on the server (which wants ISO-8859-15 input and also sends ISO-8859-15 to the terminal):
  ssh.exe -K %USERNAME%@%SERVER% "/my/worker.sh %work1% %work2%"
  --> I could not get that working; none of the following worked:
  - ssh.exe -K %USERNAME%@%SERVER% "luit /my/worker.sh %work1% %work2%"
  - ssh.exe -K %USERNAME%@%SERVER% bash -l -c "luit /my/worker.sh %work1% %work2%"
  - ssh.exe -K %USERNAME%@%SERVER% luit bash -l -c '/my/worker.sh %work1% %work2%'

Any ideas / insights?
Could luit handling be "integrated" into OpenSSH for Windows?

mgkuhn · 2021-12-08T20:28:57Z

I would check (with ssh.exe -K %USERNAME%@%SERVER% locale) if in your last three examples the LANG environment variable is actually set to an ISO 8859-15 locale. If you set it only in .profile or .bashrc, that may not get executed when you provide a command that starts with luit, and you may have to set LANG in ~/.ssh/environment as well for that.

Also, you may want to play with adding ssh.exe option -t in the last three examples, to force allocation of a pseudo-TTY device (pty) on the server, such that luit still thinks it runs inside a real terminal.
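
When reading the output of locale, what matters is the codeset part of the locale name (the text after the dot and before any @modifier), and its spelling varies between systems (iso885915, ISO-8859-15, ...). Below is a small C sketch of such a comparison, under the assumption that ignoring case and punctuation is acceptable. The function name is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a locale name such as
 * "en_US.iso885915" or "de_DE.ISO-8859-15@euro" names the
 * ISO 8859-15 codeset. Case and punctuation are ignored.
 */
bool locale_names_latin9(const char *locale_name)
{
    static const char want[] = "iso885915";
    const char *p;
    size_t matched = 0;

    if (locale_name == NULL)
        return false;
    p = strchr(locale_name, '.');
    if (p == NULL)
        return false;           /* no codeset part, e.g. "C" */

    for (p++; *p != '\0' && *p != '@'; p++) {
        unsigned char c = (unsigned char)*p;

        if (!isalnum(c))
            continue;           /* skip '-' and '_' */
        if (matched >= sizeof want - 1 || tolower(c) != want[matched])
            return false;
        matched++;
    }
    return matched == sizeof want - 1;
}
```

Locale aliases that carry no codeset part are not recognized by this sketch. On the server itself, locale charmap prints the codeset that is actually in effect.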

mgkuhn · 2021-12-08T20:47:45Z

> Could luit handling be "integrated" into OpenSSH for Windows?

Not a good idea for at least two reasons:

a) This forum is just about the Win32 port of OpenSSH. It is not good practice for the Windows porters to add new functionality here that goes beyond what is specifically only required for Windows. Adding anything else here to the port would just create an ongoing maintenance overhead, i.e. more work when merging in each new upstream release. So if you wanted to make that suggestion, you should take it upstream. I would give the suggestion little chance though, as "luit ssh" probably does already what is needed on Unix-style operating systems.

b) Good software architecture keeps things modular and avoids cramming all possible functionality into every single tool. For example, it would seem to me far neater to port luit to Windows (or write an equivalent tool), and then you can call luit.exe ssh.exe ... to do the conversion client side. You may have to look deeper into ConPTY to understand what is involved.

Or you could talk to the authors of Windows Terminal or other Windows terminal emulators and ask if one of them might be interested in adding ISO 8859 support (for "retro computing" ;-).

Keep in mind that PuTTY is both a terminal emulator and an SSH client in one program, and the multiple-character-encoding support is really more part of the terminal emulator part of PuTTY. The SSH part just transports bytes across the wire and therefore cares little about which flavour of 8-bit ASCII extension you might be using.

szaszg · 2023-10-14T19:57:00Z

> Or you could talk to the authors of Windows Terminal or other Windows terminal emulators and ask if one of them might be interested in adding ISO 8859 support (for "retro computing" ;-).

Recent versions of Windows Terminal now support many 8-bit character encodings, including all ISO 8859 code pages.

But even if we change to e.g. ISO 8859-2 (chcp 28592), OpenSSH switches back to UTF-8 :-(

GitMensch · 2024-01-09T16:44:28Z

@tgauth Can you please check if this can make it to a backlog entry for further integration, especially as @zadjii-msft closed other entries in favor of this one?

tgauth · 2024-01-09T20:05:24Z

> Recent versions of Windows Terminal now support many 8-bit character encodings, including all ISO 8859 code pages.
>
> But even if we change to e.g. ISO 8859-2 (chcp 28592), OpenSSH switches back to UTF-8 :-(

Was chcp 28592 run before and after running the ssh command?

GitMensch · 2024-01-09T20:31:38Z

> Was chcp 28592 run before and after running the ssh command?

That question is to be answered by @szaszg.

The original issue was connecting from Windows Server (tested with several chcp values) to a server that has LANG="en_US.iso885915" set (also before and after the ssh connection). Redirecting an extended character to a file and running hexdump on that file shows that the characters entered are always sent as UTF-8.

mgkuhn · 2024-01-10T12:00:22Z

The OpenSSH for Windows tools currently always set their console output encoding to UTF-8, by calling SetConsoleOutputCP(CP_UTF8) in contrib/win32/win32compat/win32-utf8.c:msetlocale, which gets called unconditionally early on in ssh.c:main. As a result, any alternative code page that you had set with chcp prior to starting ssh.exe will be overridden during the SSH session.

This currently makes a lot of sense: the vast majority of SSH communication today uses UTF-8. However, the two terminal emulators that Microsoft provides (cmd.exe and Windows Terminal) both currently default to CP437, for historic (MS-DOS) backwards-compatibility reasons. Since that encoding has never been used by any other platform, it is not at all a useful default for SSH users.

Once some future Windows version changes its terminal emulators to run in UTF-8 by default, the above call can hopefully be removed again. At that point, SSH could again remain agnostic of which ASCII extension is used over its connection, and it would just pass on bytes transparently between application and terminal emulator, like it always has on Unix-like systems.

There are of course always more HACKs possible that could be added to msetlocale() in the meantime. For example, one could change

	// save previous codepage
	g_previous_codepage = GetConsoleOutputCP();

	// allow console output of unicode characters
	SetConsoleOutputCP(CP_UTF8);

to something like

	// save previous codepage
	g_previous_codepage = GetConsoleOutputCP();

	// allow console output of unicode characters, unless the user
	// has already chosen (e.g. via chcp) another code page than
	// the historic default of CP437 (code page identifier 437)
	if (g_previous_codepage == 437) {
		SetConsoleOutputCP(CP_UTF8);
	}

One would, of course, also have to look at how input is dealt with. And I don't know whether code pages other than CP437 are used by default in some localized versions of the platform, in which case they should be added as well. I'm not suggesting actually doing this.
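
Purely as an illustration of the caveat about localized defaults: one way to avoid hard-coding 437 would be to compare against the system's OEM code page, which Win32 reports via GetOEMCP(). The sketch below keeps that decision in a pure function so it can be read and tested without the Windows headers. It assumes that an untouched console uses the OEM code page. The function name and the constant UTF8_CODEPAGE are inventions of this example (65001 is the value of CP_UTF8), and nothing in this thread indicates that such a check exists in OpenSSH for Windows.

```c
/* Value of the Win32 CP_UTF8 constant. */
#define UTF8_CODEPAGE 65001u

/*
 * Hypothetical decision helper. On Windows, previous_cp would come
 * from GetConsoleOutputCP() and oem_cp from GetOEMCP().
 * If the console still uses the system's OEM default, switch to
 * UTF-8; otherwise assume the user picked the code page on purpose
 * (e.g. via chcp) and keep it.
 */
unsigned int choose_console_output_cp(unsigned int previous_cp,
                                      unsigned int oem_cp)
{
    if (previous_cp == oem_cp)
        return UTF8_CODEPAGE;
    return previous_cp;
}
```

A user who deliberately runs chcp with the OEM default cannot be told apart from one who never ran chcp, and the console input code page (SetConsoleCP) is not covered by this sketch.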

GitMensch · 2024-01-10T13:41:16Z

Thank you for taking the time to analyze this issue and share the result.
I personally think "hard-coded unconditional changes" are about as bad as "hard-coded changes on some magic values".

Wouldn't it be possible to change the unconditional call so that it first checks an environment variable and/or a Win32-specific OpenSSH setting (I don't know if there are others already), and only calls SetConsoleOutputCP() if nothing is set?

This would allow the user to:

- set the console encoding via chcp to whatever matches the setting on the server (easy when connecting to a Windows box, but also not that hard when connecting to another OS)
- set the new environment variable before starting OpenSSH, or pass the setting at startup or put it in the configuration file

and it should "just work", no?

> One would, of course, also have to look at how input is dealt with.

Yes, that's an open point - but with the change suggested above someone can actually test how this works (it is not unlikely that this is already "enough").
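
To make the proposal concrete, here is a sketch of how such an override could be parsed. The environment variable name OPENSSH_CONSOLE_CP is purely hypothetical (nothing in this thread indicates that such a setting exists in OpenSSH for Windows), as is the helper name. The value is expected to be a decimal code page identifier such as 28605 for ISO 8859-15.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a code page override such as the value
 * of an (invented) OPENSSH_CONSOLE_CP environment variable.
 * Returns the parsed identifier, or fallback_cp if the value is
 * missing, empty, not a plain decimal number, or out of range.
 */
unsigned int parse_codepage_override(const char *value,
                                     unsigned int fallback_cp)
{
    char *end = NULL;
    unsigned long cp;

    if (value == NULL || !isdigit((unsigned char)*value))
        return fallback_cp;     /* unset, empty, sign or whitespace */
    cp = strtoul(value, &end, 10);
    if (*end != '\0')
        return fallback_cp;     /* trailing garbage such as "437x" */
    if (cp == 0 || cp > 65535)
        return fallback_cp;     /* outside the range of known identifiers */
    return (unsigned int)cp;
}
```

A caller could use parse_codepage_override(getenv("OPENSSH_CONSOLE_CP"), 65001) and pass the result to SetConsoleOutputCP(), so that the current UTF-8 behaviour remains the default when nothing is set.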

mgkuhn · 2024-01-10T17:38:06Z

If you look through the code in contrib/win32/win32compat yourself, you will find quite a lot (>60) of calls to utf8_to_utf16() and utf16_to_utf8(), for interaction with lots of Win32 UTF-16 wide-character APIs. So I somewhat doubt that simply suppressing the SetConsoleOutputCP(CP_UTF8) call will magically do what you want, as conversion to and from UTF-8 is currently hardwired in at many places.

Many of these UTF-16 wide-character APIs will also have 8-bit equivalents, but (not being a seasoned Win32 developer myself) I have no idea what fraction of these could deal with a multi-byte encoding like UTF-8. So I assume there may be good reasons for why OpenSSH for Windows currently does a lot of character encoding conversion itself, as opposed to just passing on UTF-8 (or whatever other 8-bit encoding you want) to a multi-byte API.
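
To illustrate why a hardwired UTF-8 conversion step cannot simply pass ISO 8859-15 data through: text containing Latin-9 accented characters is usually not valid UTF-8, so a UTF-8 decoder has nothing sensible to map it to. The checker below is a generic sketch written for this explanation. It is not the utf8_to_utf16() code from the port, and the function name is an assumption of this example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the n bytes at s form well-formed UTF-8
 * (no stray or missing continuation bytes, no overlong forms,
 * no surrogates, nothing above U+10FFFF).
 */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t need;            /* continuation bytes expected */
        unsigned long cp, min;  /* code point and smallest legal value */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;      /* stray continuation or invalid lead byte */

        if (n - i - 1 < need)
            return false;       /* sequence is cut off */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];

            if ((cc & 0xC0) != 0x80)
                return false;   /* expected a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += need + 1;
    }
    return true;
}
```

The sequence e4 0a from the first post is rejected (e4 announces a three-byte character whose continuation bytes never arrive), while c3 a4 0a is accepted.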

eabase mentioned this issue Jan 13, 2022: nanorc not found in elevated shell lhmouse/nano-win#34 (Open)

konemsnq mentioned this issue Mar 15, 2022: Support of EUC-JP code page setting microsoft/terminal#12679 (Closed)

lhecker mentioned this issue May 16, 2024: Is there any convenient way to modify the terminal's encoding? microsoft/terminal#17273 (Closed)
