Remove use of codecvt in PAL #44098

ThadHouse · 2020-10-31T02:31:10Z

codecvt was deprecated as of C++ 17, and likely should be removed for future compatibility.

In Windows, its included only in pal.windows.cpp, however it doesn't actually seem to be used anywhere.

In Unix, its included only in pal.linux.cpp, and is used to convert a utf8 string to utf16 string, which there for sure is another way to do this.

I noticed this, because this was added in April, and now causes the armel non tizen build to fail, as it does not have access to codecvt.

Dotnet-GitSync-Bot · 2020-10-31T02:31:16Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2020-10-31T02:53:11Z

Tagging subscribers to this area: @vitek-karas, @agocke
See info in area-owners.md if you want to be subscribed.

am11 · 2020-11-02T22:59:58Z

There is only one usage of codecvt in host for bundle_probe. On Unix, we convert utf8 to utf16 in runtime before invoking the bundle probe; then convert it back to utf8 in host. On Windows, we use utf16 end-to-end for this scenario.

I think there are two approaches to solve this:

flip bundle_probe contract from utf16 to utf8 parameter, and use a well-known MultiByteToWideChar win32 conversion function in pal.windows.cpp. The runtime will invoke bundle_probe with utf8 argument char*, instead of char16_t*.
add a build-time fallback using c16rtomb in pal.unix.cpp based on cmake introspection results: HAVE_CODECVT, HAVE_CUCHAR.

I tried #2, but it does not convert all characters on Linux; I think it relies on the locales installed on the build machine, unlike codecvt which is self-sufficient, when available. The reason for cmake introspection was that the header <cuchar> (which provides c16rtomb) is not available on macOS.

Therefore, while #1 might be riskier; as host might no longer be backwards compatible for some custom scenario which is using bundle_probe, but more reliable solution.. not sure what other risks it might entail.

janvorli · 2020-11-03T13:01:05Z

Since it is used just to convert between utf-8 and utf-16, we could possibly use the code we have in the coreclr PAL - the UnicodeToUTF8 and UTF8ToUnicode methods.

janvorli · 2020-11-03T13:03:32Z

However, it would not be nice to share as the code lives in different locations and it is heavily used by the coreclr PAL (via the MultiByteToWideChar and WideCharToMultiByte).
The option #1 sounds reasonable to me.

vitek-karas · 2020-11-03T13:32:32Z

I'm hesitant about #1 - it's changing the interface between runtime and hostpolicy. While we can make the change (the two components are shipped and versioned together), we typically try to avoid making breaking changes to the interface - just to be safe.

I'm no expert on Linux so a "stupid" question: Linux doesn't have a UTF16->UTF8 conversion function available?

If there's no better option #1 would work..

janvorli · 2020-11-03T13:43:49Z

Hmm, I didn't know the UTF16 is a public contract. Too bad we didn't catch this when that was created, as UTF16 is kind of a Windowsism that we were trying to avoid on Unix (that's why the coreclr hosting APIs all use UTF-8).
As for conversion function on Linux, there is an iconv library available for Linux that can do all kinds of charset conversions. But I'd prefer not to add a dependency on a library if we don't have to.

vitek-karas · 2020-11-03T13:47:53Z

I was just trying to push back to make sure we don't make the interface change lightly. I do agree that it should have been UTF8 from the beginning (my bad as well).

I think it should be OK to change the interface (as I said, those two components are shipped together and AFAIK there's no "extension" point - as in nobody implements its own hostpolicy, nor I've heard about anybody using the bundle probe when directly hosting coreclr.dll)

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Oct 31, 2020

jkotas added the area-Host label Oct 31, 2020

This was referenced Oct 31, 2020

Fix armel jessie rootfs atomic error #44100

Closed

Fix armel jessie rootfs atomic error dotnet/arcade#6502

Merged

janvorli closed this as completed Nov 3, 2020

janvorli reopened this Nov 3, 2020

am11 mentioned this issue Nov 10, 2020

Use multibyte character path for bundle_probe #44466

Merged

janvorli closed this as completed in #44466 Nov 19, 2020

ghost locked as resolved and limited conversation to collaborators Dec 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove use of codecvt in PAL #44098

Remove use of codecvt in PAL #44098

ThadHouse commented Oct 31, 2020

Dotnet-GitSync-Bot commented Oct 31, 2020

ghost commented Oct 31, 2020

am11 commented Nov 2, 2020

janvorli commented Nov 3, 2020

janvorli commented Nov 3, 2020

vitek-karas commented Nov 3, 2020

janvorli commented Nov 3, 2020

vitek-karas commented Nov 3, 2020

Remove use of codecvt in PAL #44098

Remove use of codecvt in PAL #44098

Comments

ThadHouse commented Oct 31, 2020

Dotnet-GitSync-Bot commented Oct 31, 2020

ghost commented Oct 31, 2020

am11 commented Nov 2, 2020

janvorli commented Nov 3, 2020

janvorli commented Nov 3, 2020

vitek-karas commented Nov 3, 2020

janvorli commented Nov 3, 2020

vitek-karas commented Nov 3, 2020