archlinux: Unhandled exception 'Invalid locale for LC_CTYPE' #912

Closed
rsteube opened this issue Mar 22, 2021 · 16 comments

Comments


rsteube commented Mar 22, 2021

The error occurs on Arch Linux/Manjaro; the same setup works on Ubuntu.

FROM archlinux

RUN pacman -Sy --noconfirm base-devel

ARG version=0.8.8
RUN curl https://www.oilshell.org/download/oil-${version}.tar.gz | tar -xvz \
 && cd oil-*/ \
 && ./configure \
 && make \
 && ./install

RUN curl -L "https://github.com/rsteube/carapace/releases/download/v0.4.2/example_0.4.2_Linux_x86_64.tar.gz" | tar --directory /usr/local/bin -xvz

RUN mkdir -p ~/.config/oil \
 && echo 'source <(example _carapace)' >> ~/.config/oil/oshrc

CMD [ "osh" ]

osh$ _example_completion
Traceback (most recent call last):
  File "/home/andy/git/oilshell/oil/bin/oil.py", line 287, in _cpython_main_hook
  File "/home/andy/git/oilshell/oil/bin/oil.py", line 260, in main
  File "/home/andy/git/oilshell/oil/bin/oil.py", line 220, in AppBundleMain
  File "/home/andy/git/oilshell/oil/core/shell.py", line 601, in Main
  File "/home/andy/git/oilshell/oil/core/main_loop.py", line 124, in Interactive
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 1476, in ExecuteAndCatch
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 1323, in _Execute
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 636, in _Dispatch
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 477, in _RunSimpleCommand
  File "/home/andy/git/oilshell/oil/core/executor.py", line 224, in RunSimpleCommand
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 1621, in RunProc
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 1323, in _Execute
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 983, in _Dispatch
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 1379, in _ExecuteList
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 1323, in _Execute
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 996, in _Dispatch
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 1323, in _Execute
  File "/home/andy/git/oilshell/oil/osh/cmd_eval.py", line 684, in _Dispatch
  File "/home/andy/git/oilshell/oil/osh/sh_expr_eval.py", line 935, in EvalB
SystemError: Invalid locale for LC_CTYPE
FATAL: couldn't import from app bundle '/usr/local/bin/oil.ovm' (1)
Stripping the oil.ovm binary may cause this error.
See https://github.com/oilshell/oil/issues/47
Contributor

andychu commented Mar 22, 2021

Thanks for the report, this is similar to #868 which has stymied us.

Thread here:

https://oilshell.zulipchat.com/#narrow/stream/266977-shell-gui/topic/dev.20build.20locales.20issue.20(Lobste.2Ers.20followup)

Can you run locale on this machine and paste the output?

Oil is trying to use C.UTF-8, but sometimes that doesn't work.

https://github.com/oilshell/oil/blob/master/native/libc.c#L25

That was motivated by Debian!

99647b5

We used to use en_US.UTF-8, but it didn't work on Debian. I don't actually understand the difference :-(

Possible solution: try BOTH, but that seems hackish... It would be nice to get to the bottom of this
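A minimal sketch of the "try both" idea, written against Python 3's `locale` module as a stand-in for the C `setlocale()` call in native/libc.c. The function name and the candidate order are assumptions for illustration, not Oil's code.

```python
import locale

# The two names come from the thread; trying C.UTF-8 first is an assumption.
CANDIDATES = ("C.UTF-8", "en_US.UTF-8")


def first_working_locale(candidates=CANDIDATES):
    """Return the first name that setlocale() accepts for LC_CTYPE, or None."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # not installed on this system; try the next candidate
        return name
    return None


if __name__ == "__main__":
    print(first_working_locale())
```

The weakness noted above is visible here: any fixed candidate list favors some systems over others.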

Contributor

andychu commented Mar 22, 2021

andychu pushed a commit that referenced this issue Mar 22, 2021
Contributor

andychu commented Mar 22, 2021

OK I added some comments in native/libc.c.

What I think is going on: UTF-8 support in libc is NOT required by POSIX? man locale on my system says that C/POSIX, where everything is a byte string, is the "portable locale".

And some systems do it differently? They use either C.UTF-8 or en_US.UTF-8.

Possible solution:

  • Try C.UTF-8 first. If it's not available, print a warning at startup
  • However none of the string functions like regex or glob matching will fail with the SystemError. They should just fall back to C instead of UTF-8. That's what you get for not having the C.UTF-8 locale on the system?

I'm not sure if that is the best behavior, but the warning will at least elicit complaints from people and we can see if there is a different solution for them. It's better than the crash.
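The warn-and-fall-back behavior could look roughly like the following. This is a sketch in Python 3 using the standard `locale` module; `init_ctype` and the warning text are hypothetical, and the real change would live in Oil's startup code.

```python
import locale
import sys


def init_ctype(preferred="C.UTF-8", warn=sys.stderr):
    """Try the preferred LC_CTYPE; on failure, warn once and fall back to "C".

    Returns True if the preferred locale is in effect, False otherwise.
    """
    try:
        locale.setlocale(locale.LC_CTYPE, preferred)
        return True
    except locale.Error:
        print("warning: locale %r is not available; "
              "string matching will use the C locale" % preferred, file=warn)
        # "C" is the portable locale, so this call is expected to succeed.
        locale.setlocale(locale.LC_CTYPE, "C")
        return False
```

With this shape, regex and glob matching never see the SystemError: the shell starts either in UTF-8 mode or in byte-oriented C mode, and the warning says which.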

The other solution is to try both, but it seems weird to try en_US.UTF-8 and not other languages...

@abathur also hit the dev build version of this

Author

rsteube commented Mar 22, 2021

osh$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"

Btw, when I set the values in the Ubuntu container to mine, it didn't break.

Contributor

andychu commented Mar 22, 2021

Oh also locale -a , which shows what the system has installed.

Actually here's some better logic: if the current locale already contains UTF-8 in it, then don't touch it? (And also utf8 ? )

But if it doesn't, AND we can't set it to C.UTF-8, then print a warning at startup?
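The decision described above can be written as a small pure function. This is only a sketch: the names `is_utf8_locale` and `plan_locale` and the returned labels are made up for illustration, and whether C.UTF-8 can be set is passed in rather than probed.

```python
def is_utf8_locale(name):
    """True if the locale name mentions UTF-8, spelled "UTF-8" or "utf8"."""
    return "utf8" in name.lower().replace("-", "")


def plan_locale(current, can_set_c_utf8):
    """Decide what to do at startup.

    current:         the LC_CTYPE name already in effect
    can_set_c_utf8:  whether setting LC_CTYPE to "C.UTF-8" would succeed
    """
    if is_utf8_locale(current):
        return "keep"          # already UTF-8, so don't touch it
    if can_set_c_utf8:
        return "set C.UTF-8"
    return "warn"              # no UTF-8 available; warn at startup


if __name__ == "__main__":
    print(plan_locale("en_US.UTF-8", False))
    print(plan_locale("POSIX", True))
    print(plan_locale("C", False))
```

The first two calls use the values reported in this thread for the Arch and Ubuntu containers and yield "keep" and "set C.UTF-8"; the third is a hypothetical system with no UTF-8 locale, which yields "warn".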

$ locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX

Author

rsteube commented Mar 22, 2021

osh$ locale -a
C
en_US.utf8
POSIX

Author

rsteube commented Mar 22, 2021

ubuntu container:

root@dbc8e72c24b0:/# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
root@dbc8e72c24b0:/# locale -a
C
C.UTF-8
POSIX
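To compare the two `locale -a` listings, it helps to fold the codeset spelling, since the listings print "utf8" in some names while the environment variables say "UTF-8". A small sketch, with hypothetical helper names and the two listings copied from this thread:

```python
def normalize(name):
    """Fold codeset spelling so "en_US.UTF-8" and "en_US.utf8" compare equal."""
    lang, dot, codeset = name.partition(".")
    return lang + dot + codeset.lower().replace("-", "")


def installed(listing, name):
    """True if `name` appears in the text output of `locale -a`."""
    wanted = normalize(name)
    return any(normalize(line) == wanted for line in listing.split())


ARCH = "C\nen_US.utf8\nPOSIX\n"    # Arch container, from above
UBUNTU = "C\nC.UTF-8\nPOSIX\n"     # Ubuntu container, from above

if __name__ == "__main__":
    for label, listing in (("arch", ARCH), ("ubuntu", UBUNTU)):
        print(label,
              installed(listing, "C.UTF-8"),
              installed(listing, "en_US.UTF-8"))
```

This prints `arch False True` and `ubuntu True False`: neither hardcoded name is present in both containers.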

Contributor

andychu commented Mar 22, 2021

OK thanks, that's what I suspected. I think the logic I outlined will work on both Arch and Debian then.

I have no idea why the C locale scheme is so weird. The whole thing is fundamentally broken in a networked world, because you can obviously have filenames encoded differently on a single file system, but the encoding is a global variable! Same issue if you pipe "curl" to "sh" or something -- the global is fundamentally wrong; it depends on what's at the other end of the pipe
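The point about mixed encodings can be illustrated with two file names stored as raw bytes. This is a constructed example, not data from the report; `decode_all` is a hypothetical helper that mimics applying one process-wide encoding to everything.

```python
# Two spellings of the same name, as bytes on disk.
names = [
    "caf\u00e9.txt".encode("utf-8"),    # b'caf\xc3\xa9.txt'
    "caf\u00e9.txt".encode("latin-1"),  # b'caf\xe9.txt'
]


def decode_all(raw_names, encoding):
    """Decode every name with a single encoding, as a global locale implies."""
    out = []
    for raw in raw_names:
        try:
            out.append(raw.decode(encoding))
        except UnicodeDecodeError:
            out.append(None)  # this name is not valid in the chosen encoding
    return out


if __name__ == "__main__":
    print(decode_all(names, "utf-8"))
    print(decode_all(names, "latin-1"))
```

With UTF-8 the second name fails to decode; with Latin-1 both decode, but the first comes out as mojibake. No single global setting reads both names correctly.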

Contributor

andychu commented Mar 22, 2021

Actually I realized that a core reason for this bug is CPython, which has its own scheme for unicode ... Hopefully there is some way to prevent the Python interpreter from calling setlocale() AT ALL, so that Oil can manage or not.

(This won't be a problem with oil-native, but I want to keep both builds working for the time being)

andy@lisa:~/git/oilshell/oil$ ltrace -e setlocale python2 -c 'print(1)'
python2->setlocale(LC_CTYPE, nil)                                                                                                                  = "C"
python2->setlocale(LC_CTYPE, "")                                                                                                                   = "en_US.UTF-8"
python2->setlocale(LC_CTYPE, "C")                                                                                                                  = "C"
1
+++ exited (status 0) +++
andy@lisa:~/git/oilshell/oil$ ltrace -e setlocale python3 -c 'print(1)'
python3->setlocale(LC_ALL, nil)                                                                                                                    = "C"
python3->setlocale(LC_ALL, "")                                                                                                                     = "en_US.UTF-8"
python3->setlocale(LC_CTYPE, nil)                                                                                                                  = "en_US.UTF-8"
python3->setlocale(LC_ALL, "C")                                                                                                                    = "C"
python3->setlocale(LC_CTYPE, "")                                                                                                                   = "en_US.UTF-8"
python3->setlocale(LC_CTYPE, nil)                                                                                                                  = "en_US.UTF-8"
1
+++ exited (status 0) +++
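The python2 trace reads like a query, set, restore sequence on LC_CTYPE. The same pattern, sketched with Python 3's `locale` module (the context manager name is hypothetical, and this is not CPython's own code):

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_ctype(name=""):
    """Switch LC_CTYPE for the duration of a block, then restore it.

    An empty name means "take the locale from the environment".
    """
    saved = locale.setlocale(locale.LC_CTYPE)    # query only, no change
    locale.setlocale(locale.LC_CTYPE, name)      # may raise locale.Error
    try:
        yield locale.setlocale(locale.LC_CTYPE)  # the name now in effect
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    with temporary_ctype("C") as active:
        print("inside:", active)
    print("after:", locale.setlocale(locale.LC_CTYPE))
```

Even with the restore step, the middle call still touches the process-wide locale, which is the side effect discussed here.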

Reading over Py_InitializeEx() in pythonrun.c, I think setting PYTHONIOENCODING does this ... does it have other side effects though?

https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING
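One documented effect of PYTHONIOENCODING is easy to observe: it overrides the encoding of the standard streams. The sketch below assumes a Python 3 interpreter (Oil bundles 2.7, which may differ) and a hypothetical helper name. It does not show whether `setlocale()` is skipped; checking that would need something like the ltrace runs above.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set.

    Returns the child's sys.stdout.encoding as a string.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_stdout_encoding("latin-1"))
```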

Contributor

andychu commented Mar 22, 2021

Gah, it seems impossible to avoid Python calling setlocale() (without patching the code). By default, Py_FileSystemDefaultEncoding is NULL, which means that Py_InitializeEx() will call setlocale() ...

https://github.com/oilshell/oil/blob/master/Python-2.7.13/Python/bltinmodule.c#L25

This is lame .. ! global variables everywhere, and libraries fighting over them

andychu pushed a commit that referenced this issue Mar 23, 2021
CPython DOES call it, so there's a workaround in the dev build only:
libc.cpython_reset_locale().

- Remove explicit setlocale() calls from native/libc.c
- Remove setlocale() calls when OVM_MAIN is defined
- Add the cpython_reset_locale() hack for the remaining case
- Refactor OVM_MAIN check into pyutil.IsAppBundle()

Spec tests:

- spec/glob: Test started passing!  Woohoo we're more consistent.
- spec/oil-regex: Documented that we need LANG=C support.  This is issue
  #529.

Addresses issue #912.
Contributor

andychu commented Mar 23, 2021

OK I did a big overhaul of this. I worked around the REAL Issue, which is that CPython calls setlocale().

Conceptually, Oil isn't tied to CPython, so that side effect is wrong. Now Oil never calls setlocale() except in order to "right that wrong".

Related to #529

andychu changed the title from "archlinux: Invalid locale for LC_CTYPE" to "archlinux: Unhandled exception 'Invalid locale for LC_CTYPE'" Mar 23, 2021
Contributor

andychu commented Apr 13, 2021

Thanks for the report!

http://www.oilshell.org/blog/2021/04/release-0.8.9.html

andychu closed this as completed Apr 13, 2021
Author

rsteube commented Apr 13, 2021

Looking good. Encountered a different error on a specific completion (some drawing errors), but generally it works.

Contributor

andychu commented Apr 13, 2021

This is the main drawing bug I know of, and is somewhat fundamental:

#795

(Oil tries to do better than bash, but it still uses GNU readline. I think GNU readline can't really handle it.)

If you notice anything that doesn't seem related to that, let me know

We might have to go back to bash-style completion UI, and then defer fancier UI to others #738

Author

rsteube commented Apr 13, 2021

Ah yes, i forgot to start it with --completion-display minimal (which works).

Contributor

andychu commented Apr 13, 2021

OK great, yeah unfortunately we might have to make that the default :-/ I think the drawing stuff is fundamental due to using readline
