Generating lockfiles fails with: unknown error (_ssl.c:3161) #20467

Closed
mjimlittle opened this issue Jan 28, 2024 · 19 comments · Fixed by #20502
@mjimlittle

Describe the bug
When trying to generate lockfiles, the command fails with the following error: Failed to spawn a job for /home/manos/Workspace/pants-repo/.conda/bin/python3.9: unknown error (_ssl.c:3161)

pants --print-stacktrace -ldebug generate-lockfiles ::
18:15:17.57 [INFO] Initialization options changed: reinitializing scheduler...
18:15:22.39 [INFO] Scheduler initialized.
18:15:23.84 [INFO] Completed: Generate lockfile for python-default
18:15:23.84 [ERROR] 1 Exception encountered:

Engine traceback:
  in select
    ..
  in pants.core.goals.generate_lockfiles.generate_lockfiles_goal
    `generate-lockfiles` goal

Traceback (most recent call last):
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/core/goals/generate_lockfiles.py", line 557, in generate_lockfiles_goal
    results = await MultiGet(
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 361, in MultiGet
    return await _MultiGet(tuple(__arg0))
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 168, in __await__
    result = yield self.gets
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/backend/python/goals/lockfile.py", line 110, in generate_lockfile
    result = await Get(
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 118, in __await__
    result = yield self
pants.engine.process.ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for /home/manos/Workspace/pants-repo/.conda/bin/python3.9: unknown error (_ssl.c:3161)



Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.

Pants version
Tested with versions:

  • 2.16.0
  • 2.17.0
  • 2.18.0
  • 2.18.2
  • 2.19.0rc5

(same result for all tested versions)

OS
Tested with

  • Fedora 38 (Linux 6.6.13-100.fc38.x86_64)
  • Fedora 39 (Linux 6.6.13-200.fc39.x86_64)

(same result for all tested versions)

Additional info
I think this issue started happening after a Fedora kernel update. Has anyone else run into this issue before?
Any suggestions on how to resolve this would be much appreciated!

@mjimlittle mjimlittle added the bug label Jan 28, 2024
@jsirois
Contributor

jsirois commented Jan 28, 2024

The source is free to read. My reading says the underlying SSL lib CPython is linking against is not supported (you're probably on the right track): https://github.com/python/cpython/blob/8fc8c45b6717be58ad927def1bf3ea05c83cab8c/Modules/_ssl.c#L3161

I'd ldd /home/manos/Workspace/pants-repo/.conda/bin/python3.9 to see the linkage and work from there. This is a much lower level issue than Pants and it would be good to cut Pants out of the debugging.
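
One way to take Pants out of the picture is to ask the interpreter itself which OpenSSL it is using. The following is a minimal sketch, not something from the Pants or Pex code base; the helper name `describe_ssl_linkage` is made up for illustration. It relies only on the standard `ssl` and `_ssl` modules. If `_ssl` is compiled into the interpreter rather than loaded from a shared object, it has no `__file__`, and `ldd` on the extension module is not applicable.

```python
import ssl
import _ssl


def describe_ssl_linkage():
    """Report which OpenSSL this interpreter uses and where _ssl was loaded from."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
        # Modules built into the interpreter have no __file__, so fall back to a marker.
        "ssl_extension": getattr(_ssl, "__file__", "<built into the interpreter>"),
    }


if __name__ == "__main__":
    for key, value in describe_ssl_linkage().items():
        print(f"{key}: {value}")
```

Run it with the exact interpreter named in the error message, then with a system interpreter, and compare the two reports.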

@jsirois jsirois added question and removed bug labels Jan 28, 2024
@xlevus
Contributor

xlevus commented Jan 28, 2024

I'm experiencing the same issue, also on Fedora. I'm only ever able to replicate it when using what's in the sandbox.

I can't:

  • Get any helpful error messages
  • Get the python3.9 urllib from the scie-pants venv to error the same way alone.

Using distrobox to try running in older versions of Fedora seems to hit the same issue, but distrobox doesn't seem to entirely isolate things from the rest of the system.

@jsirois
Contributor

jsirois commented Jan 28, 2024

The sandbox blanks out env vars and that can be important. @xlevus can you ldd and compare your env against the sandbox env, to help isolate whether this is a case of LD_LIBRARY_PATH (or some other required env var) being blocked by Pants? The whole scie-pants thing is almost certainly way off track. If Pants launches at all, scie-pants is long out of the picture entirely.
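
A quick way to check whether the environment matters is to run the same SSL probe twice, once with the inherited environment and once with an empty one (roughly what `env -i` does). This is a sketch under assumptions: `PROBE` and `run_probe` are hypothetical names, and an empty environment only approximates what the Pants sandbox actually sets.

```python
import subprocess
import sys

# One-liner executed in a child interpreter: build an SSL context, print the OpenSSL version.
PROBE = "import ssl; ssl.create_default_context(); print(ssl.OPENSSL_VERSION)"


def run_probe(env=None):
    """Run the SSL probe in a child interpreter; env=None inherits the current environment."""
    completed = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout.strip(), completed.stderr.strip()


if __name__ == "__main__":
    print("inherited env:", run_probe())
    # An empty mapping means the child starts with no environment variables at all.
    print("empty env:    ", run_probe(env={}))
```

If the two runs differ, re-add variables one at a time to find the one that matters; if they match, the environment is probably not the culprit.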

@xlevus
Contributor

xlevus commented Jan 28, 2024

📦[xlevus@pants-debug2 gymkhana]$ pants --keep-sandboxes=on_failure generate-lockfiles ::
10:54:06.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ouAMHY for Generate lockfile for python-default
10:54:06.16 [INFO] Completed: Generate lockfile for python-default
10:54:06.16 [ERROR] 1 Exception encountered:

Engine traceback:
  in `generate-lockfiles` goal

ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for /usr/bin/python3.10: unknown error (_ssl.c:3161)


📦[xlevus@pants-debug2 gymkhana]$ ldd /usr/bin/python3.10
	linux-vdso.so.1 (0x00007ffeaefee000)
	libpython3.10.so.1.0 => /lib64/libpython3.10.so.1.0 (0x00007fc5a9b64000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fc5a9987000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fc5a98a7000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc5a9ebe000)

📦[xlevus@pants-debug2 gymkhana]$ python3.10 
Python 3.10.13 (main, Aug 28 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _ssl
>>> _ssl.__file__
'/usr/lib64/python3.10/lib-dynload/_ssl.cpython-310-x86_64-linux-gnu.so'
>>> _ssl.OPENSSL_VERSION
'OpenSSL 3.0.9 30 May 2023'


📦[xlevus@pants-debug2 gymkhana]$ ldd /usr/lib64/python3.10/lib-dynload/_ssl.cpython-310-x86_64-linux-gnu.so
	linux-vdso.so.1 (0x00007ffec9df8000)
	libssl.so.3 => /lib64/libssl.so.3 (0x00007ffbd04b3000)
	libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007ffbd0088000)
	libc.so.6 => /lib64/libc.so.6 (0x00007ffbcfeab000)
	libz.so.1 => /lib64/libz.so.1 (0x00007ffbcfe91000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffbd0592000)
📦[xlevus@pants-debug2 gymkhana]$ 



📦[xlevus@pants-debug2 pants-sandbox-z3rdnp]$ ldd /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9
	linux-vdso.so.1 (0x00007ffc1a626000)
	/home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/../lib/libpython3.9.so.1.0 => not found
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f91208e7000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f91208e2000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007f91208dd000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f91207fd000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f91207f6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f9120619000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f91208f8000)

📦[xlevus@pants-debug2 pants-sandbox-z3rdnp]$ /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9
Python 3.9.18 (main, Jan  8 2024, 05:40:12) 
[Clang 17.0.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
Cannot read termcap database;
using dumb terminal settings.
>>> import _ssl
>>> _ssl.__file__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module '_ssl' has no attribute '__file__'
>>> _ssl.OPENSSL_VERSION
'OpenSSL 3.0.12 24 Oct 2023'

The original __run.sh contents:

env -i CPPFLAGS= LANG=en_NZ.UTF-8 LDFLAGS= PATH=$'/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin' PEX_IGNORE_RCFILES=true PEX_PYTHON=/home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 PEX_ROOT=.cache/pex_root PEX_SCRIPT=pex3 /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 ./pex lock create --tmpdir .tmp --python-path $'/home/xlevus/.pyenv/versions/3.10.13/bin:/home/xlevus/.pyenv/versions/3.12.1/bin:/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin' $'--output=lock.json' --no-emit-warnings $'--style=universal' --pip-version 23.1.2 --resolver-version pip-2020-resolver --target-system linux --target-system mac $'--indent=2' --no-pypi $'--index=https://pypi.org/simple/' --manylinux manylinux2014 --interpreter-constraint $'CPython==3.10.*' django

When changing PEX_PYTHON to PEX_PYTHON=/usr/bin/python3.10 or to the system-installed python3.9, __run.sh runs OK and generates a lockfile.

@xlevus
Contributor

xlevus commented Jan 28, 2024

Further:

After unpacking pex and changing __run.sh to invoke __main__.py instead, I can trace the error to: https://github.com/pantsbuild/pex/blob/v2.1.137/pex/fetcher.py#L48

(Pdb) sys.executable
'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/bin/python3.9'

  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(937)_bootstrap()
-> self._bootstrap_inner()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(980)_bootstrap_inner()
-> self.run()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(917)run()
-> self._target(*self._args, **self._kwargs)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/jobs.py(525)spawn_jobs()
-> result = Spawn(item, spawn_func(item))
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolver.py(130)_spawn_download()
-> self.observer.observe_download(target=target, download_dir=download_dir)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolve/lockfile/create.py(201)observe_download()
-> url_fetcher=URLFetcher(
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/ssl.py(738)create_default_context()
-> context = SSLContext(PROTOCOL_TLS)
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/ssl.py(484)__new__()
-> self = _SSLContext.__new__(cls, protocol)

The protocol version being passed in is: <_SSLMethod.PROTOCOL_TLS: 2>

Buuuuut, changing the __run.sh script to the following (i.e. calling that same function, using the same environment & interpreter) works fine ???:

env -i CPPFLAGS= LANG=en_NZ.UTF-8 LDFLAGS= PATH=$'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.local/bin:/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin' PEX_IGNORE_RCFILES=true PEX_PYTHON=/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 PEX_ROOT=.cache/pex_root PEX_SCRIPT=pex3 /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 -c "import ssl; ssl.create_default_context()"

@jsirois
Contributor

jsirois commented Jan 29, 2024

@xlevus it looks like you have everything at hand you need to dig. If you can come up with a docker-based repro, perhaps someone can help out, but as it stands you have the files, paths, etc.

@xlevus
Contributor

xlevus commented Jan 29, 2024

I've created a docker-based reproduction here: https://github.com/xlevus/pants-issue-20467 and it seems to work in a Fedora VM, trying to get an ubuntu VM up to confirm it works in one of those too.

Will poke around more tonight. But I'm a little stumped tbqh.

It appears to only be an issue when specifically combining the distributed python3.9 venv for Pants, and Pex. The venv's ssl-context code works fine, and the Pex code works fine, but combine the two and ???

@jsirois
Contributor

jsirois commented Jan 29, 2024

Great - thanks. I'll try to poke around with the repro case. That said, this is crazy-making "It appears to only be an issue when specifically combining the distributed python3.9" since your OP is this:

$ pants --keep-sandboxes=on_failure generate-lockfiles ::
10:54:06.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ouAMHY for Generate lockfile for python-default
10:54:06.16 [INFO] Completed: Generate lockfile for python-default
10:54:06.16 [ERROR] 1 Exception encountered:

Engine traceback:
  in `generate-lockfiles` goal

ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for /usr/bin/python3.10: unknown error (_ssl.c:3161)

That is definitely not python3.9 let alone the scie-pants hermetic python3.9.

@xlevus
Contributor

xlevus commented Jan 29, 2024

> Great - thanks. I'll try to poke around with the repro case. That said, this is crazy-making "It appears to only be an issue when specifically combining the distributed python3.9" since your OP is this:
>
> $ pants --keep-sandboxes=on_failure generate-lockfiles ::
> 10:54:06.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ouAMHY for Generate lockfile for python-default
> 10:54:06.16 [INFO] Completed: Generate lockfile for python-default
> 10:54:06.16 [ERROR] 1 Exception encountered:
>
> Engine traceback:
>   in `generate-lockfiles` goal
>
> ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
> stdout:
>
> stderr:
> Failed to spawn a job for /usr/bin/python3.10: unknown error (_ssl.c:3161)
>
> That is definitely not python3.9 let alone the scie-pants hermetic python3.9.

The error message is misleading. The 'failed to spawn a job' is from Pex's Job runner.

I'm ~~100%~~ 89% confident the error comes from within a python3.9 executable.

Here's my hacked up sandbox with a pdb.breakpoint stuck right before the failing ssl call:

📦[xlevus@pants-debug2 pants-sandbox-WRfUMY]$ ./__run.sh 
Cannot read termcap database;
using dumb terminal settings.
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb) w
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(937)_bootstrap()
-> self._bootstrap_inner()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(980)_bootstrap_inner()
-> self.run()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(917)run()
-> self._target(*self._args, **self._kwargs)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/jobs.py(525)spawn_jobs()
-> result = Spawn(item, spawn_func(item))
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolver.py(130)_spawn_download()
-> self.observer.observe_download(target=target, download_dir=download_dir)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolve/lockfile/create.py(201)observe_download()
-> url_fetcher=URLFetcher(
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb) !import sys
(Pdb) pp sys.executable
'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/bin/python3.9'
(Pdb) n
ssl.SSLError: unknown error (_ssl.c:3161)
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(pdb)

@jsirois
Contributor

jsirois commented Jan 29, 2024

Ok, thanks for the repro case @xlevus - super helpful.

I have not figured out why PBS Python 3.9 is different here, and apparently only different in a Fedora context to boot, but the issue is related to threading. If you use a PBS 3.9 repl to import ssl; ssl.create_default_context() - no issue as you found out. The relevant difference in the Pex case is this function is called not in the main application thread, but in a job spawn thread used for spawning parallel (subprocess) jobs. If I create an SSL context early in the main thread, all is well and the lock succeeds:

[root@3d2dd3ceaa5c pants-sandbox-qAIWax]# diff -u .deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py pex-venv/lib/python3.9/site-packages/pex/fetcher.py
--- .deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py       1980-01-01 00:00:00.000000000 +0000
+++ pex-venv/lib/python3.9/site-packages/pex/fetcher.py 2024-01-28 21:18:01.662789434 +0000
@@ -4,6 +4,8 @@
 from __future__ import absolute_import

 import ssl
+ssl.create_default_context()
+
 import time
 from contextlib import closing, contextmanager

[root@3d2dd3ceaa5c pants-sandbox-qAIWax]#

There the diff represents some sandbox mucking about, but the upshot is trying to grab the context on import of pex/fetcher.py is enough to ensure this happens in the main thread and all is well.
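
The thread dependence described above can be probed without Pex at all. The sketch below is not Pex's code: `run_in_worker_thread` is a hypothetical helper that calls a function in a fresh non-main thread and hands back either its result or the exception it raised. On an unaffected system the worker call is expected to succeed; on the setup in this issue it was reported to raise `ssl.SSLError`.

```python
import ssl
import threading


def run_in_worker_thread(func):
    """Call func() in a fresh non-main thread and return (result, exception)."""
    outcome = {}

    def target():
        try:
            outcome["result"] = func()
        except Exception as exc:  # capture it; otherwise it only shows up on stderr
            outcome["error"] = exc

    worker = threading.Thread(target=target, name="ssl-probe-worker")
    worker.start()
    worker.join()
    return outcome.get("result"), outcome.get("error")


if __name__ == "__main__":
    # Deliberately no SSL call in the main thread first: per the diff above,
    # creating a context there beforehand is what makes the later call succeed.
    _, error = run_in_worker_thread(ssl.create_default_context)
    print("worker thread:", "ok" if error is None else repr(error))
```

Adding a plain `ssl.create_default_context()` call before the worker probe turns the script into a check of the workaround rather than of the failure.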

The remaining work is to figure out what is actually buggy here. Is the PBS Python build buggy somehow? Is it a bug in Pex code: should an SSLContext only ever be created in the application main thread? Is it a mismatch between modern Fedora glibc (which includes libpthread) and the libpthread.so.0 that PBS links to (unlike the system Python 3.9)? I have no clue at the moment.

@jsirois
Contributor

jsirois commented Jan 29, 2024

I'll note that I'm dropping work for the evening and I'm AFK likely until the 1st.

@xlevus
Contributor

xlevus commented Jan 29, 2024

Further Investigation:

  • Swapping PBS 2024 build with 20230826 works (but did require me to install libxcrypt on Fedora)
  • Swapping PBS 2024 build with 20231002 errors in the same place.

Possible key change between the two is:

OpenSSL 1.1 -> 3.0 on supported platforms. Linux and macOS now use OpenSSL 3.0.x. Windows uses OpenSSL 3.0.x on CPython 3.11+.

@jsirois
Contributor

jsirois commented Feb 7, 2024

@xlevus I'm working on a short-term fix in pex-tool/pex#2355. I'd still love to know what's really going on here, but 1st to stop the bleeding.

@jsirois jsirois added bug and removed question labels Feb 7, 2024
@jsirois
Contributor

jsirois commented Feb 7, 2024

I've flipped this back to a bug - apologies @mjimlittle, you ended up being right there. With @xlevus's help debugging, a fix for this issue in Pex is now released in 2.1.163: https://github.com/pantsbuild/pex/releases/tag/v2.1.163

A Pants maintainer will take it from here and upgrade Pants / instruct you how to do so for your Pants version.

@mjimlittle
Author

Hey @jsirois thanks for the update. Also, I am sorry I could not help out in tracing the source of the issue.
I'm relatively new to the python/pants ecosystem so I could not keep up with @xlevus :D

As a workaround, I have Dockerized pants using an Ubuntu base image and I can successfully generate needed lock files.

cburroughs added a commit to cburroughs/pants that referenced this issue Feb 7, 2024
Changelogs:
 * https://github.com/pantsbuild/pex/releases/tag/v2.1.163

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]

==                    Upgraded dependencies                     ==

  pex                            2.1.162      -->   2.1.163
```

Fixes pantsbuild#20467
@cburroughs
Contributor

To use the new version of Pex without waiting on a Pants release:

[pex-cli]
version = "v2.1.163"
known_versions = [
  "v2.1.163|macos_arm64 |21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
  "v2.1.163|macos_x86_64|21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
  "v2.1.163|linux_x86_64|21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
  "v2.1.163|linux_arm64 |21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
]

(That's the sha256 and size of the pex artifact, which you can calculate yourself by downloading it from the release page.)
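
For anyone who wants to compute those two values rather than copy them, here is a small sketch using only the standard library. The function name `known_version_entry` and the file name `pex` in the usage note are assumptions for illustration; the output follows the `version|platform|sha256|size` shape shown above, without the space padding after the platform name.

```python
import hashlib
import os


def known_version_entry(path, version, platform):
    """Build a `version|platform|sha256|size` entry for a downloaded artifact."""
    digest = hashlib.sha256()
    with open(path, "rb") as artifact:
        # Hash in 1 MiB chunks so the whole file never has to sit in memory.
        for chunk in iter(lambda: artifact.read(1024 * 1024), b""):
            digest.update(chunk)
    size = os.path.getsize(path)  # size in bytes
    return f"{version}|{platform}|{digest.hexdigest()}|{size}"
```

For example, after downloading the release artifact to a local file named `pex`, `known_version_entry("pex", "v2.1.163", "linux_x86_64")` returns a string that can be pasted into the list.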

@mjimlittle
Author

Thanks @cburroughs works fine now!

kaos pushed a commit that referenced this issue Feb 7, 2024
Changelogs:
 * https://github.com/pantsbuild/pex/releases/tag/v2.1.163

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]

==                    Upgraded dependencies                     ==

  pex                            2.1.162      -->   2.1.163
```

Fixes #20467
@jsirois
Contributor

jsirois commented Feb 9, 2024

@xlevus it turns out the issue is the custom RedHat OpenSSL option "rh-allow-sha1-signatures", seen here for example: https://gitlab.com/redhat/centos-stream/rpms/openssl/-/blob/c9s/0049-Selectively-disallow-SHA1-signatures.patch

If I do this on a fedora:37 image:

[root@d13f087cea45 /]# diff -u /etc/crypto-policies/back-ends/opensslcnf.config.orig /etc/crypto-policies/back-ends/opensslcnf.config
--- /etc/crypto-policies/back-ends/opensslcnf.config.orig       2024-02-09 00:54:33.569271689 +0000
+++ /etc/crypto-policies/back-ends/opensslcnf.config    2024-02-09 00:54:54.309267497 +0000
@@ -6,8 +6,3 @@
 DTLS.MaxProtocol = DTLSv1.2
 SignatureAlgorithms = ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:ed25519:ed448:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512:rsa_pss_rsae_sha256:rsa_pss_rsae_sha384:rsa_pss_rsae_sha512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224

-[openssl_init]
-alg_section = evp_properties
-
-[evp_properties]
-rh-allow-sha1-signatures = yes

Then a test rig works without any main-thread vs. non-main-thread shenanigans. As to why the thread makes a difference, I have no clue yet, but after building a custom PBS with OpenSSL debug symbols and working through many gdb sessions, I was able to narrow in on the read of rh-allow-sha1-signatures, which is not a standard OpenSSL config option, as the action leading to an error return path that eventually bubbles out to _ssl.c:3161.
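
To check whether a given machine carries that option, the config file from the diff can be scanned directly. The following is a rough sketch, not a full OpenSSL config parser: `find_option` is a hypothetical helper that only understands `[section]` headers, `name = value` lines and `#` comments, which is enough for the file shown above.

```python
import os

CONFIG_PATH = "/etc/crypto-policies/back-ends/opensslcnf.config"


def find_option(config_text, option="rh-allow-sha1-signatures"):
    """Return (section, value) pairs where `option` is set in OpenSSL-style config text."""
    hits = []
    section = None
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == option:
            hits.append((section, value.strip()))
    return hits


if __name__ == "__main__":
    if os.path.exists(CONFIG_PATH):
        with open(CONFIG_PATH, encoding="utf-8") as config:
            print(find_option(config.read()) or "option not set")
    else:
        print(f"{CONFIG_PATH} not found (probably not a Red Hat style system)")
```

An empty result means the option is absent from that file; it does not rule out the option being set in some other OpenSSL config that the system loads.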

I'll update indygreg/python-build-standalone#207 with all the details of the debug session later tonight. This is not Gregory's problem, but others may bump into RedHat shenanigans and need the ~FAQ on what goes on when vanilla openssl in PBS tries to read RedHat custom config.

@jsirois
Contributor

jsirois commented Feb 9, 2024

The explanation is contained in a comment in pex-tool/pex#2358 which I've pinged folks in this thread on.

jsirois added a commit to jsirois/lift that referenced this issue Feb 10, 2024
jsirois added a commit to a-scie/lift that referenced this issue Feb 12, 2024