Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import nacl.bindings hangs #327

Open
colinmcintosh opened this issue Jul 24, 2017 · 23 comments
Open

import nacl.bindings hangs #327

colinmcintosh opened this issue Jul 24, 2017 · 23 comments

Comments

@colinmcintosh
Copy link

I ran into an issue with another library hanging on import that appears be caused by PyNaCl. I can reproduce the issue with

$ python -c 'import nacl.bindings'

Expected Behavior: command should complete in, at most, a few seconds.
Actual Behavior: commands takes roughly 90 seconds to complete

Running strace on the command shows that the import is hanging here

open("/dev/random", O_RDONLY)           = 4
poll([{fd=4, events=POLLIN}], 1, 4294967295

I found this closed issue and tried following the recommendation of running pip install pynacl -I --no-binary pynacl but the issue still occurs.
paramiko/paramiko#1023

OS: Ubuntu 14.04.5
Python Version: 2.7.6

@reaperhulk
Copy link
Member

Are you sure it successfully reinstalled without using the manylinux1 binary? Ubuntu 14.04.5 should have kernel 4.4, which definitely has getrandom and therefore shouldn't see this since libsodium shouldn't take that code path... Could you give the output of uname -a?

@colinmcintosh
Copy link
Author

Verbatim commands I'm using to reproduce the issue

$ virtualenv env
$ . env/bin/activate
$ pip install pynacl -I --no-binary pynacl
$ python -c 'import nacl.bindings'

uname -a

$ uname -a
Linux my-dev-vm 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This was a 14.04.1 install that was upgraded to 14.04.5 but it looks like it still has the 4.2 kernel

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.5 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

@colinmcintosh
Copy link
Author

Upon further research it looks like although 14.04.5 ships with 4.4 by default a dist-upgrade from a previous version of 14.04 will not include the 4.4 kernel update automatically. This means that 14.04.5 is not guaranteed to be running on the 4.4 kernel.

Release notes (for new installs):

By default, the 14.04.5 point release will ship with a newer 4.4 Linux kernel from Ubuntu 16.04 [...]

https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes

LTS Enablement notes stating that older installs need to be explicitly upgraded

The 14.04.2 and newer point releases will ship with an updated kernel and X stack by default. If you have installed with older media you can use the following to install the newer HWE kernel derived from 16.04 (Xenial):

[...]

SERVER

Install the HWE kernel derived from 16.04 (xenial):

 sudo apt-get install --install-recommends linux-generic-lts-xenial 

https://wiki.ubuntu.com/Kernel/LTSEnablementStack#Ubuntu_14.04_LTS_-_Trusty_Tahr

Can this be adjusted to account for the lack of 4.4 kernel on 14.04.x installs?

@reaperhulk
Copy link
Member

Thanks for the followup, that's good to know.

Altering the way libsodium attempts to determine if the entropy pool is initialized on the /dev/urandom path would require discussion upstream with the libsodium project itself, but we could potentially explore not running sodium_init as an import side effect. That would be tricky though 😢

@lmctv
Copy link
Contributor

lmctv commented Sep 2, 2017

@colinmcintosh : just curious, which kind of environment is this? Could you try to run a

$ sudo find /var/lib/ -name random-seed -ls

to verify if the file exists and is regularly updated at system restarts?

@lmctv
Copy link
Contributor

lmctv commented Sep 3, 2017

Another question, if this is a virtual machine, which kind of hypervisor is running it?

@colinmcintosh
Copy link
Author

@lmctv here is the output of that command. I rebooted and it appears to be updated after the system restart.

$ sudo find /var/lib/ -name random-seed -ls
2354699    4 -rw-------   1 root     root          512 Sep  2 18:12 /var/lib/urandom/random-seed
$ date
Sat Sep  2 18:13:28 PDT 2017
$ uptime
 18:13:29 up 1 min,  1 user,  load average: 0.61, 0.70, 0.24

This is a virtual machine running in VMware Workstation 12.

@lmctv
Copy link
Contributor

lmctv commented Sep 3, 2017

From past experience, I was somewhat expecting we were talking about a VM; if running under qemu, my suggestion would have been to enable the virtio_rng emulated device adding the -device virtio-rng-pci option to the qemu command line in use, or the corresponding <rng model='virtio'>. . .</rng> stanza in the libvirt xml configuration file for the VM.

I remember stumbling in some notes about available options to gather entropy from inside the VM ranging from pollinate/pollen to haveged and egd from the time I ran my VMs under a xen hypervisor, but since I've no experience with any of them, all I can do is suggest those names to "seed" your search for a entropy gathering system fit for your environment.

I think this issue can be closed, and maybe referred from a new one about documenting the runtime requirement for a fully seeded entropy pool.

@lmctv
Copy link
Contributor

lmctv commented Jan 13, 2018

Added a documentation tag.

@nwertzberger
Copy link

Shouldn't there be an option to disable this behavior and use /dev/urandom? libsodium seems to have a compiler flags for this: setting NO_BLOCKING_RANDOM_POLL, and NOT setting USE_BLOCKING_RANDOM.

From what i read online, it would seem to suggest that /dev/urandom should almost always be used instead of /dev/random, and that there's only a few seconds on startup where /dev/urandom isn't cryptographically secure.

@reaperhulk
Copy link
Member

By default libsodium only uses the /dev/random path to determine if the CSPRNG is initialized before it starts using /dev/urandom. It also prefers to use getrandom, which gives the same behavior for free (blocking only until the CSPRNG is ready). On normal systems this happens extremely rapidly, but there exists a weird long tail of systems that can have bad entropy for long periods of time (especially small things like raspberry pi devices). We do not want to supply bad randomness as that is a major security problem, so blocking is the right decision.

This obviously results in a bad user experience, but the underlying problem is not one pynacl/libsodium can fix, we can only wait.

sadsfae added a commit to redhat-performance/quads that referenced this issue Mar 16, 2019
There seems to be a bug with entropy pool gathering with libsodium in
which calls to /dev/random fail.  This manifests itself in QEMU because
it has no hardware entropy (and as a result containers run in VMs).

As /dev/random and /dev/urandom are blocking devices this manifests
itself in slowdowns for random, sporadic calls from QUADS.  libsodium
which is used by paramiko cannot use /dev/urandom (non-blocking entropy)
so this moves us to use a software-based entropy service which is
optional.

#221

More about the nacl bindings / libsodium bug:
PyNACL / Sodium:

pyca/pynacl#327

Symptoms / cause:

======
By default libsodium only uses the /dev/random path to determine if
the CSPRNG is initialized before it starts using /dev/urandom.
It also prefers to use getrandom, which gives the same behavior for
free (blocking only until the CSPRNG is ready). On normal systems this
happens extremely rapidly, but there exists a weird long tail of systems
that can have bad entropy for long periods of time
======

The pragmatic solution here is to instead use the haveged service to
provide software entropy.

https://issihosts.com/haveged/

* Make haveged a dependency on QUADS via RPM installation
* For containers include and start it in the Dockerfile
* haveged is optional, just turn it off if you want to use /dev/random
if you're on bare-metal or feel you have sufficient entropy.

Included are other fixes/changes below:

* Add python3-ipdb to rpm spec requirements
* Specify version requirements for ipmitool and git
* Remove contactbank wp plugin, we're not using it and it may not even
work right.
* Since HTTPD is required for serving visuals/instackenv.json make it
also be a dependency of quads-server.service
* Demote docker compose from recommended status
* Correct RPM spec warnings about incorrect date format
* Remove waffle.io badge / references as they are closing up shop.

https://waffle.io/closing-its-doors

Fixes: #221

Change-Id: I5c20defe12871e6399cf6b1ada659caf1a5e1b94
@zefoo
Copy link

zefoo commented May 11, 2019

I've experienced this issue several times, at random. I've spent hours troubleshooting, debugging, checking things, talking with various experts. Whenever we have hiccups in the system I most often blame thus bug. It can hang for many many seconds, 30+.

It hangs with this, from python -v...

/venv/lib/python3.6/site-packages/nacl/bindings/crypto_pwhash.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_pwhash.cpython-36.pyc'
import 'nacl.bindings.crypto_pwhash' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb133530d68>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_scalarmult.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_scalarmult.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_scalarmult.cpython-36.pyc'
import 'nacl.bindings.crypto_scalarmult' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a2e8>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_secretbox.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_secretbox.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_secretbox.cpython-36.pyc'
import 'nacl.bindings.crypto_secretbox' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a400>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_shorthash.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_shorthash.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_shorthash.cpython-36.pyc'
import 'nacl.bindings.crypto_shorthash' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a518>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_sign.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_sign.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_sign.cpython-36.pyc'
import 'nacl.bindings.crypto_sign' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a5c0>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/randombytes.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/randombytes.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/randombytes.cpython-36.pyc'
import 'nacl.bindings.randombytes' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a860>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/sodium_core.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/sodium_core.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/sodium_core.cpython-36.pyc'
import 'nacl.bindings.sodium_core' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a8d0>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/utils.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/utils.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/utils.cpython-36.pyc'
import 'nacl.bindings.utils' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a9e8>

Is there any movement or any other things to test on this?

Thank you, running out of ideas.

@lmctv
Copy link
Contributor

lmctv commented May 12, 2019

@zefoo if this is from a virtual machine as was in @colinmcintosh case, did you try if any of the suggestions in #327 (comment) could help?

@hypercodex
Copy link

This blog makes some good points: https://www.2uo.de/myths-about-urandom

@reaperhulk
Copy link
Member

That blog post has nothing to say about this issue, which is that libsodium blocks in the presence of an uninitialized CSPRNG. This is desirable behavior, but frustrating if you happen to be on a device with a problem bootstrapping initial entropy. Once you get that initial seed urandom (and getrandom) will never block and are the sole source libsodium uses.

@zefoo
Copy link

zefoo commented May 26, 2019

@lmctv @hypercodex @reaperhulk I really appreciate the two cents. This is not running on a virtual machine that I have host access to. I am seeing this on a Digital Ocean droplet. I've been trying to track it down for months. I see processes overlapping instead of properly dying off due to the hanging of loading nacl (per my paste above). I don't exactly understand all of the moving parts here and not sure what to do next. Any suggestions appreciated, or anything more I can do would be helpful. I started noticing after upgrading to Ubuntu 18.04. Thank you.

@zefoo
Copy link

zefoo commented May 28, 2019

I was advised to apt-get install haveged. Before, I did cat /proc/sys/kernel/random/entropy_avail and I'd get 115. Now I do cat /proc/sys/kernel/random/entropy_avail and I get 2431. I also edited /etc/default/haveged and set it to 2048. So far I haven't seen any lags in the logs... but I'll be more confident in about a week. Not fully understanding everything here, but sharing in case it helps someone else.

@j3pic
Copy link

j3pic commented Oct 10, 2019

If it's inherently insecure to not read /dev/random at startup, does that mean that OpenSSH is insecure? I can spawn dozens of ssh processes and none of them will hang, which implies that they're not reading /dev/random.

If I try to spawn two Python processes that both import nacl.bindings, at least one of them will hang, since both will try to read /dev/random, and the first one to succeed will drain off all the entropy for the next 10 minutes or so on a Google Cloud VM instance. Starting dozens of Python processes that all import nacl.bindings is impossible.

@hellt
Copy link

hellt commented Apr 7, 2020

I am experiencing the same issue running PyNaCl in a docker container in a VM under qemu/kvm

root@e5ed8639c62f:/# cat /proc/sys/kernel/random/entropy_avail
120

so what is the suggested workarounds? Install haveged?

@hellt
Copy link

hellt commented Apr 8, 2020

This one fixed it for me:

pip install pynacl -I --no-binary pynacl

@bitlogik
Copy link

bitlogik commented Oct 1, 2020

We are facing this issue on a Linux VPS used for test, the app is restarted after each code modification, which triggers a new init call to pynacl/libsodium. Sometimes (~30% of the time) it takes 40-90 seconds to start, because this "pause" in libsodium start.

Here's the point when we CTRL+C when this hangs. Note that the prompt shell is given back only after the hang ending, so this is a real blocking thing.

  File "/root/guardata/guardata/crypto/__init__.py", line 8, in <module>
    from nacl.public import SealedBox, PrivateKey as _PrivateKey, PublicKey as _PublicKey
  File "/usr/local/lib/python3.7/dist-packages/nacl/public.py", line 17, in <module>
    import nacl.bindings
  File "/usr/local/lib/python3.7/dist-packages/nacl/bindings/__init__.py", line 423, in <module>
    sodium_init()
  File "/usr/local/lib/python3.7/dist-packages/nacl/bindings/sodium_core.py", line 32, in sodium_init
    ffi.init_once(_sodium_init, "libsodium")
  File "/usr/local/lib/python3.7/dist-packages/nacl/bindings/sodium_core.py", line 22, in _sodium_init
    ensure(lib.sodium_init() != -1,

This is not happenning with "no-binary" and compiling from source with wheel.
Why this hang doesn't happen with the program from source? what is the difference between source and precompiled whl? Precompiled and re-compiled should behave the same, so maybe investing in the differences can lead to some clues about this issue.

@bitlogik
Copy link

bitlogik commented Oct 1, 2020

Some info about the VPS system :
x86_64 GNU/Linux Debian 10 "Buster", kernel 4.19.0-10-cloud-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24)
Python 3.7.3
PyNaCl 1.4.0

@reaperhulk
Copy link
Member

As I noted in this issue in May 2019 this is entirely a CSPRNG initialization situation within libsodium itself. There is nothing pynacl can do here. Pynacl ships with a newer version of libsodium than buster does so it’s possible they’ve changed some details of that initialization though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

9 participants