Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled OpenSSL.SSL.WantReadError and WantWriteError raised by PyOpenSSL #245

Open
1 of 3 tasks
Rohitbels opened this issue Oct 22, 2019 · 35 comments Β· May be fixed by #332
Open
1 of 3 tasks

Unhandled OpenSSL.SSL.WantReadError and WantWriteError raised by PyOpenSSL #245

Rohitbels opened this issue Oct 22, 2019 · 35 comments Β· May be fixed by #332
Labels
bug Something is broken

Comments

@Rohitbels
Copy link

Rohitbels commented Oct 22, 2019

❓ I'm submitting a ...

  • 🐞 bug report
  • 🐣 feature request
  • ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?
This issue is reproducible against huge post request like multiple lines of textarea in form.
We are upgrading from python 2.7 to 3.7 and hence also upgraded the cheroot 8.2.0.
Cherrypy : 18+

πŸ“‹ Details

Traceback (most recent call last):
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cherrypy/_cprequest.py", line 628, in respond
    self._do_respond(path_info)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cherrypy/_cprequest.py", line 680, in _do_respond
    self.body.process()
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cherrypy/_cpreqbody.py", line 982, in process
    super(RequestBody, self).process()
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cherrypy/_cpreqbody.py", line 559, in process
    proc(self)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cherrypy/_cpreqbody.py", line 145, in process_urlencoded
    qs = entity.fp.read()
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cherrypy/_cpreqbody.py", line 817, in read
    data = self.fp.read(chunksize)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cheroot/server.py", line 383, in read
    data = self.rfile.read(size)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/cheroot/makefile.py", line 420, in read
    val = super().read(*args, **kwargs)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/_pyio.py", line 1014, in read
    return self._read_unlocked(size)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/_pyio.py", line 1054, in _read_unlocked
    chunk = self.raw.read(wanted)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1335, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/pyenv/versions/3.7.0/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1149, in _raise_ssl_error
    raise WantReadError()
OpenSSL.SSL.WantReadError

πŸ“‹ Environment

  • Cheroot version: 8.2.0
  • CherryPy version: 18.1.1 (if applicable)
  • Python version: 3.6.7
  • OS: linux
  • Browser: [all | Chrome XX | Firefox XX | IE XX | Safari XX | Mobile Chrome XX | Android X.X Web Browser | iOS XX Safari | iOS XX UIWebView | iOS XX WKWebView ]

πŸ“‹ Additional context
I have went through this #113
but still was unable to figure out the solution

@webknjaz
Copy link
Member

@webknjaz webknjaz added the bug Something is broken label Oct 23, 2019
@jpeckham
Copy link

jpeckham commented Dec 11, 2019

I was getting WantWriteError on very large js files and we did this monkey patch


def _flush_unlocked(self):
    self._checkClosed('flush of closed file')
    while self._write_buf:
        try:
            # ssl sockets only except 'bytes', not bytearrays
            # so perhaps we should conditionally wrap this for perf?
            n = self.raw.write(self._write_buf)
        except io.BlockingIOError as e:
            n = e.characters_written
        except (OpenSSL.SSL.WantReadError,OpenSSL.SSL.WantWriteError, OpenSSL.SSL.WantX509LookupError) as e:
            n = 0
        del self._write_buf[:n]

BufferedWriter._flush_unlocked = _flush_unlocked

@Rohitbels
Copy link
Author

Rohitbels commented Dec 15, 2019

@jpeckham Thanks a lot for this!

@webknjaz and for any one else:

I was able to fix this by replacing to read call at

File "/cheroot/makefile.py", line 420, in read val = super().read(*args, **kwargs)

with

val = self._safe_call(True, super().read, *args, **kwargs)

safe_call is specifically defined in pyopenssl.py for this reason.

@webknjaz This fix looks good?

@jussike
Copy link

jussike commented Sep 4, 2020

Works for me.

Is that Rohitbels's workaround safe for fixing the issue @webknjaz ?

@vashek
Copy link

vashek commented Nov 2, 2020

I suggest changing the description of this issue to "OpenSSL.SSL.WantReadError and WantWriteError", because both errors occur for similar reasons. (I encountered the one with WantWriteError, which is probably the more likely of the two because you're most likely to send big data than receive, but I'm only guessing.)

Thanks a lot @jpeckham, your workaround helped me.

If anyone can't reproduce the issue, this indicates that the problem might be OS-dependent.

And, by the way, pyca/pyopenssl#176 says that sendall() should not be used at all until fixed because the same exception can occur and will cause loss of state.

@webknjaz, I was originally using the BuiltinSSLAdapter, but the server then failed some tlsfuzzer tests (specifically I think it was test-fuzzed-ciphertext.py) and eventually stopped responding after these fuzzing tests or after a Qualys test. That's why I switched to pyOpenSSLAdapter, which works better in this regard but has the issue with these exceptions, which seems much easier to fix than the fuzzing failures. (Of course, if anyone happens to know why that happens or how to fix it, I'd welcome that information.)

vashek added a commit to vashek/cheroot that referenced this issue Nov 2, 2020
@webknjaz webknjaz changed the title OpenSSL.SSL.WantReadError Unhandled OpenSSL.SSL.WantReadError and WantWriteError raised by PyOpenSSL Nov 2, 2020
@webknjaz
Copy link
Member

webknjaz commented Nov 2, 2020

If anyone can't reproduce the issue, this indicates that the problem might be OS-dependent.

I'd guess, it's more of an OpenSSL backend dependent. It may be pyOpenSSL is a wrapper around cryptography and cryptography is a wrapper around OpenSSL. So it may be compiled against many different OpenSSL versions that may behave differently, or just have different compilation settings in general.

@webknjaz, I was originally using the BuiltinSSLAdapter, but the server then failed some tlsfuzzer tests (specifically I think it was test-fuzzed-ciphertext.py) and eventually stopped responding after these fuzzing tests or after a Qualys test. That's why I switched to pyOpenSSLAdapter, which works better in this regard but has the issue with these exceptions, which seems much easier to fix than the fuzzing failures. (Of course, if anyone happens to know why that happens or how to fix it, I'd welcome that information.)

Interesting. We may want to integrate a tool like that into our test suite one day...

As for why WantReadError and WantWriteError happen: it's usually because some buffers get full and the expectation is that the calling code will do retries.

vashek added a commit to vashek/cheroot that referenced this issue Nov 4, 2020
@vodnokoki
Copy link

Faced with above issue, builtin module is working perfectly, but when audit suggest to use only tlsv1.2 and tls1.3, i didn't now how to implement with builtin, so chanhed logic with pyopenssl and using context allowing only tlsv1.2 and tls1.3. Unhandled exceptions raised when server response big js files, architecture is react+Django.

After applying changes from @vashek everything works perfectly and there is no error on server side. When do you plan to accept above change from @vashek and create release.

@webknjaz
Copy link
Member

webknjaz commented Mar 4, 2021

Faced with above issue, builtin module is working perfectly, but when audit suggest to use only tlsv1.2 and tls1.3, i didn't now how to implement with builtin

I suppose you could use the context setter on the adapter: https://github.com/cherrypy/cheroot/blob/473c546/cheroot/ssl/builtin.py#L258. Just make sure to repeat some logic from the initializer.

so chanhed logic with pyopenssl and using context allowing only tlsv1.2 and tls1.3. Unhandled exceptions raised when server response big js files, architecture is react+Django.

After applying changes from @vashek everything works perfectly and there is no error on server side. When do you plan to accept above change from @vashek and create release.

Good to know!

@vodnokoki
Copy link

I suppose you could use the context setter on the adapter: https://github.com/cherrypy/cheroot/blob/473c546/cheroot/ssl/builtin.py#L258. Just make sure to repeat some logic from the initializer.

In cherrypy documentation it says that context is valid only for pyopenssl module. thanks for your advice and help, it works great with builtin ssl.

@julianz-
Copy link

I moved an old cherrypy web app to a new environment and started getting both WantReadError and WantWriteError errors and had to apply two different fixes to get things to work:

  1. for the read errors I used Rohitbels's fix above.
  2. for the write errors I tried the same idea (safe_call) but this created a new problem - I started getting SSL bad write retry errors. So I used jpeckham's fix for that instead as this resets the number of written bytes back to 0 before the retry (something the SSL routines ask for).

So far this seems to have fixed the problems. Am surprised these things haven't been fixed already considering the age of this issue. Am I missing something?

cherrypy v18.8.0
cheroot v9.0.0
Ubuntu 22.04.2 LTS

@vashek
Copy link

vashek commented Mar 16, 2023

I'm wondering about that, too. Perhaps a Tidelift subscription would help. (I actually checked that out but there's no pricing on the website.)

@julianz-
Copy link

I'm wondering about that, too. Perhaps a Tidelift subscription would help. (I actually checked that out but there's no pricing on the website.)

Yeah I checked after you mentioned it, but Tidelift said their minimum subscription was for a team of 25 developers and even then they wouldn't tell me how that costs but I am guessing $$$$. So not really an option as far as I am concerned. And even then, what is the probability they have fixed this particular issue?

@webknjaz
Copy link
Member

I'm wondering about that, too. Perhaps a Tidelift subscription would help. (I actually checked that out but there's no pricing on the website.)

Yeah I checked after you mentioned it, but Tidelift said their minimum subscription was for a team of 25 developers and even then they wouldn't tell me how that costs but I am guessing $$$$. So not really an option as far as I am concerned.

Tidelift's customers are corporate, not individuals. Their model spins around redistributing some funds acquired from the subscribers across the participating projects' maintainers. AFAIK, they ask the clients to run some software that collects what they use. And the distribution of funds to the maintainers highly depends on the use of said OSS project. My impression is that most projects with low-to-middle popularity get under $50 or, more often, zero (there's some threshold for starting getting some cash for a project). Those funds are per-project and if a project has more maintainers, they are split (usually equally). Say, a project has 2 maintainers, and just crossed the threshold for being lifted, that means they'd be getting like $25 a month each.
Cheroot is in this range, being niche in a way. More popular projects get more, but not huge amounts.

This is to say that Tidelift is a say for businesses to give back to OSS, but (1) it doesn't replace people's salaries, nor (2) is it a good fit for individuals. If you want to support maintainers as a non-business, direct donations may be more impactful. Though, it's typical for people to treat them as little $1 tokens of appreciation rather than full-fledged support.

Another thing that might be unobvious with Tidelift β€” they let the maintainers lead their projects without influencing them too much. For example, they don't file feature requests or stuff like that. Though, they ask us to verify some packaging metadata, licenses, the security of accesses to package indexes etc., as well as communicate some vulnerability details (we can optionally delegate accepting security reports to them).

Note that both maintainers for the CherryPy projects have many other projects going on. Some get more attention, then others, naturally.

I also burn out occasionally, up to various degrees. When this happens, I have much less (or zero) energy for FOSS. Last year, it was rather intense (still is, really) β€” you can see the activity drops in the contribution graph in my profile and how they correspond with pings that accumulate in the old PRs. I'm trying to get to every PR, but sometimes they get lost or stuck for different reasons.

And even then, what is the probability they have fixed this particular issue?

Now, back to this particular issue. I don't remember exactly what blocked @vashek's #332, but I think that originally, it was introducing unrelated changes. They were reverted half a year later, but the notification got lost (maybe, because the PR was marked as a draft). I'll take another look and maybe merge as is β€” I've just clicked β€œrebase” there to see if the CI explodes. Looks like there's a linting violation to fix at least.

I recall that unrelated change sending me off trying to make some fixes upstream, but I didn't succeed implementing tests for them β€” pyca/pyopenssl#954 / pyca/pyopenssl#955. It'd be nice if somebody could help out with those.

I'm also aware of a few seemingly similar issues that would be great to fix. But testing TLS stuff automatically doesn't always work or behaves weird in CI. I've been mostly working on ways of making the CI and local tests as stable as possible.

I think that we need more contributions improving testing of TLS. It's rather hard to make a full matrix since some components are coming from the GHA VMs and not PyPI. With that in mind, it's hard to judge if PRs like #332 would contribute to the flakiness or stability. Hence, this causes indecisiveness on my part, sometimes.

webknjaz pushed a commit to vashek/cheroot that referenced this issue Mar 17, 2023
@julianz-
Copy link

julianz- commented Mar 18, 2023

So further to my previous post, I am still seeing "bad write retry" occasionally although not as often as I was. So, evidently, the fixes I applied have not solved all of the problems although they seem to be less severe than they were. One problem with debugging these errors is that they are hard to recreate reliably. Their occurrence is infrequent and somewhat random. Any suggestions? Here's a typical error trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cheroot/server.py", line 1313, in communicate
    req.respond()
  File "/usr/local/lib/python3.10/dist-packages/cheroot/server.py", line 1082, in respond
--
  File "/usr/local/lib/python3.10/dist-packages/cheroot/wsgi.py", line 139, in respond
    self.write(chunk)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/wsgi.py", line 221, in write
    self.req.write(chunk)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/server.py", line 1140, in write
    self.conn.wfile.write(chunk)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/makefile.py", line 86, in write
    res = super().write(val, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/makefile.py", line 29, in write
    self._flush_unlocked()
  File "/usr/local/lib/python3.10/dist-packages/cheroot/makefile.py", line 38, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "/usr/lib/python3.10/socket.py", line 723, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1723, in send
    self._raise_ssl_error(self._ssl, result)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1637, in _raise_ssl_error
    _raise_current_error()
  File "/usr/lib/python3/dist-packages/OpenSSL/_util.py", line 57, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', '', 'bad write retry')]

@webknjaz
Copy link
Member

@julianz- it's useful to see a traceback with different call stack. It would be helpful, though, if you could have a branch in a fork with the exact patches applied along with such traces. And if you make changes, maybe make branches with corresponding patches each time so that we know what each traceback corresponds to.
Oh, and the runtime/deps/versions would be good to know too.

It is very important to understand how to reproduce the problem because it would allow the maintainers to validate possible fixes / PRs, especially when there are no tests bundled.

One of the things people here could help with is getting pyca/pyopenssl#955 / pyca/pyopenssl#954 working β€” switching to sendall() would probably let us simplify the codebase, reducing the maintenance burden. The TLS-related code was written over a decade ago, I think. This means that it had to deal with things that might not be relevant today but nobody's around to remember why certain things work the way they do. Having the code entangled doesn't help either.

I'm not experienced with debugging TLS' internals deeply, unfortunately, but I recall @mxii-ca contributing related code and being helpful with debugging/pointers. I'm tagging them in hopes that they may be open to helping out here.
Malcolm, do you know how to properly process WANT_READ_ERROR and WANT_WRITE_ERROR? Any help understanding this would be much appreciated if you have a minute…

@julianz-
Copy link

Thanks webknjaz. Very helpful. Will try to set up a branch/fork with my changes. However, in the meantime I did want to let you know a couple of observations I made doing some experiments. In the modified version of _flush_unlocked() I am using (see below - based on jpeckham's suggestion in this thread, I have also added a short sleep time before the retry which borrows from the way safe_call() in pyopenssl.py handles this error -- although I should add that the safe_call version doesn't include an instruction to restore the buffer which is presumably why I was seeing bad write retry errors when I tried using it?

In my version, though, I tried commenting out the line n=0 in the exception handler, and then I occasionally see bad length errors coming from openSSL instead of bad write retry errors. Restoring that line, and increasing the sleep time to 0.1 seconds instead of 0.01 seconds, I see occasional retries but so far no errors. So it seems pretty clear that recreating these errors depends on some kind of stress testing which is what I think you were doing in pyca/pyopenssl#955 although the title implies it might only be testing for WANT_READ errors? I see some WantWriteError exception handlers in your code so I wasn't sure.

def _flush_unlocked(self):
    self._checkClosed('flush of closed file')
    while self._write_buf:
        try:
            # ssl sockets only except 'bytes', not bytearrays
            # so perhaps we should conditionally wrap this for perf?
            n = self.raw.write(self._write_buf)
        except io.BlockingIOError as e:
            n = e.characters_written
        except (OpenSSL.SSL.WantReadError,OpenSSL.SSL.WantWriteError, OpenSSL.SSL.WantX509LookupError) as e:
            n = 0
            print(e)
            print('retry')
            time.sleep(0.01)
        del self._write_buf[:n]

@mxii-ca
Copy link
Contributor

mxii-ca commented Mar 27, 2023

my understanding is that when SSL_WANT_READ or SSL_WANT_WRITE is encountered, the previous call MUST be repeated with the exact same arguments, hence requiring setting n back to 0 to preserve the buffer for write. openSSL further enforces that the address of the buffer passed to write is the exact same by default. buffers in python can change location in some circumstances so the built-in SSL library disables this check by setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER on connections.

pyopenssl does not appear to do this. nor does it expose a constant for this value, however, it should be possible to enable it via Context.set_mode(2). this may help get rid of the remaining bad write retry errors

for more details see:

@webknjaz
Copy link
Member

Thanks for the pointers! I'll have to explore them further...

@julianz-
Copy link

julianz- commented Mar 27, 2023

Thank you mxii-ca! That's great to know. Contrary to my initial impressions, my change to the sleep time in _flush_unlocked() did not get rid of bad write retry errors. While it did seem to make them less frequent, that could have just been the placebo effect. Your explanation makes perfect sense so I have now implemented a change to see whether changing the mode to accept a moving buffer clears the problem once and for all. I wasn't sure the best way to do this but made the following change in OpenSSL/SSL.py:

def __init__(self, method):
        . . .  _(removed lines for clarity)_

        self.set_mode(_lib.SSL_MODE_ENABLE_PARTIAL_WRITE | _lib.SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER)

which replaces the line:

        self.set_mode(_lib.SSL_MODE_ENABLE_PARTIAL_WRITE )

So far this change looks promising (I see occasional retries with zero bad write retry errors) but maybe there is better way or place to do this?

@julianz-
Copy link

Just wanted to follow up by saying the fixes I have applied described above have completely cured the bad rewrite errors I was getting. Can these be incorporated into the official code somehow?

@Kodiologist
Copy link

@julianz- That file is part of pyOpenSSL, so you should probably open a pull request there.

julianz- added a commit to julianz-/cryptography that referenced this issue Aug 18, 2023
This constant was removed in pyca@895de04 but it is still needed to deal with an issue in PyOpenSSL described here cherrypy/cheroot#245 and PR pyca/pyopenssl#1242.
alex pushed a commit to pyca/cryptography that referenced this issue Aug 18, 2023
This constant was removed in 895de04 but it is still needed to deal with an issue in PyOpenSSL described here cherrypy/cheroot#245 and PR pyca/pyopenssl#1242.
julianz- added a commit to julianz-/pyopenssl that referenced this issue Aug 18, 2023
…NG_WRITE_BUFFER

In order to fix issue described here cherrypy/cheroot#245, we need to use this constant that was removed from https://github.com/pyca/cryptography but now restored
@liquidaty
Copy link

liquidaty commented Jan 16, 2024

@Kodiologist @julianz- @mxii-ca

That file is part of pyOpenSSL, so you should probably open a pull request there

Yes, that is a PyOpenSSL file, but the suggestion that this problem lies with PyOpenSSL and not cheroot is absolutely, 100% misguided. It is not PyOpenSSL's job to expose a retry option-- it is cheroot's.

PyOpenSSL just implements OpenSSL, which in turn exposes this error as described at https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html, which in turn states: "There is no fixed upper limit for the number of iterations that may be necessary until progress becomes visible at application protocol level"

The correct solution is for cheroot to support a configuration option for retry-- possibly similar to how curl provides retry options-- that, as appropriate, implements a retry of the self.raw.write() call in _flush_unlocked() of makefile.py.

@liquidaty
Copy link

I've figured out a workable solution for handling SSL_WANT_WRITE errors which implements a retry-- similar to what ssl/pyopenssl.py does (note however that although the code in ssl/pyopenssl.py appears to address SSL_WANT_WRITE, it does not, at least in some cases, because it is entirely bypassed when the above self.raw.write() call is made) and consistent with the pyopenssl documentation. I'd be happy to share this, but it's not sufficient-- see more below.

However, I have not been successful in handling SSL_WANT_READ for very large files. It would appear that the problem resides in either cheroot or pyopenssl, given that the issue does not arise when using the built-in SSL (rather than pyopenssl).

So to summarize:

  1. cheroot currently has a SERIOUS BUG that occurs when the 'pyopenssl' adapter is used. In particular, large data transfers for SSL write operations may fail due to improperly handled exceptions. Given that
  2. Either cheroot or pyopenssl currently has a SERIOUS BUG. In particular, when the 'pyopenssl' adapter is used with cheroot, large data transfers for SSL read operations fail due to either i) improperly handled exceptions in cheroot or ii) improperly thrown exceptions in openssl
  3. There appears to be no bug when 'builtin' is used

To address this, either of these should be done:

  1. cheroot should simply disallow 'pyopenssl' to be chosen as the adapter; doing so should raise an error
  2. alternatively, fix cheroot to properly handle SSL_WANT_READ and SSL_WANT_WRITE exceptions, and in the process, expose the related options (retry count, delay, etc) to the configuration interface

It seems like the first option above is easily done today. Any reason not to go ahead with that? Are there any features of using the 'pyopenssl' adapter that have no corresponding equivalent when 'builtin' is used?

@julianz-
Copy link

That file is part of pyOpenSSL, so you should probably open a pull request there

Yes, that is a PyOpenSSL file, but the suggestion that this problem lies with PyOpenSSL and not cheroot is absolutely, 100% misguided. It is not PyOpenSSL's job to expose a retry option-- it is cheroot's.

PyOpenSSL just implements OpenSSL, which in turn exposes this error as described at https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html, which in turn states: "There is no fixed upper limit for the number of iterations that may be necessary until progress becomes visible at application protocol level"

The correct solution is for cheroot to support a configuration option for retry-- possibly similar to how curl provides retry options-- that, as appropriate, implements a retry of the self.raw.write() call in _flush_unlocked() of makefile.py.

Don't think we are talking about exposing a new retry option in PyOpenSSL. The retry logic remains in Cheroot. The only change I proposed was to allow PyOpenSSL to call OpenSSL with a buffer that might have moved to a new location. Currently OpenSSL doesn't allow this although it used to. Getting Cheroot to retry sending the buffer is pointless once it's moved to a new location given the way PyOpenSSL currently talks to OpenSSL. Hence the need for small fixes in both PyOpenSSL and OpenSSL and not in Cheroot.

@liquidaty
Copy link

liquidaty commented Jan 20, 2024

@julianz- perhaps changes to pyOpenSSL are needed but without a doubt, changes to cheroot are needed to fix these bugs. The existing cheroot retry code does not even get touched when raw.write() is invoked and so the PyOpenSSL exception just gets immediately raised without any form of handling.

In other words, it appears that the authors of the cheroot retry code might be under the incorrect impression that their retry code is covering all the places it should be covering. It isn't.

@julianz-
Copy link

julianz- commented Jan 20, 2024

@liquidaty I had forgotten that I also made some changes to cheroot/makefile.py along the lines suggested by @jpeckham and @Rohitbels earlier in this thread so yes you are right on this point. The bug has been around so long that it took me a while to figure out what I had done to fix it locally. Specifically, the changes in makefile.py I made are as follows:

def _flush_unlocked(self):  
        self._checkClosed('flush of closed file')  
        while self._write_buf:
            try:
                # ssl sockets only except 'bytes', not bytearrays
                # so perhaps we should conditionally wrap this for perf?
                n = self.raw.write(bytes(self._write_buf))
            except io.BlockingIOError as e:
                n = e.characters_written
            
            #added retry logic
            except (SSL.WantReadError,SSL.WantWriteError, SSL.WantX509LookupError) as e:
                n = 0
            del self._write_buf[:n]

def read(self, *args, **kwargs):
        """Capture bytes read."""
        #val = super().read(*args, **kwargs)
        
        #according to https://github.com/cherrypy/cheroot/issues/245 to prevent WantReadError()
        # use the following alternative
        val = self._safe_call(True, super().read, *args, **kwargs)
        self.bytes_read += len(val)
        return val

Maybe the _safe _call method can be used for the write method too but this didn't work when I tried it originally. With the moving buffer fix maybe it might though I think we still need to reset the number of written bytes back to 0.

The moving buffer fix depends on the release of version 42.0.0 of cryptography which is still under active development with no release date yet. So we are kind of stuck until this becomes available.

@liquidaty
Copy link

liquidaty commented Jan 20, 2024

@julianz- Got it. I can see where those changes are coming from but I think they need further adjustment to work correctly. I think I have half the puzzle solved, but not the whole thing.

On the "write" side, using your changes under my tests:

  • the retry will fail with a "bad write retry" error. This is easy to fix by saving the bytes() result in a variable, and not changing that when retrying
  • I may be wrong but afaict, makefile.py is not specific to OpenSSL or built-in SSL. Therefore, it doesn't know which SSL exception to check for. This is easily fixed by changing the exception handler to just look at the exception type string
  • there is no delay. this isn't strictly necessary, but will result in a LOT of unnecessary cycles. In the example I'm testing on, the average no-delay scenario looped 350k times before proceeding. I would suggest a backoff with an initial delay that defaults to ~0.5 or 1 second, similar to what is described at https://everything.curl.dev/usingcurl/downloads/retry

Putting these together, here is what worked for me:

    def _flush_unlocked(self):
        self._checkClosed('flush of closed file')

        bytes_ = bytes(self._write_buf)    # <=== ADDED to avoid 'bad write retry' in case of retry
        delay_init = 1                     # <=== ADDED. TO DO: make this configurable
        max_delay = 128                    # <=== ADDED. TO DO: make this configurable
        delay = delay_init
        while self._write_buf:
            try:
                # ssl sockets only except 'bytes', not bytearrays
                # so perhaps we should conditionally wrap this for perf?
                n = self.raw.write(bytes_) # <=== CHANGED
            except io.BlockingIOError as e:
                n = e.characters_written
            #added retry logic
            except Exception as e:    	   # <=== CHANGED
                if 'Want' in str(type(e)): # <=== ADDED (probably should make more specific)
                    from time import sleep # <=== ADDED (probably should move to top)
                    if delay <= max_delay: # <=== ADDED
                        sleep(delay)       # <=== ADDED
                        delay = delay * 2  # <=== ADDED
                        continue           # <=== ADDED
                raise e                    # <=== ADDED
                # n = 0                   # <=== REMOVED
            del self._write_buf[:n]
            delay = delay_init             # <=== ADDED
            bytes_ = bytes(self._write_buf) # <=== ADDED

Re read:

  • using your changes, it still fails in my tests because it simply never proceeds to overcome the error, no matter how many retries are performed, no matter how long it sleeps, no matter how long the timeout is. I'm not sure what the correct solution is. I do know that, without your change, using the exact same code but only changing to 'builtin' instead of 'pyopenssl', works fine. So based on that, there seem to be some problem that resides in either cheroot, pyopenssl or both, when it comes to WantReadError.
  • unfortunately I have not found a solution to this problem (other than avoiding pyopenssl and only using builtin).

@julianz-
Copy link

@liquidaty It seems like we are experiencing very different phenomenology. The fixes I described completely solved the errors I was seeing. Under what circumstances do you see these WantReadErrors and WantWriteErrors? How often do you get these errors? For me, the errors occur seemingly randomly and not very often - which made this issue hard to troubleshoot. I haven't found a way to produce them on command. Are you able to reliably reproduce these errors? If so, maybe you could propose a test script?

Regarding your write code, why did you remove the line that sets n to 0? I found this necessary as the retry needs to resend the whole buffer again. Also your retry is not handled by an exception, so if the retry fails your code is going to break no? And having a delay on the order of half a second seems way too long for the kind of retries I am experiencing - the default delay in safe_call is 0.01 seconds which seems more reasonable.

@liquidaty
Copy link

liquidaty commented Jan 21, 2024

@julianz- I can reliably reproduce both errors, but this is in the context of proprietary code so it may take me a while to come up with code to reproduce that I can share. That said, the conditions seem pretty simple so maybe that will be easy

The write error occurs when a request is made by a process that is consuming the response in chunks, and the caller (consumer) cannot quite keep up with the speed of the response. I'm using a file size of 170MB and I'm guessing there is some initial buffer that can absorb the consumer process speed variability up to a point, after which it has to throttle the SSL connection which leads to the WantWrite errors.

The read error occurs with a 170MB file that is POSTed by the client and then attempted to be read in chunks via KnownLengthRFile.read(buffer_size). The WantRead error is thrown during the process of parsing the multi-part format. Oddly, the retry mechanism works fine throughout the reading of the file-- the WantRead error first comes up maybe 40% of the way through, and then continues to come up multiple times, and each time the Read picks back up fine after retry, until the very end, and it appears that at the final chunk it just can't get out of the WantRead exception. I wonder if there is some sticky exception that needs to be cleared, but have not found any documentation to support that. I suspect there should be some sort of SSL_close() or similar on-end handling that is not happening; normally I would expect that to point to a client issue, but the fact that it works fine with 'builtin' does not support that conclusion, so I'm not sure where to look next to examine that possibility.

Both of the above are running all processes (client and server) on the same machine, and occur under older and newer Macs, Intel and Silicon. And in all cases, the issues all go away (and no exception handling is required at all) when using 'builtin' instead of 'pyopenssl'.

@liquidaty
Copy link

liquidaty commented Jan 21, 2024

@julianz-

why did you remove the line that sets n to 0

In the changes I made, that line became moot: either the loop is short-circuited via continue, an exception is raised via raise e, or the write was successful and n is a valid value.

Also your retry is not handled by an exception, so if the retry fails your code is going to break no?

I'm not following your logic-- in my code, the retry is achieved by simply executing continue and going back to the top of the while loop and then entering the try block again.

And having a delay on the order of half a second seems way too long

All the more reason for delay and retry parameters to be configurable

@liquidaty
Copy link

@julianz-

Stepping back further: why bother supporting pyopenssl at all? Wouldn't it be both more stable, and less maintenance, to eliminate that altogether and only support 'builtin'?

@julianz-
Copy link

@liquidaty Stepping back further: why bother supporting pyOpenssl at all? Wouldn't it be both more stable, and less maintenance, to eliminate that altogether and only support 'builtin'?

Maybe so. I haven't tried the builtin module actually so I should give that a try. What was the reason pyOpenSSL.py was added as an alternative? @webknjaz mentioned earlier in this thread he saw some problems with the builtin module - he said it failed some tlsfuzzer tests but I don't know enough about that to know its significance.

@liquidaty
Copy link

What was the reason pyOpenSSL.py was added as an alternative

@julianz- I'm not sure but it's not surprising that it was supported in earlier versions given that back in Py2 days the built-in SSL module didn't work well, and pyopenssl was more popular (and possibly in many cases the only working choice)

@webknjaz
Copy link
Member

The moving buffer fix depends on the release of version 42.0.0 of cryptography which is still under active development with no release date yet. So we are kind of stuck until this becomes available.

It was released 16 hours ago FYI.

julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 23, 2024
Need v42.0.0 or later of Cryptography as this restored the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER constant which is required for possible fix per cherrypy/cheroot#245
@webknjaz
Copy link
Member

webknjaz commented Jan 23, 2024

IIRC over a decade ago, there was no good ssl even with py3k so pyOpenSSL covered that need. Though, I can understand why today people may want to have their TLS setup backed by cryptography β€” the releases of anything on the PyPI have a potential to be released fast if needed, without a dependency on CPython's release cycle. Improvements or fixes in CPython aren't accessible as immediately.
It's sad that our current use is buggy, but I think it should be fixed and not dropped.

julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 26, 2024
# This is the 1st commit message:

Updated SSL.py to fix problem caused by SSL_WANT_READ or SSL_WANT_WRITE errors. 

When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail.  By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success.  See cherrypy/cheroot#245 for discussion.
# This is the commit message pyca#2:

fixed format for flake8/black

# This is the commit message pyca#3:

E721 errors raised by flake8
julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 26, 2024
parent 7f3e4f9
author julianz- <[email protected]> 1692386750 -0700
committer julianz- <[email protected]> 1706310924 -0800

# This is a combination of 3 commits.
# This is the 1st commit message:

Updated SSL.py to fix problem caused by SSL_WANT_READ or SSL_WANT_WRITE errors. 

When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail.  By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success.  See cherrypy/cheroot#245 for discussion.
# This is the commit message pyca#2:

fixed format for flake8/black

# This is the commit message pyca#3:

E721 errors raised by flake8

# This is the commit message pyca#5:

Update setup.py

Need v42.0.0 or later of Cryptography as this restored the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER constant which is required for possible fix per cherrypy/cheroot#245
# This is the commit message pyca#7:

resolved conflicts
julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 26, 2024
author julianz- <[email protected]> 1692386750 -0700
committer julianz- <[email protected]> 1706310924 -0800

Updated SSL.py to fix problem caused by SSL_WANT_READ or SSL_WANT_WRITE errors.

When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail.  By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success.  See cherrypy/cheroot#245 for discussion.
julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 26, 2024
author julianz- <[email protected]> 1692386750 -0700
committer julianz- <[email protected]> 1706310924 -0800

fix for problem caused by SSL_WANT_READ or SSL_WANT_WRITE errors.

When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail.  By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success.  See cherrypy/cheroot#245 for discussion.
julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 28, 2024
When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail.  By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success.  See cherrypy/cheroot#245 for discussion.
julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 28, 2024
When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail.  By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success.  See cherrypy/cheroot#245 for discussion.
julianz- added a commit to julianz-/pyopenssl that referenced this issue Jan 28, 2024
When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail.  By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success.  See cherrypy/cheroot#245 for discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants