Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix faulty write buffer behavior when sending large data chunks over TLS (Mac OS X only) #150

Merged
merged 3 commits into from
May 2, 2020

Conversation

clue
Copy link
Member

@clue clue commented Apr 30, 2020

This changeset fixes faulty write buffer error handling when sending larger data chunks over TLS (HTTPS) streams as exhibited on Mac OS X only. This component will now only report a write error when nothing could be sent and and error has been reported. This indicates a fatal error condition, whereas all other cases mean that a write should be retried according to how the loop reports the stream as writable.

Supersedes / closes #149, thanks @WyriHaximus and @mpociot for reporting and filing an initial work-around.
Refs #105, #132 and reactphp/http-client#109

@clue clue added the bug label Apr 30, 2020
@clue clue added this to the v1.1.1 milestone Apr 30, 2020
@clue clue changed the title [WIP] Fix faulty write buffer behavior on Mac OS X Fix faulty write buffer behavior when sending large data chunks over TLS (Mac OS X only) Apr 30, 2020
@clue
Copy link
Member Author

clue commented Apr 30, 2020

This is now ready for review :shipit:

The build output is a bit harder to read because Mac OS X build errors are being ignored (#112). The first commit and build introduces a reproduction case and fails as expected. The second commit fixes this issue as documented above and successfully builds also on Mac OS X. The last commit simplifies the Mac OS X test setup slightly (see also #147 and #112).

Again thanks to @WyriHaximus and @mpociot for reporting and filing an initial work-around in #149. I've had a chat earlier today with @mpociot to verify my assumption and we've been able to reproduce and fix this some minimal changes to the existing test suite and code base. I have intentionally pushed this as multiple commits to highlight this exact issue.

@clue clue mentioned this pull request Apr 30, 2020
@WyriHaximus WyriHaximus requested review from WyriHaximus and jsor April 30, 2020 17:34
trowski added a commit to amphp/byte-stream that referenced this pull request Apr 30, 2020
- PACKAGE: php70
before_install:
- curl -s http://getcomposer.org/installer | php
- mv composer.phar /usr/local/bin/composer
allow_failures:
- php: hhvm-3.18
- os: osx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove it from allowed failures as well? Since this now includes a fix for Mac OS of significant impact I would feel more secure that we secure we force it to also pass on OS X

Copy link
Member Author

@clue clue Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can.

The test suite works just fine on all platforms (including Mac OS X) before applying this PR and thereafter. I don't think this PR has any effect on whether Mac OS X build failures should be reported, but I can see that this could be useful in the long run.

We've put this on allow_failures because Travis' Mac OS X integration felt somewhat beta (#112), but I don't think we've seen significant problems since then, so I'm fine with it either way.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 👍

@trowski
Copy link

trowski commented Apr 30, 2020

When this issue first presented itself, I recall attempting to use the error handler as well, so I was very curious to see why that approach appeared to work here but not in Amp.

I committed a similar approach in a branch of Amp's byte-stream package: amphp/byte-stream@655ea5c. However, I'm getting test failures now due to EAGAIN in 7.3 and below under some conditions: macOS Travis build. IIRC, that is why I didn't use the error handler approach.

@clue
Copy link
Member Author

clue commented May 1, 2020

When this issue first presented itself, I recall attempting to use the error handler as well, so I was very curious to see why that approach appeared to work here but not in Amp.

I committed a similar approach in a branch of Amp's byte-stream package: amphp/byte-stream@655ea5c. However, I'm getting test failures now due to EAGAIN in 7.3 and below under some conditions: macOS Travis build. IIRC, that is why I didn't use the error handler approach.

Interesting to see how you've tried to apply the same concept to Amp and how the behavior differs from what we're seeing here 👍 I took a quick glimpse and it looks like the major difference is that you're performing multiple fwrite() calls in a single writable event whereas we concatenate all outgoing data and only fwrite() once when the loop indicates a writable event. Using this approach, we can not produce any problems any more, both in our test suite and some real-world tests, so I think this is a reasonable improvement over the old situation for us.

Happy to help if you have any questions, but I'd like to ask you to please give credit where credit is due (we're looking at >6 hours of work in this PR).

@trowski
Copy link

trowski commented May 1, 2020

I took a quick glimpse and it looks like the major difference is that you're performing multiple fwrite() calls in a single writable event whereas we concatenate all outgoing data and only fwrite() once when the loop indicates a writable event.

I appreciate you taking a look into what was different here. I didn't realize React was concatenating writes. Since our abstraction has separate promises for each write, we were also performing each fwrite separately. I took the simple approach for a fix by only failing if the first write fails: amphp/byte-stream@0dd0eb8

Happy to help if you have any questions, but I'd like to ask you to please give credit where credit is due (we're looking at >6 hours of work in this PR).

Not a problem. I also spent hours debugging when I first discovered the issue through our HTTP server. I put a link to this issue next to the related code and on the added test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants