Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed state handling of HTTP client connections #2273

Merged
merged 13 commits into from
Nov 21, 2017

Conversation

peterzandbergen
Copy link
Contributor

The _conn: TCPConnection was used before it was opened.
Added an extra state _ConnConnecting for proper handling.

Also fixed _ClientConnection not closing the connection
when not work needed to be done by calling _send_pending
after having received the body part of the HTTP response.

Added one trick pony http server for testing

The _conn: TCPConnection was used before it was opened.
Added an extra state _ConnConnecting for proper handling.

Also fixed _ClientConnection not closing the connection
when not work needed to be done by calling _send_pending
after having received the body part of the HTTP response.
The server gives full control over the response.
fun ref apply(h: TestHelper) ? =>
let urls: Array[URL] =
[
URL.build(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing the connection once one should be enough, right?

Copy link
Contributor Author

@peterzandbergen peterzandbergen Oct 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is essential that the request is made more than once.

This is the flow of the old code before the fix:

  1. be apply is called with payload 1
  2. apply pushes payload 1 on the _unsent list
  3. _send_pending is called
  4. _send_pending tries to cast _conn as TCPConnection and fails, because _conn is None; it calls _new_conn()
  5. _new_conn creates a new TCPConnection and assigns it to _conn
  6. be apply returns
  7. be apply is called with payload 2
  8. apply pushes the payload on the _unsent list which now contains two payloads
  9. apply calls _send_pending
  10. _send_pending send_pending tries to cast _conn as TCPConnection and succeeds, but _conn is not open yet
  11. _send_pending continues to shift the payloads from the _unsent list and calls request.write, which it thinks succeeds.
  12. when all the request have been processed _send_pending and apply return and some time after that the be _connected behaviour is called and now _conn is really open and ready for use.

It works when you only send one payload, because _send_pending is called again after be _connected is called and _conn is a connected TCPConnection.

This is the scenario:

  1. be apply is called with payload 1
  2. apply pushes payload 1 on the _unsent list
  3. _send_pending is called
  4. _send_pending tries to cast _conn as TCPConnection and fails, because _conn is None; it calls _new_conn()
  5. _new_conn creates a new TCPConnection and assigns it to _conn
  6. be _connected is called and calls _send_pending
  7. _send_pending works fine now, because _conn is actually connected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally true. You need to send at least two requests.
But it would suffice to create 1 url, as it is always the same and just use a collections.Range from 0 to 5 or so to iterate over and send a request on each iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, but in my initial test I sent multiple request to different sites and this code made it easier to perform these tests.

I kept it in because there are more bugs in the http package and I may need it later. But I can take it out if you like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to verify. Will the CI start a new build when I push new commits to the PR branch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea still. Sorry if i seem picky but i did this wrong too many times in the past.

If you bind to port 50000 it might be already taken on someones machine and thus the testsuite might fail although pony is just fine.

My suggestion is to bind to port 0 with the TCPListener and to get the randomly assigned free port back in the TCPListenNotify.connected method from TCPListener.local_address().port. This way you make sure that you will most likely always get a free port and the test suite runs just fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik the Ci will build every commit you make to a branch that has a PR open in the ponc repo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do that. Good tip. Did not know the feature yet.

Copy link
Contributor Author

@peterzandbergen peterzandbergen Oct 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that you mean the TCPListenNotify.listening method.
The TCPListener.local_address().port returns a strange value. When I set the port value when creating TCPListener to e.g. 50000, then the port has value 20675, when I set it to 5000 port has value 34835. I have no idea why this is. But I assume it is undesired behaviour.

Copy link
Contributor

@mfelsche mfelsche Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i sure mean TCPListenNotify.listening.
sorry for the confusion here.

The port from net.NetAddress is actually from the OS sockaddr_in (sockaddr_in6) structure defined in C. This is an unsigned short (U16) in network byte order that needs to be transformed to host byte order before using it. I learned that right now, sorry for that. :)
Here you have a playground link that shows this:

http://playground.ponylang.org/?gist=f783b91faedef2715d4a585f59ec4039

looking forward to merge your PR.

@@ -98,7 +99,7 @@ actor _ClientConnection is HTTPSession
if node()? is request then
try (_conn as TCPConnection).dispose() end
_conn = None
node.pop()?
node.>remove().pop()?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterzandbergen would you mind putting spaces around the .> operator to match the style guide?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix it

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 11, 2017 via email

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 11, 2017 via email

- put spaces around .> according to style guide
- reduced the number of http requests to 3
try (server as TCPListener).dispose() end

////////////////////////
primitive ResponseWriter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also make sure that all test types are private, that is start with underscore, otherwise they are exposed in the stdlib.

Needed to use the name() method of the NetAddress to retrieve the
service number.
@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 12, 2017 via email

@Praetonus
Copy link
Member

Praetonus commented Oct 12, 2017

I've cancelled the AppVeyor builds since they were failing after a 1 hour time out. Not really sure what's going on here, maybe some firewall issues with AppVeyor.

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 12, 2017 via email

// "Content-Length: 0"
// "Status: 520 Unknown Error"
// ""
// ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on with this commented-out code block?

fun ref tear_down(h: TestHelper) =>
try (server as TCPListener).dispose() end

////////////////////////
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this comment line?

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 12, 2017 via email

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 12, 2017 via email

Support classes for the client are prefixed with _HTTPConnTest.
Fake server uses less classes, preventing name clashes.
@mfelsche
Copy link
Contributor

This is most likely not a firewall issue on appveyor. There are also other tests in the net package that create a TCPListener e.g. https://github.com/ponylang/ponyc/blob/master/packages/net/_test.pony#L159

Maybe it is the TestHelper.dispose_when_done call that keeps resources hanging?
Just a wild guess tbh. But this is the only difference i can spot.

I would nonetheless suggest the decrease the test timeout significantly, so timeouts are much earlier detectable.

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 14, 2017 via email

@mfelsche
Copy link
Contributor

mfelsche commented Oct 14, 2017

Gosh, this is getting out of hand. Now i mistook the timings for milliseconds. I always do tbh.
Keep it this way.

I also meant that you should call TestHelper.dispose_when_done(...) for the same resources the mentioned test from the net package calls it. I.e. TCPListener

Appveyor times out on the test and the suspicion is that
the http serve for generating the fixed response should be
closed using dispose_when_done. This works for the test for
TCPConnection.
@mfelsche
Copy link
Contributor

mfelsche commented Oct 16, 2017

I restartet 2 test runs on travisci on OSX that seem to be unrelated but concerning.

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Oct 16, 2017 via email

Copy link
Contributor

@mfelsche mfelsche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterzandbergen i am not 100% sure what the issue is with those tests. I hope the comments i made might help. it is most likely a socket not properly closed as the appveyor builds seem to be hanging after the test has been completed successfully.

h = h'

fun apply(session: HTTPSession): HTTPHandler ref^ =>
h.log("_HTTPConnTestHandlerFactory.apply called")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also call TestHelper.dispose_when_done(session) here instead of adding a dispose() to the HTTPClient, although i think HTTPClient should have such a method. But rather in another PR.

None

fun ref connect_failed(conn: TCPConnection ref) =>
None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i personally like the approach of https://github.com/ponylang/ponyc/blob/master/packages/net/_test.pony#L202 where multiple TestHelper.expect_action calls are made for each stage towards completion of the test. At points like this, where the test failed, a TestHelper.fail_action call can greatly help pinpointing possible issues.

// Write the response.
if start then
for r in response.values() do
conn.write(r + "\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i am correct, HTTP headers are separated by \r\n.

try
(client as HTTPClient iso)(consume payload, hf)?
end
// match client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

end // while
true

fun ref accepted(conn: TCPConnection ref) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a TestHelper.dispose_when_done(conn) on this side helps as well.

@mfelsche
Copy link
Contributor

mfelsche commented Nov 2, 2017

This one is blocked by #2322

@chalcolith
Copy link
Member

Windows tests are fixed by #2325

@peterzandbergen
Copy link
Contributor Author

Thanks for fixing the Windows issue. Does this mean that I can push my pull request again?

@peterzandbergen
Copy link
Contributor Author

The test passes with the fix that Gordon made for the TCPConnection. So after his pull request #2325 has been accepted I will push my fix again for testing on CI.

@jemc
Copy link
Member

jemc commented Nov 16, 2017

#2325 has been merged.

@peterzandbergen
Copy link
Contributor Author

One of the iOS tests in Travis failed because bintray get failed. Will bump a new version this evening to try again.

Copy link
Contributor

@mfelsche mfelsche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally! :)
Just some few final-cleanup remarks.

Great stuff!

@@ -178,6 +180,7 @@ actor _ClientConnection is HTTPSession
`_chunk`. This is passed on to the front end.
"""
_app_handler.finished()
_send_pending()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain why this call is here?
I cannot see how this is related to the _conn state handling.
Isn't this method is called by the HTTPParser when a chunked response body has been fully received?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matthias,
this allows the state machine to handle the situation that no both pending and unsent requests are empty. If this line is removed then the connection will not close the connection if no work has to be done, will not dispose the connection and waits and waits .... I realize that this is an extra fix but needed it to get the tests working and not timing out.

@@ -189,9 +192,11 @@ actor _ClientConnection is HTTPSession
"""
Close the connection from the client end.
"""
_cancel_all()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviour of cancelling all pending requests should be documented in the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matthias, I processed your remarks and pushed a new commit.

@@ -25,6 +29,8 @@ actor Main is TestList
test(_Valid)
test(_ToStringFun)

// Disabled to check if this causes the appveyor block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -53,6 +53,11 @@ class HTTPClient
let valrequest: Payload val = consume request
session(valrequest)
valrequest

fun dispose() =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please clear the _sessions map as well and add a docstring describing what happens here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

end // try

be dispose() =>
// try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to uncomment those lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to take it back. This was something that I put in to fix the problem on windows. Because it is not needed I will remove it from the code. As you mentioned earlier only one fix per pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not necessary. I put it in hoping it would fix the problem on Windows.

I will take it out.

@mfelsche mfelsche added changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge and removed bug: 0 - blocked labels Nov 17, 2017
@mfelsche mfelsche changed the title Fixed _conn state handling and closing connection Fixed state handling of HTTP client connections Nov 17, 2017
@mfelsche
Copy link
Contributor

I took the freedom to change the title to be more meaningful when appearing in the changelog as "Fixed" entry.

Processed the remarks from Matthias (thanks)

- Added docstrings
- cleared the _sessions in HTTPClient.dispose()
Copy link
Contributor

@mfelsche mfelsche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Nov 21, 2017 via email

@jemc
Copy link
Member

jemc commented Nov 21, 2017

@mfelsche - I haven't reviewed this, but if you feel comfortable merging it, feel free to go ahead!

@mfelsche
Copy link
Contributor

@jemc as far as i can see, this is totally fine. If it is not and i didnt see it, i take full responsibility and buy a round for everybody involved

@peterzandbergen
Copy link
Contributor Author

I will share the round with you. No worries.

@jemc jemc merged commit 06f3f65 into ponylang:master Nov 21, 2017
ponylang-main added a commit that referenced this pull request Nov 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants