Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial C API for hyper #2278

Merged
merged 3 commits into from
Jan 8, 2021
Merged

Initial C API for hyper #2278

merged 3 commits into from
Jan 8, 2021

Conversation

seanmonstar
Copy link
Member

@seanmonstar seanmonstar commented Sep 9, 2020

This provides an initial C API for hyper, motivated by #2265.

This starts with essentially the hyper::client::conn API exposed (server and client pool aren't yet). As part of it's merging, this is considered unstable.

@bagder
Copy link
Contributor

bagder commented Sep 10, 2020

Thanks for this!

(It's a little hard to get into the API for an outsider with zero knowledge of how hyper works.)

Let's dive in. I'm focusing on reading client.c to understand how the API is to be used.

hyper_executor_poll() is called when select() has returned for a particular socket, right? When doing multiple concurrent connections, the mapping between the socket with activity and the corresponding exec handle needs to be done by the application. So for HTTP/2 streams over the same connection, do we simply create multiple 'exec' contexts for the same socket?

Shouldn't hyper also tell the client somehow what to wait for in terms of socket event? Like the socket needs to be found writable when it wants to send the initial request? The current client seems to only set all_fds->write when it gets called to write to the socket and it gets EAGAIN back. But select() won't return for writable before that... ?

@seanmonstar
Copy link
Member Author

(It's a little hard to get into the API for an outsider with zero knowledge of how hyper works.)

Yea, apologies about that. Part of this effort will need to better document how to actually use this API, but not just what the types look like.

do we simply create multiple 'exec' contexts for the same socket?

An exec can process any number of "tasks" at a time. So you can manage multiple connections, multiple transactions with a single exec. The exec is a type that holds onto all registered futures, makes wakers to poll them with, and when polled again, knows which wakers were notified to only poll the relevant tasks.

Shouldn't hyper also tell the client somehow what to wait for in terms of socket event?

This is done at the hyper_io callback level. The callbacks determine any way they want to register their IO and be able to notify the associated waker when ready. This allows a user to pick select, epoll, kqueue, IOCP, some sort of memory channel, whatever.

It might be a bug in my client.c to not have set the all_fds->write initially. That's what I get for having a program that I can't actually run yet...

@bagder
Copy link
Contributor

bagder commented Sep 14, 2020

No need to apologize, I'm mostly pointing it out to highlight where perhaps some extra documentation might be needed down the line!

@sdroege
Copy link

sdroege commented Sep 16, 2020

Doesn't help to figure out how the C API could look like, but you might want to take a look at cargo-c and cbindgen. The former to handle all the boring build/installation parts of a C library, the latter for autogenerating the C header file from the Rust code. cargo-c uses cbindgen by default to create the header.

@bagder
Copy link
Contributor

bagder commented Sep 16, 2020

I think maybe parts of this work is about making a sensible C API that would suit users rather than just a blunt conversion from Rust as easy as possible. I don't think a tool can do that...

@sdroege
Copy link

sdroege commented Sep 16, 2020

I don't think a tool can do that...

That's also what none of the tools does. I just mentioned them so you can focus exactly on that part once you start implementing/prototyping instead of boring build questions and extern "C"/#[repr(C)] to C header translations and keeping those in sync.

@seanmonstar
Copy link
Member Author

As an update: there is now sufficient Rust FFI code in the hyper-capi crate to make the client.c file compile and run, sending a request to httpbin, and getting response headers back.

The next piece I'll be tackling is the body API, to be able to stream the response body (and send a request body).

hyper-capi/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@bagder bagder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of hyper_str for strings is not very "C style" but I think I can live with it. Just two points about it apart from the compiler warning:

  1. A string in C typically means it cannot contain zero bytes but these ones probably can, right? It then needs proper documentation about that detail since in C programs we will likely convert to/from actual C strings with such limitations...
  2. The use of hyper_str is still inconsistent in the current incarnation where some functions still take a uint8_t * + size_t length and some take a hyper_str *.

hyper-capi/examples/client.c Outdated Show resolved Hide resolved
hyper-capi/include/hyper.h Outdated Show resolved Hide resolved
@seanmonstar
Copy link
Member Author

seanmonstar commented Sep 30, 2020

The use of hyper_str for strings is not very "C style"

Yea, I initially hoped it would help with functions that needed multiple strings (like passing a name and value), and as a return value. I had hoped this was possible (and nicer, giving pseudo named arguments), but it's not:

hyper_headers_set(headers, hyper_str { .buf = "foo", .len = 3 }, hyper_str { .buf = "bar", .len = 3 });

I also assumed it'd be the best way to allow a hyper_headers_get(headers, name) return value. Perhaps the more "C style" is to pass in "out" arguments? hyper_header_get(headers, name, name_len, &value, &value_len)?

Generally, I notice C prefers null-terminated strings, and Rust prefers a pointer-len pair. All the string slices inside hyper follow the Rust norm. There is std::ffi::{CStr, CString}, which will copy into a new buffer and add a null byte on the end. However, it seemed requiring that copy for all slices would be undesirable to be forced on all users of the C API... If a user needs it, they could make a copy themselves, and if they don't, the copy is saved.

@comex
Copy link

comex commented Sep 30, 2020

Yea, I initially hoped it would help with functions that needed multiple strings (like passing a name and value), and as a return value. I had hoped this was possible (and nicer, giving pseudo named arguments), but it's not:

hyper_headers_set(headers, hyper_str { .buf = "foo", .len = 3 }, hyper_str { .buf = "bar", .len = 3 });

It is possible; you just need to put the type in parentheses: (hyper_str) { .buf = "foo", .len = 3 }

This is a C99 compound literal. The version without parentheses is actually valid in C++20, but not C.

@bagder
Copy link
Contributor

bagder commented Oct 1, 2020

Don't assume everyone can and will use such statements though.

In my case, I have a code base written in C89 so even though this API will require C99, my code will not and we will not be doing these "modern" fancy compound literals.

hyper-capi/include/hyper.h Outdated Show resolved Hide resolved
@bagder
Copy link
Contributor

bagder commented Oct 5, 2020

Can I also ask for a hyper_version() call (the name can of course be different) that can return the version of the currently used Hyper?

@bagder
Copy link
Contributor

bagder commented Oct 5, 2020

Two questions on the sample client:

  1. Since the EXAMPLE_HANDSHAKE state is the first state set just before the loop, can't all this logic done in this state within the loop then just as well be done before the loop starts?
  2. Having the client use separate states for sending and body confuses me. A server can very well start sending the response while the request is in progress, they can't be mutually exclusive! So, shouldn't they then be a single state? And without "handshake" as a state, then we're down to a single state...

@bagder
Copy link
Contributor

bagder commented Oct 5, 2020

For some reason, the example client fails to download my local test file that's 17MB big, or anything that is larger. Small files seems to be fine.

If I strace the hang, I can see it read/write for a while (512KB buffers?! wow) and then just stall on a select(). Highly reproducible.

@seanmonstar
Copy link
Member Author

hyper_version()

Sure. Is it for checking functionality, and thus return an integer of the version, or for display purposes, and so just wants a string? Or maybe both?

EXAMPLE_HANDSHAKE state

Yep, it originally was done before the loop, since an HTTP/1 handshake is ready immediately. I moved it into the loop once I got state tracking in the tasks, since HTTP/2 or 3 handshakes could require IO (if wanting to check an h2 prior knowledge before sending the request, for example). That difference is simply just setting a clientconn_option (not exposed just yet).

A server can very well start sending the response while the request is in progress,

Yep, and hyper supports that. The send task resolves as soon as the server gives back response headers. The continued writing of a request body (which would be set on the hyper_request *) can still be happening while the response body is read back. The response body is only read off the transport as its polled (like with hyper_body_foreach), otherwise backpressure is applied by leaving it in the transports buffer.

fails to download my local test file that's 17MB big,

Sounds weird to me, I'll give that a try.

(512KB buffers?! wow)

Heh, that's part of hyper's "adaptive read buffer", that will try to improve throughput by increasing the read buffer if after a few reads the full buffer was filled each time, up to a configurable maximum. When on localhost, that'll reach the max pretty quickly. We can expose that know on hyper_clientconn_options.

@seanmonstar
Copy link
Member Author

fails to download my local test file that's 17MB big,

Found the issue. The executor implementation (FuturesUnordered) recently received "starvation protection", forcing a yield after 32 iterations inside its poll. This was meant to allow systems to check other tasks. However in hyper-capi, the FuturesUnordered is the entire executor, it's not sharing with other tasks, so I added a check for this condition, and now I can download larger files.

@bagder
Copy link
Contributor

bagder commented Oct 7, 2020

You might be on top of this already and I might be totally premature here (so feel free to ignore and just give this the amount of attention you think it deserves), but something in the current setup/example is not really optimal for performance. I'm mentioning it here anyway so that you're at least aware of it:

After the fix mentioned above was merged, I modified the example to download 8GB on Linux from my localhost HTTP/1.1 server (to /dev/null). On my particular host this took 150 seconds. My curl command line tool does the same transfer in slightly over 2 seconds.

@bagder
Copy link
Contributor

bagder commented Oct 7, 2020

hyper_version()

Sure. Is it for checking functionality, and thus return an integer of the version, or for display purposes, and so just wants a string? Or maybe both?

I was thinking primarily for displaying purposes so that users can tell/show what they're using. But maybe version checking will be interesting too...

@seanmonstar
Copy link
Member Author

something in the current setup/example is not really optimal for performance

I haven't done benchmarking of this example vs the top level example/client.rs, but that time difference sounds off. My first thought is that cargo build doesn't do any optimizations by default, only with cargo build --release, and then probably also needing to set optimizations in gcc. But if you're already doing that, then something else is really off...

@bagder
Copy link
Contributor

bagder commented Oct 7, 2020

Yes, this is probably something else than just optimization enabled or not, which is why I thought I'd mention it. It feels like there's a mistake somewhere.

@seanmonstar
Copy link
Member Author

Just to double check, I grabbed a localhost server with a 500mb file (my Linux host is rather limited), I ran time client 127.0.0.1 3000 >/dev/null both with and without optimizations.

  • Without, 24.4s.
  • With, 0.6s.

I had to run cargo build --release, but then also noticed I needed to update the Makefile to link to target/release instead of target/debug to finally see the improvement. So probably worth adding a client-debug and client-release target.

@bagder
Copy link
Contributor

bagder commented Oct 8, 2020

Gosh! Okay, then I probably just saw that. I could not even imagine that huge differences between debug and release. I'll rerun my tests soon to verify. Thanks!

@bagder
Copy link
Contributor

bagder commented Oct 8, 2020

Yeps, confirmed. The debug version is indeed 75 times slower for me...

hyper-capi/examples/client.c Outdated Show resolved Hide resolved
hyper-capi/examples/client.c Outdated Show resolved Hide resolved
hyper-capi/src/lib.rs Outdated Show resolved Hide resolved
hyper-capi/include/hyper.h Outdated Show resolved Hide resolved
@bagder
Copy link
Contributor

bagder commented Oct 12, 2020

hyper_headers_set: when I set the header Host: 127.0.0.1, hyper sends it off as host: 127.0.0.1 even when used over HTTP/1.1. Is this a hyper-c issue or a generic Hyper issue? (I'm not really sure how to deal with issues I find when working on this new C API). I am of the strong opinion that it isn't hyper's job to modify the headers I tell it to use. HTTP headers are by definition case insensitive but there are known issues with many server side implementations that will assume and insist on specific casing of headers. Those are server issues but I want my client to be able to work with them nonetheless.

@lnicola
Copy link
Contributor

lnicola commented Oct 12, 2020

title-case headers

See #1492 for context. This is available in the hyper API, but doesn't seem to be exposed yet in the C bindings.

Note that the header case isn't preserved AFAICT. They'll be emitted in title-case if you enable that option, but custom headers set in the command line will also be modified.

@bagder
Copy link
Contributor

bagder commented Oct 12, 2020

I would really like to see an additional (separate?) client example that receives data in a streaming manner and one that sends (POST/PUT) in a streaming manner, to help me figure out how the API is to be used for that.

@bagder

This comment has been minimized.

@bagder
Copy link
Contributor

bagder commented Oct 13, 2020

Note that the example code doesn't build anymore after the recent API changes.

@seanmonstar
Copy link
Member Author

I just noticed that as well, and fixed that. The cbindgen step is manual at the moment, I'll be adding a CI task to verify it's been checked for a commit.

@bagder
Copy link
Contributor

bagder commented Nov 25, 2020

🍾 - hooray, thanks @seanmonstar!

@seanmonstar seanmonstar force-pushed the hyper-capi branch 5 times, most recently from b0dd4da to 2ce5952 Compare January 8, 2021 01:19
This adds an internal ability to copy the HTTP/1 reason-phrase and place
it in the `http::Extensions` of a response, if it doesn't match the
canonical reason. This could be exposed in the Rust API later, but for
now it is only used by the C API.
@seanmonstar seanmonstar marked this pull request as ready for review January 8, 2021 01:23
@seanmonstar
Copy link
Member Author

This has been squashed and cleaned up as we prepare to merge to master. As part of the merge, I've marked the API as unstable, requiring a special cfg and documenting it as such. There are still some things to figure out, but I'll file follow up issues regarding those. I'm aiming to click "merge" tomorrow.

@bagder
Copy link
Contributor

bagder commented Jan 8, 2021

I'm all buckled up and ready! 😁

@seanmonstar seanmonstar merged commit c9c46ed into master Jan 8, 2021
@seanmonstar seanmonstar deleted the hyper-capi branch January 8, 2021 18:25
@seanmonstar seanmonstar mentioned this pull request Nov 16, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants