Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace hyper with reqwest #330

Merged
merged 4 commits into from
Jun 18, 2019
Merged

Conversation

spl
Copy link
Contributor

@spl spl commented Mar 16, 2019

This is an attempt to replace hyper with reqwest.

I'm not familiar with either package, and I'm still pretty new to Rust, so I'm posting a draft PR to get feedback before finalizing it. Also, I left in my intermediate commits, but in the end, it would probably be better to squash them before merging.

My goal is to bring in a newer HTTP dependency than hyper 0.10. First, reqwest seems pretty decent for what's being done in Tectonic. Second, it will allow for a simple solution to introducing a proxy (#59).

Some highlights:

  • I removed some intermediate variables (e.g. client, req) where it didn't seem like they added much to the code.
  • I added an error case UnexpectedHttpResponse for several errors. I'm not sure if I did everything correctly there.
  • The range stuff is a bit different now, thanks to the apparently new headers package. Do the tests cover this? How could it be tested?
  • The redirect policy is also different, thanks to reqwest's approach. Can you verify that the logic is the same? Perhaps a test would be good for this?
  • I chose an arbitrary MAX_HTTP_REDIRECTS_ALLOWED. You may want to do something different here.

@pkgw
Copy link
Collaborator

pkgw commented Mar 18, 2019

@spl Thanks for undertaking this! After a quick skim, it looks like you're very much on the right track here. No major comments about what's in the diffs thus far.

As for testing … Tectonic's testing for this kind of thing is not as thorough as I'd like. To test this particular subsystem well, I'm pretty sure you'd need to stand up your own local HTTP server and hardcode various scenarios. That wouldn't be impossible, but not easy either. Might be worth taking a look at other comparable crates and see if they have any clever approaches to testing that we can borrow.

@pkgw pkgw self-assigned this Mar 18, 2019
@Mrmaxmeier
Copy link
Contributor

Just an observation: This update adds a bunch of new dependencies from the async/tokio ecosystem.
This has a noticeable effect on the size of the tectonic binary.

build debug release
master (hyper 0.10) 42M 6.2M
reqwest 69M 8.6M
without http 29M 4.9M

It might be nice to feature-gate the networking code with an on-by-default flag.

@pkgw
Copy link
Collaborator

pkgw commented Mar 20, 2019

@Mrmaxmeier That's a good point to raise. I'm surprised that tokio/async adds that much to the binary size ... isn't that stuff all supposed to compile down nice and efficiently? But reqwest certainly pulls in a lot of dependencies. Does anyone know if we could simply upgrade hyper and not pull in all of the extra stuff reqwest needs?

@spl
Copy link
Contributor Author

spl commented Mar 20, 2019

Based on a rough diff of hyper and reqwest, I think these are the dependencies that reqwest has in addition to hyper's (some are optional for hyper but not for reqwest):

  • base64
  • encoding_rs
  • hyper
  • libflate
  • mime
  • mime_guess
  • serde
  • serde_json
  • serde_urlencoded
  • tokio
  • tokio-executor
  • tokio-threadpool
  • tokio-timer
  • url
  • uuid
  • hyper-old-types optional
  • hyper-rustls optional
  • hyper-tls optional
  • native-tls optional
  • rustls optional
  • tokio-rustls optional
  • trust-dns-resolver optional
  • webpki-roots optional

Does anyone know if we could simply upgrade hyper and not pull in all of the extra stuff reqwest needs?

I'm sure we could. It probably won't be as convenient, esp. for things like redirect handling and proxy handling.

Are there any other HTTP client crates that should also be considered?

@spl
Copy link
Contributor Author

spl commented Mar 20, 2019

This StackOverflow post presents a similar concern about binary size and one answer mentions some smaller-sized HTTP clients. I've looked through them, but I'm not seeing one that is going to be better and better maintained than hyper or reqwest.

@spl
Copy link
Contributor Author

spl commented Mar 20, 2019

On redirect handling: It seems like there is no plan to implement this in hyper (hyperium/hyper#1719), so tectonic would have to do it manually. There is a crate called follow-redirects for hyper, but it does not expose a way to customize the policy.

On proxy: There is a hyper-proxy crate that might work.

@spl
Copy link
Contributor Author

spl commented Mar 20, 2019

I've done a bit more research on hyper. As I said, I'm new to this, so these notes are for me as much as they are for anyone else:

hyper v0.12 has a runtime Cargo feature. It is enabled by default. When enabled, it uses (as of v0.12.25) futures-cpupool, net2, tokio, tokio-executor, tokio-reactor, tokio-tcp, tokio-threadpool, and tokio-timer, as a “run time.” (I'm not quite sure how “run time” is defined here, but it seems to be pretty central to the operation of hyper.)

It is possible to disable the runtime feature and avoid depending on all of these. That aspect is not documented (hyperium/hyper#1613). However, there is a comment on using a single-threaded runtime with tokio::runtime::current_thread::Runtime. And, as I just discovered, it appears that reqwest does this. I don't know if disabling the runtime feature and using tokio::runtime::current_thread::Runtime gains anything in terms of reduced dependencies: I haven't tried to figure out how the tokio dependencies work.

@spl
Copy link
Contributor Author

spl commented Mar 22, 2019

I looked at curl, a libcurl wrapper, as an alternative. Using it directly adds too much work (IMHO), because the crate does not handle HTTP response headers. There is a crate chttp, which wraps curl and provides a higher-level HTTP client interface at the level of reqwest. It supports a proxy but is missing a customizable redirect handler.

I'm leaning toward biting the bullet on the larger binary size and using reqwest for the simple redirect and proxy handling. That seems worth more to me than trying to use hyper without tokio (or trying to use another crate) and writing more code specific to tectonic.

As for a Cargo feature to control the HTTP dependencies, I don't know what that would look like. I suppose, if disabled, would tectonic be restricted to a file cache?

Thoughts? What should be the next step?

@pkgw
Copy link
Collaborator

pkgw commented Mar 25, 2019

I also lean towards biting the bullet on this one. I'd like to keep the binary small, but in my view it's more important to deliver good functionality, and adding the ability to honor HTTP proxy settings would be valuable. It looks like there's a gap in the ecosystem for a get-me-the-damn-file HTTP client crate that supports things like redirects and proxies (thanks for researching this, @spl) — if one comes along, we can definitely revisit this in the future.

If we added a feature to disable HTTP, people could use the local cache and/or locally stored bundle files, although there's very little in the way of infrastructure to support the latter. (In principle you can build your own following the instructions in the tectonic-staging repo, but I'm not aware of anyone having done so.) One thing that I'd like to do is add more CLI tooling support for some of this stuff — I have an idea that maybe the right way to do things here is have something like the rustc/cargo split, where the core compiler is one executable, and then there's a separate higher-level tool that provides the main user interface to the functionality.

@pkgw
Copy link
Collaborator

pkgw commented Mar 25, 2019

@spl BTW as far as you're concerned, is this still a "draft" PR, or is this ready for testing and review?

@spl
Copy link
Contributor Author

spl commented Mar 25, 2019

BTW as far as you're concerned, is this still a "draft" PR, or is this ready for testing and review?

I think the basic change is done, assuming I got my logic right. My current thoughts on what needs to be done:

  • There are one or two remaining comments in the code mentioning “hyper” that should be updated to say something else. Can you take a look at those, @pkgw?
  • There's src/io/hyper_seekable.rs, which isn't used. Should that be removed from the repository? I suppose you could put it in a Gist if you want to keep it around.
  • I think the dependencies should be updated to master (so, probably, master merged in) and tested with this branch.

I can look at it again tomorrow. I checked the “allow edits from maintainers” checkbox if you want to push directly to this branch. (But preferably no force-pushes since it's now shared.)

@spl spl marked this pull request as ready for review March 25, 2019 15:19
@spl spl mentioned this pull request Mar 26, 2019
@spl
Copy link
Contributor Author

spl commented Mar 27, 2019

There is an (allowed) error building with docker for the x86_64-alpine-linux-musl target:

error: cannot produce proc-macro for headers-derive v0.1.0 as the target x86_64-alpine-linux-musl does not support these crate types

This seems to be due to the use of proc-macro2 in headers-derive:

[dependencies]
proc-macro2 = "0.4"

Researching the issue, I came upon #259, which is related to the same problem but with serde-derive.

I'm not aware of any way of using headers without its dependency headers-derive. The use of headers is rather small. It's of course nice to have the type safety, but headers could be dropped in favor of a string header.

I'm not sure, but there might have been some recent work toward a resolution of this problem in rust-lang/rust#58575 (tracking issue: rust-lang/rust#59302).

@malbarbo: I see that you commented there. Could you help us understand this?

@sanmai-NL
Copy link

@spl: See rust-lang/cargo#5266 for the upstream issue. Setting the environment variable RUSTFLAGS="-C target-feature=-crt-static" (spot the minus symbol) works around it.

@pkgw
Copy link
Collaborator

pkgw commented Mar 31, 2019

@sanmai-NL Thanks for the pointer. If we set the RUSTFLAGS in the way you suggest, does that mean that the resulting Tectonic binary will not be fully static? I'd like to be able to keep on producing static binaries if possible ... especially since, as I understand it, the limitation is on the compiler proc-macro infrastructure, and not actually on the code directly inside Tectonic.

@spl
Copy link
Contributor Author

spl commented Apr 1, 2019

@sanmai-NL Thanks. I did read that issue but didn't understand it at the time. Now, having read some of RFC 1721, I think I get the gist of it.

The important part is this, I believe:

Specifying C runtime linkage via -C target-feature=+crt-static or -C target-feature=-crt-static. This extends -C target-feature to mean not just "CPU feature" ala LLVM, but "feature of the Rust target". Several existing properties of this flag, the ability to add, with +, or remove, with -, the feature, as well as the automatic lowering to cfg values, are crucial to later aspects of the design. This target feature will be added to targets via a small extension to the compiler's target specification.

To clarify what I understand, the x86_64-alpine-linux-musl target declares that everything must be statically linked. But RUSTFLAGS="-C target-feature=-crt-static" tells rustc to make an exception to that declaration and dynamically link libc. So, the conclusion is that everything is statically linked except for libc, which is dynamically linked.

@codecov
Copy link

codecov bot commented Apr 1, 2019

Codecov Report

Merging #330 into master will decrease coverage by 2.5%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #330      +/-   ##
==========================================
- Coverage   42.12%   39.62%   -2.51%     
==========================================
  Files         135      135              
  Lines       63366    59506    -3860     
==========================================
- Hits        26694    23579    -3115     
+ Misses      36672    35927     -745
Impacted Files Coverage Δ
src/config.rs 6.32% <ø> (-4.52%) ⬇️
src/io/mod.rs 70.52% <ø> (-0.8%) ⬇️
src/errors.rs 36.17% <0%> (+1.32%) ⬆️
src/io/itarbundle.rs 0% <0%> (ø) ⬆️
tectonic/xetex-io.c 55.67% <0%> (-8.34%) ⬇️
tectonic/xetex-XeTeXFontMgr.cpp 27.24% <0%> (-6.95%) ⬇️
src/engines/mod.rs 65% <0%> (-6.8%) ⬇️
tectonic/xetex-XeTeXLayoutInterface.cpp 27.92% <0%> (-5.41%) ⬇️
tests/executable.rs 21.71% <0%> (-5.32%) ⬇️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f0ca36...70c6ce1. Read the comment docs.

@spl
Copy link
Contributor Author

spl commented Apr 1, 2019

Since I don't think it's easy to resolve the proc-macro issue and keep static builds on musl, I took the liberty of cleaning up this PR to remove the headers dependency that was causing the problem. I also removed hyper_seekable and a comment about hyper headers. (If you want to keep hyper_seekable around, perhaps something like a Gist would be better?)

As far as I'm concerned, this PR is ready to go.

@mati865
Copy link

mati865 commented Apr 1, 2019

To clarify what I understand, the x86_64-alpine-linux-musl target declares that everything must be statically linked. But RUSTFLAGS="-C target-feature=-crt-static" tells rustc to make an exception to that declaration and dynamically link libc. So, the conclusion is that everything is statically linked except for libc, which is dynamically linked.

It's not entirely true, see rust-lang/rust#55566.
Once that gets merged x86_64-alpine-linux-musl will truly statically link libs, passing -C target-feature=-crt-static will revert to current behaviour.

@spl
Copy link
Contributor Author

spl commented Apr 1, 2019

It's not entirely true, see rust-lang/rust#55566.
Once that gets merged x86_64-alpine-linux-musl will truly statically link libs, passing -C target-feature=-crt-static will revert to current behaviour.

@mati865 Thanks for chiming in with that. The summary in the last comment, rust-lang/rust#55566 (comment), if true, sounds attractive. However, the gist of preceding comments seem to imply that the PR'd solution was not preferred over the one in rust-lang/rust#58575, to which I linked above. But I'm sure I'm missing a lot of context here.

That said, for Tectonic, I think it would be better to skip on the issue for now. If change comes later to the musl target, Tectonic can always be updated to take advantage of it then. (After all, we're only talking about a few lines of HTTP header type-safety. What can go wrong? 🙄)

@mati865
Copy link

mati865 commented Apr 1, 2019

However, the gist of preceding comments seem to imply that the PR'd solution was not preferred over the one in rust-lang/rust#58575, to which I linked above. But I'm sure I'm missing a lot of context here.

Discussions about musl target desing were scattered all over the place so it's hard to get what is going on.

The summary in the last comment, rust-lang/rust#55566 (comment), if true, sounds attractive.

That's the current standing. It won't fix Procedural Macros though.

Later versions of hyper (0.11, 0.12) have dropped a number of features,
and reqwest has picked up those features. This change migrates the HTTP
code from using an old version of hyper to using the latest version of
reqwest.

Other changes:
* Remove old and unused hyper_seekable
* Add UnexpectedHttpResponse
@pkgw
Copy link
Collaborator

pkgw commented Jun 15, 2019

@spl I've locally rebased this onto current master; mind if I force-push to your branch? I think it's time to go ahead and merge this.

@spl
Copy link
Contributor Author

spl commented Jun 16, 2019 via email

@pkgw
Copy link
Collaborator

pkgw commented Jun 16, 2019

Ah, it seems that the static target once again has a proc-macro issue, due to failure_derive, which is pulled in by cookie_store.

@pkgw
Copy link
Collaborator

pkgw commented Jun 16, 2019

Ohhhhhhh — I figured something out about the MUSL issue. It might have been obvious to other people, but it wasn't to me, so in case it comes in handy:

What I didn't get, until now, is that one of the key facets of the proc-macro problem is the distinction between the cross-compilation use case and the ... non-cross-compilation one. Proc-macro crates have to be dynamically linked into the compiler. When you're cross-compiling, that means that they're built for the build platform/toolchain not the host platform/toolchain. So if you're cross-compiling for MUSL from a regular Linux host, things work fine. It gets sticky when, as our build system currently tries, both "host" and "build" are static/MUSL.

I am pretty sure that if we could set things up to compile the static executable via cross-compilation, it would work. I don't know if we can set things up to be able to do that while also pulling in external static libraries such as Freetype.

(edit: note that here I am using GNU terminology, where "build" is the machine that you're building on and "host" is the machine your program will run on. In Rust-land, it seems like the common terminology is that "host" is the machine you're building on and "target" is the machine it will run on. [In GNU-land, "target" is the architecture that the program you're building will target, if it happens to be a compiler of some sort. If you cross-compile a cross-compiler, the three platforms might all be different.])

@pkgw
Copy link
Collaborator

pkgw commented Jun 17, 2019

12 hours later ... it works! I have a proof-of-concept that creates a static executable and includes the procedural macro features. So @spl go ahead and put headers feature back if you'd like.

Bad news #1: the executable that I'm creating right now weighs in at 90 MB.

Bad news #2: I couldn't figure out a way to get it working that didn't involve manually cross-compiling all of our system deps in the musl target, so that's fun.

It's not pretty, but this is a problem that a lot of people have been struggling with: see for instance emk/rust-musl-builder#65, rust-lang/rust#36710 . So I'm pretty happy that I figured something out here. Code forthcoming ...

pkgw added 2 commits June 17, 2019 17:51
After much wailing and gnashing of teeth, I think I have an approach that
fixes the static Linux build in the face of proc-macro usage. In particular,
proc-macros can work if we cross-compile. Here, we set up a Docker container
with a fakey cross-compilation setup that farms out the work to an Alpine
Linux chroot. That way we can take advantage of the statically linked,
MUSL-based libraries that Alpine provides.

We can now build static Tectonic without having to gate any features and using
Rust stable, so this should update the CI to not ignore the problem if the
static build fails.
@pkgw
Copy link
Collaborator

pkgw commented Jun 17, 2019

Even better version pushed! We're not manually compiling the deps anymore, and the final executable is down to 60 MB.

edit: log of the static build

This reverts commit b962502.

(This commit has been squashed into the one adding support for reqwest.)
@pkgw
Copy link
Collaborator

pkgw commented Jun 18, 2019

The Travis failure is some unrelated, probably transient, issue on macOS, and Circle has a QEMU timeout. The static build on Travis succeeded, proc macros and all (log). Going to go ahead and merge!

@spl
Copy link
Contributor Author

spl commented Jun 18, 2019

I haven't looked at everything you've done, @pkgw, but it certainly sounds good.

As an aside, you might want to update the Travis-CI macOS image to osx_image: xcode10.2 to get the latest harfbuzz.

@pkgw pkgw merged commit 7e19cb3 into tectonic-typesetting:master Jun 18, 2019
@spl spl deleted the hyper-reqwest branch June 18, 2019 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants