rustdoc: offer web server as a Bazel binary #472

Open
wchargin opened this issue Nov 4, 2020 · 8 comments

wchargin commented Nov 4, 2020

A rust_doc rule spits out a zip file, but that’s not convenient for
actually consulting the docs. It would be great for rust_doc to
provide a Bazel binary that can be run to spin up a web server.

Here is one approach. From the Bazel rule, provide this Python file:

import mimetypes
import os
import sys
from wsgiref import simple_server
import zipfile


def main():
    (webfiles_zip_name, default_crate, *args) = sys.argv[1:]
    port = 8000
    if args:
        port = int(args[0])

    webfiles = os.path.join(os.path.dirname(__file__), webfiles_zip_name)
    data = {}
    with open(webfiles, "rb") as fp:
        with zipfile.ZipFile(fp) as zp:
            for path in zp.namelist():
                data[path] = zp.read(path)
    sys.stderr.write("Read %d files from %s\n" % (len(data), webfiles_zip_name))

    default_path = "/%s/index.html" % default_crate

    def app(environ, start_response):
        p = environ.get("PATH_INFO", "/").lstrip("/")
        if not p:
            start_response("302 Found", [("Location", default_path)])
            yield b"302 Found\n"
            return
        if p.endswith("/"):
            p += "index.html"
        blob = data.get(p)
        if not blob:
            start_response("404 Not Found", [])
            yield b"404 Not Found\n"
            return
        (mime_type, encoding) = mimetypes.guess_type(p)
        if mime_type is None:
            # Fall back to a generic type for extensions mimetypes doesn't know.
            mime_type = "application/octet-stream"
        headers = []
        headers.append(("Content-Type", mime_type))
        if encoding is not None:
            headers.append(("Content-Encoding", encoding))
        start_response("200 OK", headers)
        yield blob

    server = simple_server.make_server("", port, app)
    # Find which port was actually bound, in case user requested port 0.
    real_port = server.socket.getsockname()[1]
    msg = "Serving %s docs on port %d\n" % (default_crate, real_port)
    sys.stderr.write(msg)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print()


if __name__ == "__main__":
    main()
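
For reference, the script takes the zip file name, the default crate, and an optional port as positional arguments; the zip is resolved relative to the script's own directory, so a direct (non-Bazel) invocation only works with the archive sitting next to the script. A hypothetical direct run, with illustrative file names:

python3 mylib_doc_server.py mylib_doc.zip mylib 8000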

Then generate a py_binary build target:

# Assuming `rust_doc(name = "mylib_doc", dep = ":mylib")`, generate:

py_binary(
    name = "mylib_doc_server",
    srcs = ["mylib_doc_server.py"],
    args = [
        "mylib_doc.zip",  # output of `rust_doc` rule
        "mylib",  # default crate
    ],
    data = [":mylib_doc.zip"],  # output of `rust_doc` rule, again
    python_version = "PY3",
    srcs_version = "PY3",
)

Then, users can bazel run :mylib_doc_server to start a server, and
optionally pass a port argument. Or, use ibazel instead of bazel,
and then whenever you change one of the Rust sources, the Rustdoc will
automatically recompile and the server will automatically restart.
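
For illustration (target labels assume the BUILD snippet above, placed in the workspace root), the invocations might look like:

bazel run :mylib_doc_server            # serve the docs on the default port 8000
bazel run :mylib_doc_server -- 8080    # pass an explicit port as a trailing argument
ibazel run :mylib_doc_server           # rebuild the docs and restart the server on source changes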

This depends on #471; without that fix, this will be harder.

Thoughts? Are you open to this change? Is it okay to use Python here?
Note that the above simple server only uses the Python standard library.
Happy to contribute a PR, and I license all this code as Apache 2.0.

@UebelAndre (Collaborator) commented:

Looks great to me! But I gotta wonder, could the same thing be implemented in Rust? 😉

wchargin commented Nov 4, 2020

Yep, that’s a reasonable question. I think the answer is, “yes, but not
with only the standard library”. We need to read a zip file and run a
web server, both of which want external crates. And once you need
external crates, it’s less clear to me whether that’s the right choice
for rules_rust itself.

For instance: how would rules_rust depend on its zipfile and web
server crates? The natural approach for end users is cargo-raze, but
that feels uncomfortably circular to me. The generated BUILD file
loads from @io_bazel_rules_rust, but from within rules_rust we’d
want it to load from just “//”.

The runtime performance of the docs web server is not critical, so
Python’s okay there. But many people might be frustrated by having to
compile the whole Rust web stack and link a Rust web server (not super
fast, in my experience) just to see their docs. We’d also have to go to
some lengths to ensure that updating the docs doesn’t re-link the
binary, since that would be quite annoying.

Since Bazel has native support for Python (native.py_binary), it seems
reasonable to me to use Python here. What do you think?

@UebelAndre (Collaborator) commented:

I think the compile time cost is a good call out. Do you know how cargo doc handles this currently? I thought it outputs static files for you to simply view in a browser. No server required. Perhaps we'd just need a running process to watch for changes on disk and output a file:// path to some html file?

Even though Bazel currently has native.py_binary, bazelbuild/bazel#9006 indicates that the rules library should be used instead. So if this were to be implemented using Python, it should be done in a future-facing manner by using the Python rules.

wchargin commented Nov 4, 2020

Do you know how cargo doc handles this currently? I thought it outputs static files for you to simply view in a browser. No server required.

Right, cargo doc just sticks files in target/doc/. With the standard
Cargo toolchain, I can just

(mkdir -p target/doc/ && cd target/doc/ && python3 -m http.server) \
    & cargo watch -x doc

or replace python3 -m http.server with your favorite static server.
But with rust_doc this isn’t an option because the contents are all
bundled up in a zip archive.
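
(For comparison, a manual workaround today would be something like the following; paths are illustrative, assuming the mylib_doc target from earlier:)

bazel build :mylib_doc
unzip -o bazel-bin/mylib_doc.zip -d /tmp/mylib_doc
# then point a browser at file:///tmp/mylib_doc/mylib/index.html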

I guess it’s technically true that you can visit a file:/// URL, but
I never do so. One reason is that I’m often SSHed into a workstation,
which I can connect to if it runs a web server but not via a file URL.

Perhaps we'd just need a running process to watch for changes on disk and output a file:// path to some html file?

Not sure what you mean; please clarify?

There’s no file:/// URL that we can emit, because the index.html is
inside a zip archive. I don’t think that we should have a running
process that re-extracts the archive to disk every time that it changes.

Even though Bazel currently has native.py_binary, bazelbuild/bazel#9006 indicates that the rules library should be used instead.

Eh. That’s one interpretation. Another is that that flag was supposed to
be flipped in Bazel 2.0, and still hasn’t flipped in Bazel 3.7.0, nearly
a year later. :-) But yes, we could use rules_python and it’d be fine.

wchargin commented Nov 4, 2020

could the same thing be implemented in Rust?

I’ve given this a shot:

Rust `zipserver`

Cargo.toml

[package]
name = "zipserver"
version = "0.1.0"
authors = ["William Chargin <[email protected]>"]
edition = "2018"

[dependencies]
anyhow = "1.0.34"
hyper = "0.13"
mime_guess = "2.0.3"
tokio = { version = "0.2", features = ["full"] }
zip = "0.5.8"

src/main.rs

use anyhow::Context;
use hyper::{http, Body, Request, Response, Server};
use std::collections::HashMap;
use std::convert::Infallible;
use std::fs::File;
use std::io::{BufReader, Read};
use std::net::SocketAddr;
use std::path::{Path, PathBuf};
use zip::ZipArchive;

struct State {
    data: HashMap<String, Vec<u8>>,
    root_redirect: String,
}

async fn hello(state: &'static State, req: Request<Body>) -> Result<Response<Body>, http::Error> {
    let path = req.uri().path().strip_prefix("/").unwrap_or_default();
    if path.is_empty() || path == "/" {
        return Ok(Response::builder()
            .status(302)
            .header("Location", &state.root_redirect)
            .body(Body::from(b"302 Found\n".as_ref()))
            .unwrap());
    }

    let data = match state.data.get(path) {
        Some(v) => v,
        None => {
            return Ok(Response::builder()
                .status(404)
                .body(Body::from(b"404 Not Found\n".as_ref()))
                .unwrap());
        }
    };
    let ext = path.rsplit('.').next().unwrap_or("");
    let mime = mime_guess::from_ext(ext)
        .first()
        .map(|x| x.to_string())
        .unwrap_or(String::new());
    let res = Response::builder()
        .header("Content-Type", mime)
        .body(Body::from(data.as_slice()))
        .unwrap();
    Ok(res)
}

fn read_zip(path: &Path) -> anyhow::Result<HashMap<String, Vec<u8>>> {
    let mut result = HashMap::new();
    let mut zip = ZipArchive::new(BufReader::new(File::open(path)?))?;
    for i in 0..zip.len() {
        let mut file = zip
            .by_index(i)
            .with_context(|| format!("zip file entry {}", i))?;
        let name = file.name().to_string();
        let mut buf = Vec::new();
        file.read_to_end(&mut buf)
            .with_context(|| format!("reading zip entry {:?}", name))?;
        result.insert(name, buf);
    }
    Ok(result)
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut args = std::env::args_os().skip(1).collect::<Vec<_>>();
    if args.len() != 2 && args.len() != 3 {
        eprintln!("usage: zipserver <zip-file> <root-redirect> [<port>]");
        std::process::exit(1);
    }
    let zip_path = PathBuf::from(std::mem::take(&mut args[0]));
    let root_redirect = args[1]
        .to_str()
        .expect("root redirect must be valid Unicode")
        .to_string();
    let port: u16 = args
        .get(2)
        .map(|x| x.to_str().expect("port must be valid Unicode"))
        .unwrap_or("8000")
        .parse()
        .expect("port must be a valid u16");

    let state = State {
        data: read_zip(&zip_path)?,
        root_redirect,
    };
    eprintln!(
        "Read {} files from {}",
        state.data.len(),
        zip_path.display()
    );
    let state = &*Box::leak(Box::new(state));

    let addr = SocketAddr::from(([0, 0, 0, 0], port));
    let make_service = hyper::service::make_service_fn(|_conn| async move {
        Ok::<_, Infallible>(hyper::service::service_fn(move |req| hello(state, req)))
    });
    let server = Server::bind(&addr).serve(make_service);
    println!("Serving on port {}", server.local_addr().port());
    server.await?;
    Ok(())
}

With the dependencies fetched but not compiled, this clean-builds in
23.2 seconds on my workstation, and incrementally builds in 3.4 seconds.
And my workstation is fairly powerful (Xeon W-2135, 6 physical cores).
This is notably slower than the Python compile time of 0 seconds. ;-)

wchargin commented Nov 4, 2020

(Oh, and that pulls in 68 transitive dependencies.)

@UebelAndre (Collaborator) commented:

Not sure what you mean; please clarify?

I was suggesting having something like ibazel watch for changes and automatically update the content you're looking at.

Eh. That’s one interpretation. Another is that that flag was supposed to
be flipped in Bazel 2.0, and still hasn’t flipped in Bazel 3.7.0, nearly
a year later. :-) But yes, we could use rules_python and it’d be fine.

I feel strongly that a rules repository should adhere to every incompatibility flag. If this is something that's not actually going to be flipped, then someone should ping that issue and ask for the flag to be removed.

With the dependencies fetched but not compiled, this clean-builds in
23.2 seconds on my workstation, and incrementally builds in 3.4 seconds.
And my workstation is fairly powerful (Xeon W-2135, 6 physical cores).
This is notably slower than the Python compile time of 0 seconds. ;-)

There's definitely no discussion to be had around the time difference between a compiled and a non-compiled language. Compiling a web server was always going to come at some noticeable cost. I think the clear distinction here is that you're looking for something that improves remote developer environments (SSH'ed into another machine) and not necessarily trying to address a gap in functionality between cargo doc and these rules.

I do feel you've shed light on the issue that viewing the output of rust_doc is cumbersome, and I'm in favor of any efforts to improve it. As much as I dislike Python, this approach looks good to me (so long as the script stays small and the heavy lifting is done by rustdoc and the relevant Bazel rules). 😄

wchargin commented Nov 4, 2020

I feel strongly that a rules repository should adhere to every incompatibility flag. If this is something that's not actually going to be flipped, then someone should ping that issue and ask for the flag to be removed.

Cool; that sounds pretty reasonable. Happy to comply.

Thanks for your quick responses and the helpful discussion! I’ll start
with the dependent issue #471 and then move on to this one.

wchargin added a commit to wchargin/rules_rust that referenced this issue Nov 4, 2020
A `rust_doc` rule emits its output into a zip archive so that the build
graph can be statically known, but this makes it hard to actually view
the documentation. This patch adds a `rust_doc_server` macro, a sibling
to `rust_doc` that spins up a tiny web server that reads from the zip
file. The server can run under `ibazel` and automatically restart when
the documentation sources are updated.

This is written in Python for a simple cross-platform solution that
doesn’t require compiling a Rust web stack, which takes about 20 seconds
on my workstation (Xeon W-2135).

Fixes bazelbuild#472.

Test Plan:
From `examples/`, run `bazel run //hello_world:hello_world_doc_server`
and note that the server responds properly. Run with `-- --port 0`
(extra `--` for `bazel run`) and note that the server properly prints
its actual port.

wchargin-branch: rustdoc-server
wchargin-source: 307dcd532d0f7880b1cceaf0421a51416607b44f