
rustdoc: Implement cross-crate searching #12946

Closed
wants to merge 1 commit

Conversation

alexcrichton (Member)

A major discoverability issue with rustdoc is that all crates have their
documentation built in isolation, so it's difficult when looking at the
documentation for libstd to learn that there's a libcollections crate with a
HashMap in it.

This commit moves rustdoc a little closer to improving the multiple crate
experience. This unifies all search indexes for all crates into one file so all
pages share the same search index. This allows searching to work across crates
in the same documentation directory (as the standard distribution is currently
built).

This strategy involves updating a shared file amongst many rustdoc processes, so
I implemented a simple file locking API for handling synchronization for updates
to the shared files.

cc #12554

huonw (Member) commented Mar 16, 2014

The crates column should have a consistent order (alphabetical?).

I really like this, but I'm slightly nervous about it: it makes the documentation index a huge(r) blob (especially with syntax and rustc, like the online docs have) that is loaded unconditionally: #12597 is a bigger priority with this change IMO.

alexcrichton (Member, Author)

The search index is loaded asynchronously, so the page is usable before it's loaded. Additionally, it's super-cacheable because all crates and pages share the exact same index. I agree that it's a little large. With rustc/syntax, it's 1.7MB, 264K compressed. This is what we're buying into, though, if we want to be able to search across crates.

For comparison, Google's homepage downloaded over a megabyte of JS.

sfackler (Member)

There should probably be some indication that you're going to a different crate. I'd be confused if I found Timespec in the libstd docs but couldn't use std::time::Timespec.

huonw (Member) commented Mar 16, 2014

> The search index is loaded asynchronously, so the page is usable before it's loaded

The search isn't usable immediately, and on even just mildly slow internet connections, the delay between the page loading and search working is annoying.

> For comparison, Google's homepage downloaded over a megabyte of JS.

1 MB after decompressing it, and I'm sure their mobile site is significantly smaller.

> Additionally, it's super-cacheable because all crates and pages share the exact same index

Do we actually set the appropriate caching headers so that it can be cached?

alexcrichton (Member, Author)

> There should probably be some indication that you're going to a different crate.

Each page has an accurate "breadcrumb trail" at the top; do you think it should be more obvious than that? Everything is prefixed with the crate it's from.

> even just mildly slow internet connections, the delay between page loading and search working is annoying

Our options here are to abandon this completely or to remove search altogether; otherwise, I think this is basically a fact of how it works.

> Do we actually set the appropriate caching headers so that it can be cached?

It's all hosted on S3, which apparently does the right thing. I'm getting lots of 304s on the homepage.

huonw (Member) commented Mar 16, 2014

> Our options here are abandon this completely or remove search altogether, otherwise I think this is basically a fact of how it works?

With compression, the transfer time would be much smaller; or, if we had the option of a dynamic backend, the main website wouldn't need to transfer anything. (The former is the solution I'm suggesting we should try to adopt as soon as possible.)

> It's all hosted on S3, which apparently does the right thing. I'm getting lots of 304s on the homepage.

Ok, awesome.

The code under discussion in the inline review comments below:

    };
    if ret == -1 {
        unsafe { libc::close(fd); }
        fail!()
Inline comment (Member):
fail!("could not lock {}", p.display())? (And similarly below.)

Inline comment (Member):

Also, won't this cause rustdoc to just fail when another rustdoc process is running? Should it be sleeping and then retrying a few times? (My reading of the man-page is that fcntl doesn't block and just returns immediately?)

Inline reply (Member, Author):

Apparently F_SETLKW is special (this code is all from online examples, I'm not intimately familiar with file locking).

From the manpage:

    F_SETLKW
        This command is the same as F_SETLK except that if a shared
        or exclusive lock is blocked by other locks, the process
        waits until the request can be satisfied. If a signal that
        is to be caught is received while fcntl is waiting for a
        region, the fcntl will be interrupted if the signal handler
        has not specified the SA_RESTART (see sigaction(2)).

Which means that this should probably handle EINTR, but rustdoc usage does not currently generate any signals.

Inline reply (Member):

Ah, of course, I was reading the section for F_SETLK; my mistake.

huonw (Member) commented Mar 17, 2014

r=me after comments (especially the file locking stuff).

DaGenix commented Mar 17, 2014

FYI: it compresses from 838,669 bytes down to 135,868.

ehsanul (Contributor) commented Mar 18, 2014

It shouldn't be too bad getting main.js compressed. S3 doesn't automatically handle the Accept-Encoding header unfortunately, but it can be made to work in the following (hacky) manner: http://stackoverflow.com/a/5502390/127219

huonw (Member) commented Mar 18, 2014

@ehsanul see #12597 too.

@bors bors closed this in 87e72c3 Mar 19, 2014
@alexcrichton alexcrichton deleted the discover branch March 19, 2014 04:53