Downloadable docs #174
Comments
How about just sharing the database dump, like a backup.torrent or whatever? Crates.io surely has a backup, and this way some users will co-host it for free. |
That's not an acceptable solution. Note that @Kapeli isn't just asking for the database to be tossed over the wall; they specifically want to integrate docs.rs into an offline documentation viewer so that users can download the documentation for specific crates as needed. The docset format isn't exclusive to Dash, either: other projects such as Zeal also make use of it. |
One note: I'm not asking for docs.rs to generate Dash docsets. I'm asking for it to provide downloadable docs. |
I'd really love to see this done. I use Dash for everything else and it's really annoying dealing with third-party Rust crate docs since I can't view them anywhere outside a web browser. |
I am really looking forward to this integration. |
Can't wait for both integration and Rust syntax coloring support for Dash snippets :) |
Is there any progress or any new plan? |
It'd also be nice to get the nightly rustc docs without having to build the rustc. Recursive wget is a(n inefficient) workaround, I suppose. |
https://github.com/Robzz/cargo-docset was recently released. Maybe that's enough for some people here, or maybe some of the code can be reused and integrated into docs.rs somehow. |
Any update on this? |
It's not too practical to generate a downloadable archive of a crate's documentation, as each file is stored individually on S3. We'd need to fetch all the files individually and generate an archive of that on the fly, which is not practical for large crates. Preparing an archive at build time and storing it separately would increase our storage costs, and due to the unbounded nature of docs.rs we should try avoiding that. If y'all have better implementation ideas I'd love to read them. |
Could we prepare an archive at build time but only for crates that are opted in to this using some notion of "important to the community"? For example I'd love to see docs.rs provide a docset for tokei that Dash can keep automatically up-to-date. I don't know who'd provide that curation though. There could be some way to nominate crates and leave it up to the docs.rs maintainers to approve it, or maybe it could be based on traffic to a particular crate's documentation. |
Could the archive be generated only when requested and have a fixed-size cache where older archives get removed? Would the archives really be that big though? Docs are generally just text, which compresses very well. You could have a separate archive for the common resources (CSS, images, fonts and so on) and then the docs archives would just be compressed HTML files. |
That's not really feasible, as some crates (like
Resources are already deduplicated on S3, and all files will be compressed soon™. Once we do that, storing the prebuilt archives will double our storage requirements. Today we can afford that, but thinking long term we'll want to avoid using too much storage. |
Taking a long time is fine. For API access you can return a message saying the docs archive isn't ready yet and to try again later, for users trying to download the docs from their browser, show a page saying the same thing, maybe a bit nicer with automatic refresh and so on. With a big enough cache size, you could optimise both CPU and disk space needs. |
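To make the on-demand proposal above concrete, here is a minimal sketch of a fixed-size archive cache that evicts the least recently used archive and answers "not ready yet" while an archive is still being built. It is not how docs.rs works; the class name `ArchiveCache`, its methods, the byte budget, and the version numbers are all assumptions made up for illustration.

```python
from collections import OrderedDict


class ArchiveCache:
    """Fixed-size cache of built archives with least-recently-used eviction."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._ready = OrderedDict()  # key -> archive bytes, oldest first
        self._pending = set()        # keys whose archive is still being built

    def request(self, key):
        """Return ("ready", data) or ("pending", None) for a (crate, version) key."""
        if key in self._ready:
            self._ready.move_to_end(key)  # mark as most recently used
            return "ready", self._ready[key]
        self._pending.add(key)  # a hypothetical background worker would build it
        return "pending", None

    def store(self, key, data):
        """Called by the (hypothetical) worker once an archive has been built."""
        self._pending.discard(key)
        if len(data) > self.max_bytes:
            return  # larger than the whole cache, so it is never cached
        if key in self._ready:
            self.used_bytes -= len(self._ready.pop(key))
        self._ready[key] = data
        self.used_bytes += len(data)
        while self.used_bytes > self.max_bytes:
            _, evicted = self._ready.popitem(last=False)  # drop the oldest entry
            self.used_bytes -= len(evicted)


# Crate versions below are placeholders, not real releases.
cache = ArchiveCache(max_bytes=10)
first = cache.request(("tokei", "1.0.0"))     # not built yet
cache.store(("tokei", "1.0.0"), b"123456")
second = cache.request(("tokei", "1.0.0"))    # now served from the cache
cache.store(("serde", "1.0.0"), b"abcdefgh")  # 6 + 8 > 10, so tokei is evicted
third = cache.request(("tokei", "1.0.0"))     # has to be rebuilt
```

An API endpoint could translate the "pending" answer into a "try again later" response, and a browser page could show the same state with an automatic refresh. |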
@nhynes this is out of scope for docs.rs, we only build user documentation. I'm not sure what the right place to open a new issue is, maybe https://github.com/rust-lang/www.rust-lang.org/issues ? |
any updates? this is a trivial issue ongoing for 4 years now... |
Is the goal of downloadable docs to improve performance, or to ensure docs are available when offline? I'm assuming the latter. In that case, it's important for tools that want to download docs to be able to enumerate all the static files that might be needed by a bundle of docs. It's not trivial to enumerate these just by processing HTML, because some are loaded by JS (e.g. search-index). I think we probably need to start recording a mapping of rustdoc release -> list of static files, and provide that listing as part of the bundle for crate docs built with that release. |
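As a rough illustration of the mapping suggested above, the sketch below keeps a table from rustdoc release to static files and attaches the matching listing to a crate's bundle. The release identifiers, file names, and the function `build_bundle_manifest` are hypothetical; nothing here reflects an existing docs.rs format.

```python
def build_bundle_manifest(crate_files, rustdoc_release, static_files_by_release):
    """Describe every file an offline reader needs for one crate's docs."""
    if rustdoc_release not in static_files_by_release:
        raise KeyError(f"no static file listing recorded for {rustdoc_release!r}")
    return {
        "rustdoc_release": rustdoc_release,
        "crate_files": sorted(crate_files),
        "static_files": sorted(static_files_by_release[rustdoc_release]),
    }


# Placeholder data: release names and file lists are invented for the example.
static_files_by_release = {
    "release-a": ["main.js", "rustdoc.css", "search.js"],
    "release-b": ["main.js", "rustdoc.css", "search.js", "settings.js"],
}

manifest = build_bundle_manifest(
    ["tokei/index.html", "search-index.js"], "release-b", static_files_by_release
)
```

With a listing like this in the bundle, a tool would not need to discover static files by parsing HTML or by following what the JS loads at runtime. |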
Yes, the latter. More specifically, this issue is about offline doc readers that have to process the docs anyway to make them usable in their docsets.
Since processing and HTML rewriting are needed anyway right now, the idea was to download the missing assets when needed, where needed. The search-index is invocation-specific and will be in the archive, though I think the offline doc readers wouldn't use our internal search. But that's up to them. |
Ah, I misspoke about search-index. Good catch. But the problem exists for settings.js, settings.css, and search.js. They are loaded at runtime by other JS that uses rustdoc-vars to figure out their paths. Perhaps it's true that these pieces of functionality aren't needed by offline doc readers, but it seems like a potential source of fragility / worrying future bug. For other, more typical |
The idea is to start simple, following the comment from above by Kapeli:
Currently we're "just" exposing the internal archive for everyone who wants to work with the build output in a programmatic manner. |
@jsha your point around assets that are not referenced directly in HTML sounds of course like a valid one, but I don't know enough about how the docsets would be processed to know what is actually needed. IMO any more sophisticated approach would be based on this archive, so this is a valid first step. Further steps could be to provide archives for the toolchain specific static files in a sensible way. |
Thanks for working on this @syphar! I've had a look over the docs archive for The CSS and JS are missing, but I can make Dash fetch & save them. Please let me know when this gets deployed and archives are available for all crates. |
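For readers wondering what "fetch and save the missing CSS and JS" could look like on the consumer side, here is a small sketch that scans one HTML page for stylesheet, script, and image references and reports those not present in the archive. It is not Dash's implementation; the class and function names, the sample HTML, and the paths are assumptions. It only sees assets referenced directly in the HTML, so files loaded at runtime by JS would not be found this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collect stylesheet, script and image references from one HTML page."""

    def __init__(self):
        super().__init__()
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("href"):
            self.refs.add(attrs["href"])
        elif tag in ("script", "img") and attrs.get("src"):
            self.refs.add(attrs["src"])


def missing_assets(page_path, html, archive_names):
    """Return archive-relative asset paths the page uses but the archive lacks."""
    collector = AssetCollector()
    collector.feed(html)
    base = "http://archive.invalid/" + page_path  # dummy host, used only to resolve paths
    missing = set()
    for ref in collector.refs:
        resolved = urlparse(urljoin(base, ref))
        if resolved.netloc != "archive.invalid":
            continue  # points at another host; out of scope for this sketch
        path = resolved.path.lstrip("/")
        if path not in archive_names:
            missing.add(path)
    return sorted(missing)


# Sample page and archive listing are invented for the example.
page = (
    '<link rel="stylesheet" href="../static/main.css">'
    '<script src="../static/main.js"></script>'
    '<img src="logo.png">'
)
to_fetch = missing_assets("foo/index.html", page, {"foo/index.html", "foo/logo.png"})
```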
That's good to hear, thanks for checking this @Kapeli !
Will try to do so; this issue will definitely be closed then. To keep in mind: the archive only exists for releases built since #1342 (Sep 2021). Only looking at the latest version, you will have an archive for ~40k crates out of ~92k. We're planning a rebuild for older releases (see #464 ), but this needs some infrastructure work we're also working on. IMO most popular crates will be fine, and I hope this will be added to Dash anyways. |
Can you provide info on how to access archives for other crates? For example, I'm not able to get the archive for the sql crate at https://static.docs.rs/rustdoc/sql/0.4.3.zip |
This will be possible after #1865 is deployed, via the endpoint directly on docs.rs. It will give you a redirect to |
And please don't depend on |
While this is merged, closed & deployed, there are some open permissions that need to be set. I'll update you all here when that's fine |
This change is live & deployed right now. |
Is there any crawling policy for these downloads? For example would it be ok to download all docs with 1 request per second speed? |
@malaire we haven't yet made a policy, but I'm curious why you would want to do that? It's several terabytes of data, mostly for crates that have been used less than a dozen times or aren't the latest version. |
I didn't realize it's that much data. I don't want to rely on internet and want to keep important data locally, but as I already have all crates downloaded, in this case |
The main point of the downloadable docs archive is to give offline docs readers like Dash the data to process so they can generate the docsets, which can then be used by Dash or Zeal (on Linux). So if you wait a little longer, you'll have it. |
Thanks a lot for your work on this @syphar. I'll integrate it into Dash ASAP. |
@Kapeli do you have any estimate of when it will be available? Dash without docs for packages on crates.io is not very useful, and this is an amazing app <3 |
@syphar I'm almost done adding support for this in Dash. I've found a crate for which downloadable docs are not available: |
this is awesome!
I didn't check this specific release, but see my comment from above:
So yes, for now this is expected.
Yes, the simplest solution would be to show an error stating that the archive is not available, or something like that. Crate authors can always request a rebuild from us if they want. When I'm back home I can also check our database and see whether we can proactively trigger a rebuild for all latest releases with >1k downloads or so; this depends on the numbers. But we still need the error message for the user. I assume you mostly (only?) show latest versions? |
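A client-side sketch of that error path might look like the following. It assumes, without confirmation from this thread, that a missing archive is reported with HTTP 404; the URL template, the `fetch` callable, and the names `ArchiveUnavailable` and `load_docs_archive` are hypothetical and exist only for illustration.

```python
class ArchiveUnavailable(Exception):
    """Raised when no docs archive exists for the requested release."""


def load_docs_archive(crate, version, fetch, url_template):
    """Fetch an archive via fetch(url) -> (status, body), or raise a clear error."""
    url = url_template.format(crate=crate, version=version)
    status, body = fetch(url)
    if status == 200:
        return body
    if status == 404:  # assumption: a missing archive is signalled by 404
        raise ArchiveUnavailable(
            f"No docs archive is available for {crate} {version}. "
            "The release may need a rebuild before it can be downloaded."
        )
    raise RuntimeError(f"Unexpected HTTP status {status} for {url}")


# Stand-in for a real HTTP client so the example runs without network access.
def fake_fetch(url):
    if url.endswith("/new-crate/1.0.0/archive.zip"):
        return 200, b"zip bytes"
    return 404, b""


# Placeholder URL pattern, not a real docs.rs endpoint.
TEMPLATE = "https://example.invalid/{crate}/{version}/archive.zip"

data = load_docs_archive("new-crate", "1.0.0", fake_fetch, TEMPLATE)
try:
    load_docs_archive("old-crate", "0.1.0", fake_fetch, TEMPLATE)
    message = None
except ArchiveUnavailable as err:
    message = str(err)  # this text is what the user would be shown
```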
Has this been added to Dash yet? I don't see the Rust Docsets Third-party source in my app. |
To my knowledge there is nothing released. |
I'm sorry, I've been having some health issues, mostly due to burnout. I'm currently working on Dash 7, which will include Rust Docsets support. It's almost done, it just needs some polish, but I'm not sure when I'll actually release it. |
@Kapeli Thank you for the update. I hope you find the rest you need. |
I'd like to integrate docs.rs inside Dash.
To achieve this, I need a way to download the docs for a package as HTML files. Please consider supporting this.
edit(@jyn514): see #174 (comment) for mentoring instructions.