monolith https://win98icons.alexmeub.com/ -o test.html #357
Comments
What an interesting find! So, it looks like this file https://win98icons.alexmeub.com/win-icons.min.css refers to this file https://win98icons.alexmeub.com/css_sprites.png more than 1000 times, which means Monolith will try to embed the same file over 1000 times, and that makes the process run out of RAM (or just upsets it so much that it gives up and quits). Even if it did manage to save it as one file, opening it in a browser later would likely crash the browser; it'd be something like a 100 MB file at least. I'd say, let's try that page again when I add MHTML support; then it'll save the file just once and reference it multiple times, exactly like on the web.
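To put a rough number on that estimate, here is a minimal back-of-the-envelope sketch in Rust. It is not monolith's code: `base64_len` and `inlined_size` are hypothetical helpers, and it assumes each reference is inlined as a standard padded base64 `data:` URL. The sprite's real size isn't stated in this thread, so any byte count fed in is an assumption too.

```rust
/// Length in bytes of the standard padded base64 encoding of `n` input bytes:
/// every group of up to 3 input bytes becomes 4 output characters.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Rough size in bytes of the data URLs alone when a single asset is inlined
/// `refs` times, once per reference. Hypothetical helper for illustration only.
fn inlined_size(asset_bytes: usize, media_type: &str, refs: usize) -> usize {
    // Each copy carries its own "data:<media type>;base64," prefix.
    let prefix = "data:".len() + media_type.len() + ";base64,".len();
    refs * (prefix + base64_len(asset_bytes))
}
```

For example, with an assumed 75 KB sprite, `inlined_size(75_000, "image/png", 1000)` comes to about 100 MB of data URLs before any of the page's own markup is counted.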
Found a similar issue! (many refs to an image by CSS) Very cool project!
Yes, very nice project indeed! Now that you mention MHTML, I can't shake the conspiracy theory in my head that the name Monolith is somehow inspired by it :-) Oh and I see you just added
that makes dependency number 3 😄
It kind of just happened, with MHTML sounding similar. The name for the project (Monolith) was fitting, since it describes the result (one file, with everything in it), and also happens to contain all the letters of HTML in it. But with MHTML, it even ends up fitting "Monolithic HTML" well, pure luck.
I can't find an easy way to use some sort of on-disk buffering instead of RAM in Rust; besides, that would probably slow the program down. I'm considering simply optimizing caching and cleaning up unneeded data more promptly. For example, instead of holding every retrieved asset in RAM in that global cache object, it'll keep it on disk via
What's challenging here seems to be deduplicating assets that are essentially identical in the final HTML file. Since all assets (mainly images) are going to be base64-encoded in the same file, there is no way to reference the same asset multiple times. Only if HTML allows Other projects suffer from this problem as well. SingleFile cannot save this page either. Holding all retrieved assets in RAM is not a problem as long as they are de-duped by their original URI (e.g.
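The point about de-duping by original URI can be sketched as follows. This is only an illustration under stated assumptions, not monolith's actual cache: `AssetCache`, its fields, and `get_or_fetch` are hypothetical names, and the `fetch` closure stands in for whatever performs the real download.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory cache keyed by an asset's original URI.
struct AssetCache {
    entries: HashMap<String, Vec<u8>>,
    /// Number of times `fetch` was actually invoked.
    fetches: usize,
}

impl AssetCache {
    fn new() -> Self {
        AssetCache { entries: HashMap::new(), fetches: 0 }
    }

    /// Returns the bytes for `uri`, calling `fetch` only the first time
    /// that URI is requested; later requests reuse the stored copy.
    fn get_or_fetch<F>(&mut self, uri: &str, fetch: F) -> &[u8]
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        if !self.entries.contains_key(uri) {
            self.fetches += 1;
            let data = fetch(uri);
            self.entries.insert(uri.to_string(), data);
        }
        &self.entries[uri]
    }
}
```

With a cache like this, a thousand references to https://win98icons.alexmeub.com/css_sprites.png would cost one download and one copy in RAM. It does nothing for the size of the output, though: each reference in the saved HTML still gets its own base64 copy.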
For the record, SingleFile proposes to solve this kind of issue with the "self-extracting" file format. With this format, the saved page weighs around 1 MB; see the page in the attached zip file.
I think that'd be a bit too much for monolith: overengineering. The best way to solve a problem is to blame it on someone else and not solve it at all. Having a giant file serve as a collection of icons is just a bad engineering decision from the standpoint of letting the thing load, let alone archiving it. Working around badly made and unoptimized websites would take way too much time and effort; I'd rather fix some bugs and implement a couple more required features for monolith instead.
Running monolith in the following way:
RUST_BACKTRACE=full monolith https://win98icons.alexmeub.com/ -o test.html
gives me the following backtrace:
Let me know if you need more information!