Saving Facebook webpages results in a broken output #396

Open
avarixa opened this issue Jul 15, 2024 · 7 comments


avarixa commented Jul 15, 2024

When I run monolith on a Facebook page that requires a login, the output is a broken, mostly unloaded version of the page showing the login popup.

Piping the page from a Chromium/Chrome instance into monolith gives the same result, with or without --incognito.

https://imgur.com/0NZwObS

Member

snshn commented Jul 15, 2024

With headless Chromium, what if you give it more time before it prints the output to STDOUT? I think the flag is --virtual-time-budget=10000.
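
For anyone following along, here is a minimal sketch of what that pipeline could look like when driven from Python. It assumes `chromium` and `monolith` are both on the PATH; the helper names (`build_pipeline`, `as_shell`, `run`), the default output filename, and the 10000 ms default are illustrative choices, not anything prescribed by monolith.

```python
import shlex
import subprocess


def build_pipeline(url, budget_ms=10000, chromium="chromium", output="page.html"):
    """Return (chromium_argv, monolith_argv) for a dump-then-bundle pipeline."""
    chromium_argv = [
        chromium,
        "--headless",
        "--incognito",
        # Give scripts on the page more (virtual) time before the DOM is dumped.
        f"--virtual-time-budget={budget_ms}",
        "--dump-dom",
        url,
    ]
    # "-" tells monolith to read the document from STDIN; -b supplies the
    # base URL so relative asset links can still be resolved.
    monolith_argv = ["monolith", "-", "-b", url, "-o", output]
    return chromium_argv, monolith_argv


def as_shell(url, **kwargs):
    """Render the same pipeline as a single shell command line."""
    chromium_argv, monolith_argv = build_pipeline(url, **kwargs)
    return shlex.join(chromium_argv) + " | " + shlex.join(monolith_argv)


def run(url, **kwargs):
    """Dump the DOM with headless Chromium, then feed it to monolith."""
    chromium_argv, monolith_argv = build_pipeline(url, **kwargs)
    dom = subprocess.run(chromium_argv, check=True, capture_output=True).stdout
    subprocess.run(monolith_argv, input=dom, check=True)
```

Calling `as_shell(url)` prints the equivalent one-liner, which makes it easy to try different `budget_ms` values by hand before scripting anything.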

Author

avarixa commented Jul 16, 2024

Thanks for the response!

Just tried it, same result. I even tried it on my main Chrome install instead of Chromium, to no avail.

Member

snshn commented Jul 16, 2024

That's odd. What if you "Save page as" via the browser: does the result open correctly from file:///?

You can use monolith on local files, by the way: just point it at the file instead of an https:// URL and it'll make one .html bundle out of it.
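
A rough sketch of the local-file variant, assuming `monolith` is on the PATH. The helper name `bundle_local_command` and the file names are hypothetical; the function only builds the command and the file:/// URL for checking the saved copy in a browser, and does not run anything itself.

```python
from pathlib import Path


def bundle_local_command(saved_page, output="bundle.html"):
    """Return (monolith_argv, file_url) for a page already saved to disk."""
    path = Path(saved_page).resolve()
    # monolith takes the local path in place of an https:// URL.
    argv = ["monolith", str(path), "-o", output]
    # file:/// URL for opening the saved copy directly in a browser.
    return argv, path.as_uri()
```

Passing the returned `argv` to `subprocess.run(argv, check=True)` would perform the actual bundling.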

Author

avarixa commented Jul 16, 2024

Attempt #1: Using Save Page As > Webpage, Complete, the resulting file saved on disk had no CSS/JS (for some reason, this has always been a problem with Facebook). When I ran monolith on it, I got an Out of Memory error in Chrome upon opening the output.

Attempt #2: Using Save Page As > Single File (.mhtml), the resulting file saved on disk had formatting but was missing some media. This is the closest I got to what I wanted, but I wanted to try to capture as much media as I can with monolith. When I ran monolith on the .mhtml file, the result was a weird HTML file containing only text: the original link, my date and time, and some hash.
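
A possible explanation for the attempt #2 output, offered as a guess rather than a diagnosis: an .mhtml file is a MIME multipart container, not plain HTML, and its header block carries the snapshot URL, a date, and a boundary string, which would look much like "link, date, and some hash" if read as HTML. The sketch below uses Python's standard `email` package to pull the HTML part out of such a file; the function name `extract_html_from_mhtml` is made up for illustration, and whether running monolith on the extracted HTML recovers the missing media is untested.

```python
from email import message_from_bytes, policy


def extract_html_from_mhtml(raw: bytes) -> str:
    """Return the first text/html part of an MHTML (MIME multipart) document."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding (e.g. quoted-printable)
            # and decodes the bytes using the part's declared charset.
            return part.get_content()
    raise ValueError("no text/html part found")
```

Reading the file with `Path("page.mhtml").read_bytes()` and writing the returned string to an .html file gives monolith something it can treat as an ordinary local page.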

Member

snshn commented Jul 16, 2024

Uh-oh. Thank you for trying it. I'll look into the out-of-memory issue and check how I can improve monolith for Facebook pages. I know that website isn't exactly made for archiving; even saving images from FB is a big deal, just like with Instagram. So it may be partially intentional, to prevent people from saving pages and make them visit the actual site.

Author

avarixa commented Jul 16, 2024

Thanks, I'll experiment with the flags and anything else I can think of to see if there's a workaround.

Member

snshn commented Jul 16, 2024

There's always the SingleFile browser extension, which will probably work quite well.
