Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several improvements around handling CSS and others. #33

Open
wants to merge 50 commits into
base: master
Choose a base branch
from

Conversation

miguelitoelgrande
Copy link

This is my fork providing the following improvements (mainly improvements in extractHTML.js logic):

  • support for additional CSS attributes (display, colspan, border-collapse,...)
  • support for fancy article headers {background-image, position, z-index, background-*}
  • retain original image filenames where applicable (make editing of epub easier)
  • better naming for resulting CSS rules (make editing of epub easier)
  • "Dedup" of CSS rules - only store relevant changes compared to parent elements (smaller CSS files)
  • additional tags and attributes
  • some bug fixes (e.g. syntax highlighting in pre and code environments...)
    Related:
  • leverage ModHeader extension: ensure, images in the ePub are in accepted formats, not WebP images.
  • Pushed "page cleanups" via CSS and JavaScript to userscripts for TamperMonkey

Thank you for your great work. save-as-ebook is really a great extension.

need to accept "float" CSS attribute.
Bugfix in including style classes: per tag -  had a <span> inside <p> with same class name.
some fixes for smaller/more accurate formatting and CSS (experimented with iX Select and Wikipedia pages..
experimental stuff to handle CSS (reuse) better - more desciptive class names and easier to merge/modify multiple articles from same site later...
just my personal notebook to keep those for future use
...useful to retain table layouts as these are not covered with CSS.
... to retain table formatting.
..check if parent node has exact same values..
e.g. to use with Tampermonkey.
Pre-Processing for cleaner output with save-as-ebook
@alexadam
Copy link
Owner

Wow, you did a lot of changes. I need a few days to review it... I think I'll allocate my next week to work on this project because there are a lot of issues waiting to be solved. Thanks for your help, I might come back with questions about your changes.

@miguelitoelgrande
Copy link
Author

Hi Adam,
anytime. Your extension is a great help in reading tech stuff on the Kindle instead of printing etc.
My changes focus on producing more accurate output and helping to edit the resulting epub afterwards.
The most relevant changes are in the extractHtml.js.
The userscripts do a great job in preprocessing.

Looking forward to hear from you,
Michael

@alexadam
Copy link
Owner

Sorry for the late reply. I’ll address your changes in 2 parts: first, about the additional css/js files and, then about the extractHtml.js changes.

There is a problem with cleaning the page before generating the ebook. I don’t like the current feature of inserting custom CSS to remove unwanted elements… It’s not user friendly because of the UI/UX and because a lot of people don’t know CSS. I’m thinking about removing it and find a better solution.

At the time I was working on it there was a bug on FF - you couldn’t access the reader mode from a web extension https://bugzilla.mozilla.org/show_bug.cgi?id=1286387 It doesn’t look like they fixed it but maybe there is a workaround, I’ll investigate more

I’ve never used Tampermonkey but it seems you have to add a script for each page you want to clean, and… I don’t want to add thousands of scripts, for every possible site that can be saved as ebook, to the main repo. Those scripts should be stored locally, if you need them.

I think that a ‘save as ebook’ app should do (only) what it says: save as ebook. I would remove everything related to ‘cleaning’ a web page, because cleaning is a non mandatory, unrelated preliminary step. And mixing them causes a lot of problems because you cannot make everybody happy.

There should be something like what ublock is for ads - an universal ‘reader mode’ extension, with a database of scripts and styles for as many sites as possible. So you don’t have to maintain anything or write code or waste time trying to identify which elements should be removed. I’ll take a look to see what’s available…

I didn’t have time to look on extractHtml.js changes, I’ll do it later.

@miguelitoelgrande
Copy link
Author

miguelitoelgrande commented Apr 1, 2019 via email

@alexadam
Copy link
Owner

sorry for taking so long... you did a lot of changes and I don't have enough time :) I'm trying to think of a way for automatically testing the web ext. before a release, maybe with Puppeteer...

@alexadam
Copy link
Owner

ok, so I created a 'tests' folder with a small puppeteer app that starts a chrome instance + the extension. I want to add some test pages & epub references and find a way to compare them with what is being generated.
I don't know if this is the best way to do it, but is the quickest for now...
In the next days I'll add as many references as possible and pages that didn't work or have issues.

@poire-z
Copy link

poire-z commented Aug 21, 2019

Hello, and thanks for that extension!

Just letting you know of a few fixes and improvements I've made for my own reading of the generated EPUBs with KOReader on eInk devices (some context here).
I've taken many bits from @miguelitoelgrande work (and from #19), so it feels a bit awkward opening a PR with my changes :) Also, I'm using it with some older version of Firefox, and can't (don't really have time) to check how it would work with newer Firefox or Chrome.
But feel free to pick any of my fixes that make sense.

@miguelitoelgrande : regarding your commit:

handle styling of links and other exceptions better (a, strong, em - tags)

This is a bit wrong. These are not exceptions, and what you did to these should be done to all tags, for all CSS properties that are inherited (per specs) and only them. I followed up on your huge improvements commit in poire-z@8daadb5 if you're interested.

Note: the choice of which styles to include or not is quite use-case dependant :|
@miguelitoelgrande added background-image, but I don't want them (as well as letter-spacing and others). But I want "float", which I understand many other EPUB reading softwares will not want. So, the choice of what styles to save or not may require manual tweaking to the code (until there is some UI configuration for that :)

added "start" attribute for <ol> lists with explicit numbering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants