implement pdf output with pagedjs #215
Comments
first try, with zero customization, is pretty impressive. https://www.dropbox.com/scl/fi/csg704t7252c0l8jgfp07/10636.pdf?rlkey=wlhpem4xpehflhin0w7i96obo&dl=0 |
removing the |
does that fix the footnotes as well? |
Yes, removing the We probably want to add header and footer text as described here: https://pagedjs.org/documentation/7-generated-content-in-margin-boxes/ I will start with "Project Gutenberg, https://gutenberg.org/ebooks/#####" on the bottom and the book's make_pretty_title(size=80) on the top. I'm thinking that page numbers are going to be confusing, so better to leave them out? Also, the default config has a gutter; I think that should be omitted. Are people going to want different paper sizes? |
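A rough sketch of how the header and footer text proposed above could be emitted as CSS margin boxes. The function names `css_string` and `margin_box_css` are hypothetical, the title is passed in as a plain string rather than taken from `make_pretty_title`, and the choice of `@top-center` / `@bottom-center` is an assumption, not something settled in this thread.

```python
def css_string(text: str) -> str:
    """Quote text as a CSS string literal (single-line text assumed)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def margin_box_css(title: str, ebook_no: int) -> str:
    """Build an @page rule with the title on top and a PG line on the bottom.

    Hypothetical helper: the margin-box positions are an assumption.
    """
    footer = f"Project Gutenberg, https://gutenberg.org/ebooks/{ebook_no}"
    return (
        "@page {\n"
        f"  @top-center {{ content: {css_string(title)}; }}\n"
        f"  @bottom-center {{ content: {css_string(footer)}; }}\n"
        "}\n"
    )


# "Example Title" is a placeholder; 10636 is the test book from this thread.
header_footer_css = margin_box_css("Example Title", 10636)
print(header_footer_css)
```

The resulting rule would be added to the stylesheet that pagedjs processes; paper size and gutter settings would be separate declarations in the same `@page` rule.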
This example, chosen because it has music, is definitely less polished. We see that all the links to sound and pdf are removed (nice trick!) but also there are some missing images, including half the score on page 148. see https://gutenberg.org/cache1/epub/46419/46419-h.html#Page_132 |
55215.pdf |
There's something funny with measurements in pagedjs. I can't zero the left margin, and images get dropped even though they should fit. I think I've tricked it into working. Here's a sample output for the most recently released book: pg72955.pdf |
#72955 looks good. The only anomaly I noticed is at the very end: the Footnotes have some over-striking. I think adding generated page numbers in the footer would make sense. If nothing else, it will help people keep track of what page they were on. Of course, those books with paginated indices will be totally wrong - but they usually also have named anchor hyperlinks to the right place in the books. If possible, I think a nice header/footer would be something like this ... though I'm not sure whether the footer will look too busy? Page number could go top right, instead. ... we could also consider different header/footer combinations for alternate pages, like printed books often have. |
It took a while to figure out how to fix the overlapping text in 10636. Good to learn more about pagedjs. There was also a problem with margins, which turned out to be the same issue. pagedjs works by manipulating the page's DOM using flexbox. Each page works like a column in a very, very wide page. As a result, the body element, and styles attached to the body element, don't work properly. To fix this, we'll need to remove all CSS properties from the body element and re-attach them to the div element introduced by the pagedjs script's manipulation. So in 10636, for example, we need to replace:
with:
so ebookmaker will need to do a bit of work so that our files will render beautifully. |
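A minimal sketch of the kind of rewrite ebookmaker might do for this, assuming a regex pass over the stylesheet is good enough. The function `rescope_body_rules` and the selector `.page-wrapper` are hypothetical; the thread does not name the div that pagedjs introduces, so the target selector is left as a parameter.

```python
import re

# A selector that is exactly `body`, directly followed by its declaration block.
# The lookbehind skips names such as `tbody`, `.body` or `#body`.
BODY_SELECTOR = re.compile(r"(?<![\w.#-])body(?=\s*\{)")


def rescope_body_rules(css: str, target_selector: str) -> str:
    """Re-attach every plain `body { ... }` rule to `target_selector`.

    Limitation: grouped selectors such as `html, body { ... }` are not handled.
    """
    return BODY_SELECTOR.sub(lambda _match: target_selector, css)


css = "body { margin: 0 10%; }\ntbody { color: red; }\np { text-indent: 1em; }"
# ".page-wrapper" is a placeholder, not the real pagedjs class name.
rescoped = rescope_body_rules(css, ".page-wrapper")
print(rescoped)
```

A real implementation would more likely use a proper CSS parser, since regular expressions will miss grouped or nested selectors.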
In another text, I discovered that pagedjs has problems with |
Performance will be an issue as well; most likely we'll want to run pdf re-rendering separately from our other ebook production. |
This is all sounding great to me. Thanks for perseverance on all the
nuances.
I'm not worried about keeping our computers busy with rendering, and agree
that we might want to separate PDF processing from the other jobs that make
generated content.
|
Running head chapter titles are doable, but probably not for most of the backfile. The suggested way is to use heading elements, for example, h2. Unfortunately the backfile is inconsistent with the use of headings, for example by using multiple h2 elements to make line breaks. So we would get errata reports from this. Prospectively, we can certainly do this, by asking submitters for specific markup for chapter titles. So in my tests, I've used the book title in the running head, using the version of the title that omits subtitle. For the Footer, I've been trying "Project Gutenberg, ", but the method for getting the URL for the book is currently not working. Page numbers are tricky, and need discussion. Many books include original page numbers with reasonably uniform markup, and these could be printed in the side margin, for example. If we print PDF page numbers there will be producers who want the front and back matter numbered separately. Crazy idea: maybe print percentages? |
Or even crazier, a percentage bar? (not hard in css) |
For headers/footers, perhaps we could have an optimal approach based on
best practices, and then a couple of lesser approaches when the HTML markup
isn't regular enough.
The idea of doing a running chapter header when <h2> is present feels great
for an optimal approach.
Putting the embedded print page numbers in the right margin is definitely
desirable. The Kobo e-reader does that already, actually (though the footer
page numbers are not accurate due to some dynamic between EPUB and kEPUB
formats or something else...).
I realize it adds complexity to have a couple of fallback methods for
headers & footers. It seems the complexity might be worth it, though, since
we'll end up with many books that have a fantastic look.
|
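For reference, a sketch of the running-chapter-head approach discussed above, using the `string-set` / `string()` mechanism from the pagedjs margin-box documentation. The function name `running_head_css`, the string name `chapter`, and the placement of the page counter in `@bottom-right` are assumptions made for illustration.

```python
def running_head_css(heading: str = "h2") -> str:
    """Build CSS that copies each heading's text into the running head.

    `heading` is the single element chosen as the source of the running
    head; only one element is used per book in this sketch.
    """
    return (
        f"{heading} {{ string-set: chapter content(text); }}\n"
        "@page {\n"
        "  @top-center { content: string(chapter); }\n"
        "  @bottom-right { content: counter(page); }\n"
        "}\n"
    )


running_css = running_head_css("h2")
print(running_css)
```

Because the running head simply mirrors the most recent heading, an empty or decorative heading in the backfile would blank or garble the head until the next one appears, which is the risk raised for irregular markup.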
OK here's a sample with page numbers and running heads and foots (first 300 pages): 10636.pdf <https://github.com/gutenbergtools/ebookmaker/files/14378012/10636.pdf> For this book, h3 would have been better for heads, but we can only pick one thing. An empty h2 sets the head empty for 200 pages or so. There's some text overlap on p54-57, but overall I think this is spectacular! |
Looks great!
I had suggested earlier, for EPUB, that the first page should be the cover
image.
Then, boilerplate can be the 2nd page, i.e., a verso page.
|
Also, I don't think you need PG in both the header and footer. Just footer
is enough
|
I agree, but the PG comes from the first H2 and disappears after the front matter. The running text selectors are currently rather limited. |
If it were easy, it would have already been done. |
This is not nearly as good as I thought; there's a vertical margin problem that is dropping the bottom two lines across many page breaks. |
Turns out I found a bug with use of blockquote. I think the effort of chasing down the many problems exposed by PG's use will result in big benefits for book production in general. |
I like the direction this is heading.
|
That's not crazy at all - for example Kindles do exactly that. I couldn't remember the exact behavior so I grabbed my Kindle Paperwhite just now, where I'm currently smooth-reading. In the footer it displays a page number in the lower left, and a percentage in the lower right. The percentage is surely auto-calculated by the device. The page number doesn't change on every page-turn, and sometimes a page number is skipped over. I assume page numbers come from the epub3 file. Perhaps the page number that's in effect at the beginning of the current viewport. Here's an example, FWIW. I would guess Kindle isn't unique in this behavior, but I don't have other devices to check. |
https://pagedjs.org/