implement pdf output with pagedjs #215

Open
eshellman opened this issue Feb 6, 2024 · 24 comments

Comments

@eshellman
Collaborator

https://pagedjs.org/

@eshellman
Collaborator Author

First try, with zero customization, is pretty impressive: https://www.dropbox.com/scl/fi/csg704t7252c0l8jgfp07/10636.pdf?rlkey=wlhpem4xpehflhin0w7i96obo&dl=0
There is an issue with the backlinks on the citations, probably caused by absolute positioning. The file was generated from https://www.gutenberg.org/cache/epub/10636/pg10636-images.html

@eshellman
Collaborator Author

removing the position: absolute rules fixes the only problem I see here.

@asylumcs
Contributor

asylumcs commented Feb 6, 2024

does that fix the footnotes as well?

@eshellman
Collaborator Author

Yes, removing the position:absolute from the text's css fixes the example above.

We probably want to add header and footer text as described here: https://pagedjs.org/documentation/7-generated-content-in-margin-boxes/ I will start with "Project Gutenberg, https://gutenberg.org/ebooks/#####" on the bottom and the book's make_pretty_title(size=80) on the top. I'm thinking that page numbers are going to be confusing, so better to leave them out? Also, the default config has a gutter; I think that should be omitted.

Are people going to want different paper sizes?
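
A minimal sketch of how ebookmaker might emit the margin-box CSS described above, with the paper size as a parameter. The function names `css_string` and `page_rule` and the `paper_size` default are hypothetical; only the `@page`, `size`, `@top-center` and `@bottom-center` syntax comes from CSS paged media, which pagedjs implements. The title passed in stands in for the output of make_pretty_title(size=80).

```python
def css_string(text: str) -> str:
    """Quote text as a CSS string literal.

    Whitespace runs (including newlines) are collapsed to single spaces,
    then backslashes and double quotes are escaped.
    """
    collapsed = " ".join(text.split())
    escaped = collapsed.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def page_rule(title: str, ebook_no: int, paper_size: str = "letter") -> str:
    """Build an @page rule: book title on top, Project Gutenberg URL on the bottom."""
    footer = f"Project Gutenberg, https://gutenberg.org/ebooks/{ebook_no}"
    return (
        "@page {\n"
        f"  size: {paper_size};\n"
        f"  @top-center {{ content: {css_string(title)}; }}\n"
        f"  @bottom-center {{ content: {css_string(footer)}; }}\n"
        "}\n"
    )


# Hypothetical usage: an A4 variant for book 10636.
print(page_rule("An Example Title", 10636, paper_size="A4"))
```

Keeping the paper size as an argument means letter and A4 variants could be produced from the same source file if people turn out to want both.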

@eshellman
Collaborator Author

46419.pdf

This example, chosen because it has music, is definitely less polished. We see that all the links to sound and PDF files are removed (nice trick!), but there are also some missing images, including half the score on page 148. See https://gutenberg.org/cache1/epub/46419/46419-h.html#Page_132

@eshellman
Collaborator Author

55215.pdf
This one looks very good - I've added the header and footer text, and made sure we have page breaks after/before our boilerplate header/footer.

@eshellman
Collaborator Author

There's something funny with measurements in pagedjs. I can't zero the left margin, and images get dropped even though they should fit. I think I've tricked it into working. Here's a sample output for the most recently released book: pg72955.pdf

@gbnewby
Collaborator

gbnewby commented Feb 22, 2024

#72955 looks good. The only anomaly I noticed is at the very end: the Footnotes have some over-striking.

I think adding generated page numbers in the footer would make sense. If nothing else, it will help people keep track of what page they were on. Of course, those books with paginated indices will be totally wrong - but they usually also have named anchor hyperlinks to the right place in the books.

If possible, I think a nice header/footer would be something like this
Header: Chapter title (centered, possibly truncated)
Footer: Project Gutenberg (left), book title (centered, possibly truncated), pagenumber (right)

... though I'm not sure whether the footer will look too busy? Page number could go top right, instead.

... we could also consider different header/footer combinations for alternate pages, like printed books often have.
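
One way to picture the three-part footer and the alternating-page variant is a small generator for `@page :left` and `@page :right` rules. This is only a sketch: the helper names and the 40-character truncation limit are assumptions, and the mirrored layout (page number on the outer edge) is one possible reading of the suggestion.

```python
def quote(text: str) -> str:
    """Return text as a double-quoted CSS string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def truncate(text: str, max_len: int) -> str:
    """Shorten text to at most max_len characters, ending with an ellipsis."""
    if len(text) <= max_len:
        return text
    return text[: max_len - 1].rstrip() + "…"


def footer_boxes(left: str, center: str, right: str) -> str:
    """Emit the three bottom margin boxes with the given content values."""
    return (
        f"  @bottom-left {{ content: {left}; }}\n"
        f"  @bottom-center {{ content: {center}; }}\n"
        f"  @bottom-right {{ content: {right}; }}\n"
    )


def footer_css(book_title: str, max_len: int = 40) -> str:
    """Footer with Project Gutenberg, truncated title, and page number."""
    pg = quote("Project Gutenberg")
    title = quote(truncate(book_title, max_len))
    page_no = "counter(page)"
    # Right-hand pages: page number on the outer (right) edge.
    recto = "@page :right {\n" + footer_boxes(pg, title, page_no) + "}\n"
    # Left-hand pages: mirrored, so the number stays on the outer edge.
    verso = "@page :left {\n" + footer_boxes(page_no, title, pg) + "}\n"
    return recto + verso


print(footer_css("A Very Long Book Title That Will Not Fit in the Footer"))
```

If the footer looks too busy, moving `counter(page)` into a `@top-right` box instead is a one-line change in `footer_boxes`.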

@eshellman
Collaborator Author

eshellman commented Feb 22, 2024

It took a while to figure out how to fix the overlapping text in 10636. Good to learn more about pagedjs. There was also a problem with margins, which turned out to be the same issue.

pagedjs works by manipulating the page's DOM using flexbox. Each page works like a column in a very, very wide page. As a result, the body element, and styles attached to the body element, don't work properly. To fix this, we'll need to remove all CSS properties from the body element and re-attach them to the div element introduced by pagedjs's DOM manipulation. So in 10636, for example, we need to replace:

body {
    margin-left: 10%;
    margin-right: 10%
    }

with:

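/* on screen, keep the original body margins */
@media screen {
    body {
        margin-left: 10%;
        margin-right: 10%;
    }
}
/* in paged output, apply the same margins to the div that pagedjs adds */
.pagedjs_page_content > div {
    margin-left: 10%;
    margin-right: 10%;
}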

So ebookmaker will need to do a bit of work so that our files render beautifully.
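
The rewrite described above could be sketched roughly as follows. This is not ebookmaker code: `retarget_body_rules` is a hypothetical helper, and the regular expression only handles a plain top-level `body { ... }` rule (no `html, body`, no `body.someclass`, no comments before the selector). A real implementation would more likely use a CSS parser.

```python
import re

# Matches a rule whose whole selector is `body`, at the start of the
# stylesheet or right after a closing brace.
BODY_RULE = re.compile(r"(?:^|(?<=\}))(\s*)body\s*\{([^{}]*)\}")


def retarget_body_rules(css: str) -> str:
    """Keep body declarations for screen only and copy them to the pagedjs div."""

    def _rewrite(match: re.Match) -> str:
        leading = match.group(1)
        declarations = match.group(2).strip()
        return (
            f"{leading}@media screen {{\n"
            f"body {{ {declarations} }}\n"
            "}\n"
            f".pagedjs_page_content > div {{ {declarations} }}"
        )

    return BODY_RULE.sub(_rewrite, css)


source_css = "body {\n    margin-left: 10%;\n    margin-right: 10%\n    }\ntbody { color: red }"
print(retarget_body_rules(source_css))
```

Rules for other selectors that merely end in "body", such as `tbody`, are left untouched.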

@eshellman
Collaborator Author

In another text, I discovered that pagedjs has problems with overflow: auto. We'll need to change those to overflow: visible using media query rules. This is to be expected because PDFs don't have scroll bar boxes!
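
As a rough illustration, the override rules could be generated by scanning the book's stylesheet for `overflow: auto` and emitting a print-only block. The helper name `print_overflow_overrides` is hypothetical, the regex is deliberately simple (it will not cope with comments or `!important`), and it assumes that `@media print` is the right hook for the pagedjs rendering.

```python
import re

# A rule (selector + block) whose declarations set overflow, overflow-x
# or overflow-y to auto. Group 1 is the selector text.
OVERFLOW_AUTO = re.compile(
    r"([^{}]+)\{[^{}]*\boverflow(?:-[xy])?\s*:\s*auto\b[^{}]*\}"
)


def print_overflow_overrides(css: str) -> str:
    """Return an @media print block resetting overflow for affected selectors."""
    selectors = [m.group(1).strip() for m in OVERFLOW_AUTO.finditer(css)]
    if not selectors:
        return ""
    rules = "\n".join(f"  {sel} {{ overflow: visible; }}" for sel in selectors)
    return "@media print {\n" + rules + "\n}\n"


book_css = "pre { overflow: auto; }\n.poem { margin: 1em }\ndiv.table-wrap{overflow-x:auto}"
print(print_overflow_overrides(book_css))
```

Appending the generated block after the book's own styles leaves the scrolling behaviour on screen unchanged.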

@eshellman
Collaborator Author

Performance will be an issue as well; most likely we'll want to run pdf re-rendering separately from our other ebook production.

@gbnewby
Collaborator

gbnewby commented Feb 22, 2024 via email

@eshellman
Collaborator Author

eshellman commented Feb 22, 2024

Chapter titles in the running head are doable, but probably not for most of the backfile. The suggested way is to take them from heading elements, for example, h2. Unfortunately the backfile is inconsistent in its use of headings, for example using multiple h2 elements to make line breaks. So we would get errata reports from this.

Prospectively, we can certainly do this, by asking submitters for specific markup for chapter titles. So in my tests, I've used the book title in the running head, using the version of the title that omits subtitle.
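
For the prospective case, the CSS involved is small. The sketch below generates it with the `string-set` property and `string()` function from CSS generated content for paged media. The function name, the named string `chapter`, and the `h2.chapter-title` class are hypothetical; the class stands in for whatever specific markup submitters would be asked to use.

```python
def running_head_css(heading_selector: str = "h2", name: str = "chapter") -> str:
    """Copy the text of matching headings into the top-center margin box."""
    return (
        # Each matching heading updates the named string with its text.
        f"{heading_selector} {{ string-set: {name} content(text); }}\n"
        # The margin box prints the current value of the named string.
        f"@page {{ @top-center {{ content: string({name}); }} }}\n"
    )


# Backfile-style: any h2 drives the running head.
print(running_head_css())
# Prospective: only headings with an agreed (hypothetical) class.
print(running_head_css("h2.chapter-title"))
```

Restricting the selector to a dedicated class would avoid picking up h2 elements that are only used for line breaks.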

For the footer, I've been trying "Project Gutenberg, ", but the method for getting the URL for the book is currently not working.

Page numbers are tricky, and need discussion. Many books include original page numbers with reasonably uniform markup, and these could be printed in the side margin, for example. If we print pdf page numbers there will be producers who want the front and back matter numbered separately.

Crazy idea: maybe print percentages?

@eshellman
Collaborator Author

Or even crazier, a percentage bar? (not hard in css)

@gbnewby
Collaborator

gbnewby commented Feb 22, 2024 via email

@eshellman
Collaborator Author

OK here's a sample with page numbers and running heads and foots (first 300 pages)
10636.pdf

For this book, h3 would have been better for heads, but we can only pick one thing. An empty h2 sets the head empty for 200 pages or so.
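
Since an empty heading blanks the running head, a pre-flight check might flag such headings before rendering. Here is a minimal sketch using only Python's standard `html.parser`; the class and function names are hypothetical and this is not part of ebookmaker.

```python
from html.parser import HTMLParser


class HeadingAudit(HTMLParser):
    """Collect the whitespace-normalized text of every heading of one level."""

    def __init__(self, tag: str = "h2"):
        super().__init__()
        self.tag = tag
        self._depth = 0       # > 0 while inside the tracked heading
        self._buffer = []     # text fragments of the current heading
        self.headings = []    # one entry per heading, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._depth += 1
            if self._depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = " ".join("".join(self._buffer).split())
                self.headings.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def empty_headings(html: str, tag: str = "h2") -> list:
    """Return the indexes (in document order) of headings with no visible text."""
    audit = HeadingAudit(tag)
    audit.feed(html)
    audit.close()
    return [i for i, text in enumerate(audit.headings) if not text]


sample = "<h2>Chapter I</h2><p>x</p><h2></h2><h2> <br/> </h2><h2>Chapter II</h2>"
print(empty_headings(sample))
```

The same audit run with `tag="h3"` could help decide which heading level is the better source for a given book.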

There's some text overlap on pp. 54-57, but overall I think this is spectacular!

@gbnewby
Collaborator

gbnewby commented Feb 22, 2024 via email

@gbnewby
Collaborator

gbnewby commented Feb 22, 2024 via email

@eshellman
Collaborator Author

Also, I don't think you need PG in both the header and footer. Just footer is enough.

I agree, but the PG comes from the first h2 and disappears after the front matter. The running text selectors are currently rather limited.

@eshellman
Collaborator Author

Looks great! I had suggested earlier, for EPUB, that the first page should be the cover image. Then, boilerplate can be the 2nd page .. i.e., a verso page

If it were easy, it would have already been done.

@eshellman
Collaborator Author

This is not nearly as good as I thought; there's a vertical margin problem that is dropping the bottom two lines across many page breaks.

@eshellman
Collaborator Author

Turns out I found a bug with use of blockquote. I think the effort of chasing down the many problems exposed by PG's use will result in big benefits for book production in general.

@gbnewby
Collaborator

gbnewby commented Feb 23, 2024 via email

@tangledhelix

Crazy idea: maybe print percentages?

That's not crazy at all - for example Kindles do exactly that.

I couldn't remember the exact behavior so I grabbed my Kindle Paperwhite just now, where I'm currently smooth-reading. In the footer it displays a page number in the lower left, and a percentage in the lower right.

The percentage is surely auto-calculated by the device. The page number doesn't change on every page-turn, and sometimes a page number is skipped over. I assume page numbers come from the epub3 file. Perhaps the page number that's in effect at the beginning of the current viewport.

Here's an example, FWIW. I would guess Kindle isn't unique in this behavior, but I don't have other devices to check.

IMG_0794
