fix: preserve classes on <html> and <body> tags in EPUBs (#362)
* fix: preserve classes on <html> and <body> tags in EPUBs

* refactor: use existing DOMParser and use body attribute of HTMLDocument

* feat: add extra classes to identify body wrapper and html wrapper

* fix: try parsing EPUB content as XML if parsing it as HTML produces an empty body

* fix: use childNodes?.length rather than childElementCount to make sure text nodes are counted

* fix: more precise error message when failing to find body content
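
As a rough illustration of why the first fix was needed: the regex removed by this commit has capture groups only for the body `id` and the body contents, so any `class` on `<html>` or `<body>` is dropped before the chapter is rebuilt. The sketch below runs that same pattern against a made-up chapter string. `sampleChapter` and `extractWithRegex` are hypothetical names used only for this example and do not appear in the repository.

```typescript
// The pattern removed by this commit, rebuilt from a string so the sketch
// does not depend on the compiler's regex-literal target checks.
const bodyPattern = new RegExp(
  '.*<body(?:[^>]*id="(?<id>.+?)")*[^>]*>(?<body>(.|\\s)+)<\\/body>.*'
);

// Hypothetical chapter markup, not taken from a real EPUB.
const sampleChapter =
  '<html class="vrtl"><body class="main" id="ch1"><p>Hi</p></body></html>';

// Hypothetical helper: returns whatever the named groups captured.
function extractWithRegex(source: string): { [key: string]: string } {
  const match = bodyPattern.exec(source) as any;
  return (match && match.groups) || {};
}

const extracted = extractWithRegex(sampleChapter);

// Only `id` and `body` are available; neither class name survives.
console.log(Object.keys(extracted));
console.log(extracted.id, extracted.body);
```

Parsing the chapter into a document, as the diff does, makes `className` available on both elements without adding further capture groups.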
  • Loading branch information
duxovni authored Sep 10, 2024
1 parent 9c4585a commit d2485b3
Showing 1 changed file with 29 additions and 13 deletions.
42 changes: 29 additions & 13 deletions apps/web/src/lib/functions/file-loaders/epub/generate-epub-html.ts
@@ -139,32 +139,48 @@ export default function generateEpubHtml(
   htmlHref = itemIdToHtmlRef[itemIdRef];
 }

-const regexResult = /.*<body(?:[^>]*id="(?<id>.+?)")*[^>]*>(?<body>(.|\s)+)<\/body>.*/.exec(
-  data[htmlHref] as string
-)!;
+let parsedContent = parser.parseFromString(data[htmlHref] as string, 'text/html');
+let body = parsedContent.body;

-const bodyId = regexResult?.groups?.id || '';
-let innerHtml = regexResult?.groups?.body || '';
+if (!body?.childNodes?.length) {
+  parsedContent = parser.parseFromString(data[htmlHref] as string, 'text/xml');
+  body = parsedContent.querySelector('body'); // XMLDocument doesn't seem to have the body property
+
+  if (!body?.childNodes?.length) {
+    throw new Error('Unable to find valid body content while parsing EPUB');
+  }
+}
+
+const htmlClass = parsedContent.querySelector('html')?.className || '';
+const bodyId = body.id || '';
+const bodyClass = body.className || '';
+let innerHtml = body.innerHTML || '';

 blobLocations.forEach((blobLocation) => {
   innerHtml = innerHtml.replaceAll(
     relative(htmlHref, blobLocation),
     buildDummyBookImage(blobLocation)
   );
 });
-const childDiv = document.createElement('div');
-childDiv.innerHTML = innerHtml;
-childDiv.id = `${prependValue}${itemIdRef}`;

+const childBodyDiv = document.createElement('div');
+childBodyDiv.className = `ttu-book-body-wrapper ${bodyClass}`;
 if (bodyId) {
-  const anchorHelper = document.createElement('span');
-  anchorHelper.id = bodyId;
-  childDiv.prepend(anchorHelper);
+  childBodyDiv.id = bodyId;
 }
+childBodyDiv.innerHTML = innerHtml;
+
+const childHtmlDiv = document.createElement('div');
+childHtmlDiv.className = `ttu-book-html-wrapper ${htmlClass}`;
+childHtmlDiv.appendChild(childBodyDiv);
+
+const childWrapperDiv = document.createElement('div');
+childWrapperDiv.id = `${prependValue}${itemIdRef}`;
+childWrapperDiv.appendChild(childHtmlDiv);

-result.appendChild(childDiv);
+result.appendChild(childWrapperDiv);

-currentCharCount += countForElement(childDiv);
+currentCharCount += countForElement(childWrapperDiv);

 const mainChapterIndex = mainChapters.findIndex((chapter) =>
   chapter.reference.includes(htmlHref.split('/').pop() || '')
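
The HTML-then-XML fallback in this hunk can be sketched in isolation as below. This is a minimal sketch and not the project's code: `BodyLike`, `DocumentLike`, `ParserLike`, `hasContent` and `findBody` are hypothetical names, and the interfaces are structural stand-ins for the DOM types so that the logic can be exercised outside a browser. It assumes that a parser such as `DOMParser` would be passed in at the call site.

```typescript
// Structural stand-ins for the DOM types, so the sketch compiles without the DOM lib.
interface BodyLike {
  childNodes?: { length: number } | null;
}
interface DocumentLike<B extends BodyLike> {
  body?: B | null;
  querySelector(selector: string): B | null;
}
interface ParserLike<B extends BodyLike> {
  parseFromString(source: string, mimeType: string): DocumentLike<B>;
}

// Counts child nodes rather than child elements, so a body that holds only
// text still qualifies as having content.
function hasContent<B extends BodyLike>(body: B | null | undefined): body is B {
  return !!body && !!body.childNodes && body.childNodes.length > 0;
}

// Hypothetical helper: try HTML first, then XML, and fail loudly if both
// attempts produce an empty body.
function findBody<B extends BodyLike>(parser: ParserLike<B>, source: string): B {
  const htmlBody = parser.parseFromString(source, 'text/html').body;
  if (hasContent(htmlBody)) {
    return htmlBody;
  }
  // The XML result is queried with a selector instead of a `body` property,
  // mirroring the comment in the diff.
  const xmlBody = parser.parseFromString(source, 'text/xml').querySelector('body');
  if (hasContent(xmlBody)) {
    return xmlBody;
  }
  throw new Error('Unable to find valid body content while parsing EPUB');
}
```

Keeping the emptiness check in one predicate means both attempts apply the same rule, which is the point of the `childNodes?.length` change listed in the commit message.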