Should "external identifiers" be "external entities"? #1881

Closed
dlazin opened this issue Oct 28, 2021 · 3 comments · Fixed by #1900
Labels
EPUB33 Issues addressed in the EPUB 3.3 revision Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation Topic-XML The issue affects XML processing

Comments

@dlazin
Contributor

dlazin commented Oct 28, 2021

https://www.w3.org/TR/epub-rs-33/#confreq-rs-xml-extid says:

when processing XML documents, it MUST NOT resolve external identifiers [XML].

But the link, to https://www.w3.org/TR/2008/REC-xml-20081126/#NT-ExternalID, refers to "external entities." Are these the same thing? Were they renamed in the XML spec? "External identifiers" does appear a few times in the linked doc, but not many.

I'm guessing that the identifier is the thing that refers to the entity, and the entity is the target itself, so not resolving identifiers is perhaps correct, but best to check and perhaps (?) to rephrase.

@dauwhe
Contributor

dauwhe commented Oct 28, 2021

I'm wondering what the purpose of this restriction is.

It's an extreme example, but what if you had an EPUB with DocBook falling back to HTML? It would be really common to have:

<!DOCTYPE book [
<!ENTITY ch01 SYSTEM "ch01.sgm">
<!ENTITY ch02 SYSTEM "ch02.sgm">
<!ENTITY ch03 SYSTEM "ch03.sgm">
<!ENTITY ch04 SYSTEM "ch04.sgm">
]>

<book>
&ch01;
&ch02;
&ch03;
&ch04;
</book>

and in EPUB have <item href="book.xml" media-type="application/xml" fallback="book-html"/>

What harm are we preventing here? Do the commonly-used XML files in EPUB like container and package allow external entities anyway?

@mattgarrish
Member

External identifier is only defined once as the ExternalID notation that is linked to (the XML spec reuses its definitions). So even though it appears under external entities, all references to external identifiers in the spec link to that notational description.

Why we have it goes back to #1338 and #1368 when we were discussing allowing the SVG doctype by pushing the requirement to ignore external identifiers onto reading systems, since that's where the security issue is.

The same ExternalID reference is used in the definition for doctypes. It would also ban external identifiers in notations.

If we ban external identifiers completely, as we did before, we disallow the doctypes again.

@mattgarrish mattgarrish added the Topic-PublicationResources The issue affects support for publications resources label Oct 28, 2021
@mattgarrish
Member

What if we add the three affected syntaxes to the end:

when processing XML documents, it MUST NOT resolve external identifiers in DOCTYPE, ENTITY and NOTATION declarations [XML].

I don't see how else we could easily work around this quirk of the XML spec.
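
For illustration only, here is a minimal sketch of what "MUST NOT resolve" could look like on the reading system side, using Python's standard `xml.sax` module. It is not taken from the EPUB specification or from any particular reading system. The sample document, the `TextCollector` and `RecordingResolver` classes, and the `parse_without_external_entities` helper are all hypothetical names introduced for this example. The sketch turns off the SAX `feature_external_ges` feature and records whether the parser ever asks for an external entity to be resolved.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

# Hypothetical sample document: one external general entity declared in the
# internal subset and referenced from the body.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY ch01 SYSTEM "ch01.xml">
]>
<book>before &ch01; after</book>
"""


class TextCollector(ContentHandler):
    """Collects character data so we can see what the parser delivered."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


class RecordingResolver(EntityResolver):
    """Records every system identifier the parser asks us to resolve."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        return systemId


def parse_without_external_entities(data):
    """Parse `data` with external general entity resolution switched off.

    Returns the collected text and the list of system identifiers that the
    parser tried to resolve (expected to be empty).
    """
    parser = xml.sax.make_parser()
    # Assumption of this sketch: explicitly disabling the feature is the
    # reading system's way of honoring "MUST NOT resolve".
    parser.setFeature(feature_external_ges, False)

    collector = TextCollector()
    resolver = RecordingResolver()
    parser.setContentHandler(collector)
    parser.setEntityResolver(resolver)

    parser.parse(io.BytesIO(data))
    return "".join(collector.chunks), resolver.requested


if __name__ == "__main__":
    text, requested = parse_without_external_entities(DOC)
    print(repr(text), requested)
```

With the feature disabled, the declaration itself is still accepted, so a document carrying a DOCTYPE with an external identifier remains parseable. The reference `&ch01;` simply contributes no content, and no file or network access is attempted.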

@mattgarrish mattgarrish added the EPUB33 Issues addressed in the EPUB 3.3 revision label Nov 11, 2021
@mattgarrish mattgarrish added the Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation label Sep 14, 2022
@mattgarrish mattgarrish added Topic-XML The issue affects XML processing and removed Topic-PublicationResources The issue affects support for publications resources labels Oct 4, 2022