Validation of SVG #1323
Comments
Currently SVG2 is not a recommendation and there is no SVG2-specific version indication, but of course authors can use metadata, for example with Dublin Core, to indicate which version they have used. EPUB 3.2 does not indicate which version to use, so it is up to the authors what to use. For a check program, different approaches can be helpful for authors in case of a missing version indication: |
Just for the record: w3c/epubcheck#1114 is also relevant if it comes to a discussion in the (C|W)G on real-world SVG, although it touches upon a more general issue regarding the usage (or not) of external references |
Let me ask a naive question. Does epubcheck rely on SVG schemas prepared by us? Or, does it now rely on validator.nu? |
Mostly this. We're trying not to differ fundamentally from the results that web validation produces, but we do extend the schemas for epub-specific rules. |
I question whether we should be bothering to check for validity of SVG documents. Per Doktorchen's comment it is not trivial, and it is hard to determine if you have even done it correctly. What do we gain from it? It seems like it only tells us that there exists or existed at some point a schema that validated a document. Unlike HTML, SVG doesn't really add much semantic meaning for reading systems from the structure of the file, what is most important is the ability to render the document and the inclusion of any a11y features, neither of which are guaranteed by validity. Additionally, SVGs are rarely edited by hand, so content creators are at the mercy of graphics tools to generate documents that are valid to some schema someone somewhere might use to check the document. If we do plan to keep validity, we should make it very clear which schema(s) SVG must be valid to. |
How do we square this with the restrictions in the specification? Are we just loosening epubcheck and in theory having restrictions, keeping the epub-specific stuff and dropping validity checks against any specific version of svg, or removing all the content requirements (even that the document be well-formed xml)? |
Excellent question! And looking at the spec ... which restrictions? Looks like we lost validity for both XHTML and SVG somewhere between 3.0 and 3.2. Specifically, in 3.0 we said:
But in 3.2 we say:
Similar is true for XHTML. So is this simply an error in epubcheck? Should we not be checking for validity anymore? |
It just looks like that reference is dead. We didn't move to drop validity unless you remember a resolution to that effect? There just isn't a schema defined in SVG and we can't point at epubcheck. I expect it should (if we keep validation) be updated to cite the "Conforming XML-compatible SVG Markup Fragments" conformance class, which would let us drop the XML conformance bullet. But, regardless, we still have some content requirements that aren't related to general validity. The identifying of XHTML fragments, for example, the restriction on title, etc. Do we check these if we don't check against a default schema? We probably should make a normative statement for XHTML, too. It looks like we've always relied on this in the relationship section to blanket cover validity to the specification:
|
EPUB is intended to be in sync with the reality of the Web. Meanwhile, the longevity of EPUB publications is crucial. I think that relying on validator.nu (including the choice of schemas for SVG) is probably the best approach. Dropping validity checking from epubcheck may well endanger the longevity of EPUB publications. |
Hunting around a bit, we dropped the validity statement in 3.1, but I can't find any specific mention of it in the minutes or issues. It still looks like an omission not to have updated the reference, as the one in embedded SVG was updated to the fragment class in 3.2. For standalone, I put the wrong class above, though. It should be Conforming SVG Stand-Alone Files |
I am not sure how this works with the spec. Specifically, this is not a discussion about epubcheck, though I expect it will have implications there, it is an EPUB question. Given my reading of the spec, there is currently no validity requirement for SVG. We may want to put that requirement back, but I am not sure how to do that and maintain references to living documents. Do you have a proposal for how to draft such a requirement? I do think that if we can't add a requirement to the spec we shouldn't require validity in epubcheck, although we could allow for optional checks by passing a version to epubcheck (or some other mechanism). |
And, for what it is worth, I like Matt's proposal. |
I believe Matt's proposal would still keep w3c/epubcheck#1114 open. To quote that issue: SVG 1.1, per spec, relies on:
Because the The problem with this illustrates Brady's comment:
As a typical example, Adobe Illustrator systematically puts that DOCTYPE into generated SVG files (or it did until the latest release; I did not check the last one). This goes beyond SVG, mind you. MathML has the same problem afaik. I believe we may have to come back to the problem of how to handle XML entities. B.t.w., the possible solution that came up in w3c/epubcheck#1114 is to explicitly add exceptions to the entity rules for some standard DTDs (that XML parsers are not really required to fetch anyway). |
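To make the "exceptions for some standard DTDs" idea concrete, here is a minimal sketch in Python. It is not how epubcheck implements this; the function names and the one-entry allowlist are assumptions for illustration, and the regular expression is deliberately naive (double-quoted identifiers only, no handling of comments or internal subsets).

```python
import re

# Assumption: an illustrative allowlist. The DTDs that are actually
# acceptable are defined by the specification, not by this sketch.
ALLOWED_PUBLIC_IDS = {
    "-//W3C//DTD SVG 1.1//EN",
}

# Naive pattern: <!DOCTYPE name PUBLIC "public-id" "system-id"
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def doctype_public_id(xml_text):
    """Return the PUBLIC identifier of the DOCTYPE, or None if none is found."""
    match = DOCTYPE_RE.search(xml_text)
    return match.group(1) if match else None


def doctype_is_acceptable(xml_text):
    """True if there is no DOCTYPE at all, or its public id is allowlisted."""
    if "<!DOCTYPE" not in xml_text:
        return True
    return doctype_public_id(xml_text) in ALLOWED_PUBLIC_IDS
```

A file with the SVG 1.1 DOCTYPE would pass this check without the DTD ever being fetched, while a DOCTYPE naming any other DTD would still be reported.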
SVG 1.1.2 (second edition) says about the doctype: 'It is not recommended that a DOCTYPE declaration be included in SVG documents.' Therefore, if the version matters to the author, a version indication can be expected on the root (topmost) svg element, as a version attribute with values like '1.2', '1.1' or '1.0'. Version 2.0 is still in progress and has no version indication of its own, so there is no way to identify it properly for validation. Several modules of '2.0' are currently still only drafts. The current CR for 2.0 is only a subset, missing even some major parts present in the Tiny profiles, so it is not ready for practical use until all those additional modules become recommendations. Note that the DTDs do not check complex attribute values, which are essential for the correct presentation of SVGs. What is additionally available as EBNF in the 2.0 CR sometimes seems to contain bugs and is not reliable; 1.2 Tiny and 1.1.2 may contain more reliable information. Given that 2.0 as well as HTML5 are subject to change, it no longer seems helpful for authors to test only, or at all, against 'current' rules (whatever that might mean for a checking program, it will often differ from the author's point of view), especially because EPUB 3.0 and EPUB 3.2 have the same version indication but refer to different variants of SVG or HTML5. Moreover, it might ease the transition from EPUB2 to EPUB3.x for some people if it were possible to continue to use XHTML 1.1 (+RDFa) in EPUB3.x as well. Finally, as long as the XML structure is well-formed, every EPUB presentation program should at least be able to provide an accessible presentation of the content with a user-agent stylesheet, as suggested in HTML5 for all variants of (X)HTML. This would be a helpful requirement: to provide an exclusive switch between (alternative) author stylesheets and a simple user-agent stylesheet. |
Suppose that we have an invalid XML document as part of an EPUB publication. Some existing EPUB readers do not use browser engines. Will all existing EPUB RSs provide reasonably similar results? I have no idea. If epubcheck ensures validity, the longevity of EPUB publications is more reliable.
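As a rough illustration of the version-attribute approach described above, a checker could read the `version` attribute from the root `svg` element and treat its absence as "unknown". This is only a sketch using Python's standard library; the function name `declared_svg_version` is hypothetical, and how a checker should act on the result is left open.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def declared_svg_version(svg_text):
    """Return the root element's version attribute, or None if it is absent.

    Raises ET.ParseError if the text is not well-formed XML, and
    ValueError if the root element is not svg in the SVG namespace.
    """
    root = ET.fromstring(svg_text)
    if root.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("root element is not svg in the SVG namespace")
    return root.get("version")
```

A return value of `None` covers both SVG 2 content, which has no version indication of its own, and older content whose author simply omitted the attribute, so it cannot be used on its own to pick a schema.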
Because the longevity of EPUB publications is extremely important for publishers. |
Longevity would be very important for me as an author as well, and for libraries and archivists. If authors or publishers have to indicate the recommendations used, for example with a related Dublin Core term, there is surely only a low chance that a checking program will recognise it. Only archivists might do this (in theory). If W3C and other organisations continue to promote tag soup formats/versions, digital books/texts will simply be reduced to short-lived disposable products. |
The issue was discussed in a meeting on 2021-05-07
4. Validation of SVG
See github issue #1323.
Dave Cramer: this is the question of how to validate SVG, given that there are so many kinds of SVG
Ivan Herman: first, the DTD problem has been settled
Dave Cramer: perhaps we should postpone this issue to a later call, we are missing some members today
|
The issue was discussed in a meeting on 2021-05-13
6. SVG Validation
See github issue #1323.
Dave Cramer: one of the complicating factors is undated references to specs, and now we have SVG1 and SVG2
Brady Duga: would prefer if we didn't validate SVG
Matt Garrish: compounding the problem is that
Brady Duga: i understand need for validity of XHTML, as XHTML is often edited by hand
Dave Cramer: is there a requirement for well-formedness of XML?
Ben Schroeter: what about the XML entities in the DTDs of the SVG?
Dave Cramer: we're not touching anything about rendering or processing, just saying that spec does not require validity, therefore epubcheck does not need to check validity
Brady Duga: if we don't have requirement for validity, then Ivan's issue goes away
Matt Garrish: i we've always had requirement that things conform to XHTML syntax
Dave Cramer: maybe our current action item is to talk to Romain about how this would affect epubcheck |
I presume that referred to w3c/epubcheck#1114. But that issue is already gone in the current version of the spec, which explicitly lists SVG (and MathML) DTDs as acceptable. |
To help the discussion, here is the summary of what SVG says about conformance. The SVG2 spec defines SVG Conformance classes:
It is all a bit convoluted, because (I presume) the SVG2 spec is prepared for SVG content embedded in HTML, including the looser syntax of HTML vs. XHTML. I guess for our content documents the reference to (5) is the correct one, and that is (almost) what we have in the spec. Almost, because (I think @mattgarrish mentioned that somewhere) in §3.2.2 SVG Requirement the (correct) link refers to "SVG document Fragment"; it should rather say "Conforming SVG Stand-Alone Files". |
I believe that, ideally, we should (1) keep what we have and (2) let us rely on CC @rdeltour |
I think what we have now (unless @mattgarrish has changed it) is no validity requirement, but that was likely an accidental change.
That is precisely the issue. The validation is causing real harm today, forcing ignore list updates to process SVGs. There have been claims of potential harm caused by not requiring validity for SVG, but I have not seen any specific cases (real, identifiable cases of harm that were avoided by validation of SVG). The evidence currently is that validation is causing more harm than good, but I am happy to review evidence to counter that. |
Nope, we're still lacking a validity requirement for svg as far as I can tell. All I fixed was the reference to the standalone definition, but that corresponds to 5. in Ivan's list above. The SVG has to be well-formed XML and the IDs have to be unique (which is also all accessibility requires) but the markup doesn't have to be valid. We have to separately require conforming SVG DOM subtrees (1. in the list) as far as I can tell. None of the definitions appear to refer to it. (By comparison, for XHTML we require both that a document "MUST be an [HTML] document that conforms to the XHTML syntax" and that it "conform to the conformance criteria for all document constructs defined by [HTML] unless explicitly overridden in § 3.1.4 HTML Deviations and Constraints".) The problem with relying on what validator.nu implements is that its implementation is not complete, so what you're asking is that vendors tolerate invalid SVG content or that epubcheck has to fill in missing validity constraints as users stumble on them. Is either of these options really better than laxer validation that only requires well-formedness and the few other requirements in the definitions? There are also likely options for validating SVG in epubcheck without needing the specification to be strict about validity. I believe we're using nvdl to validate xhtml, so that should make it possible to validate embedded svg separately from the containing document, but this is where we need @rdeltour's input. If that is the case, though, perhaps SVG validity problems could be output as info messages rather than as warnings or errors? |
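For reference, the two baseline checks mentioned above (well-formed XML, unique IDs) are small enough to sketch with Python's standard library. This is an illustration under stated assumptions, not epubcheck's implementation: the function name `check_svg_minimal` is hypothetical, only the plain `id` attribute is considered (not `xml:id`), and the EPUB-specific requirements are not covered.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_svg_minimal(svg_text):
    """Return a list of problem strings; an empty list means both checks passed."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as err:
        # Not well-formed: nothing else can be checked reliably.
        return [f"not well-formed XML: {err}"]

    # Count every id value in the document, in document order.
    ids = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return [f"duplicate id: {value}" for value, count in ids.items() if count > 1]
```

Because the function returns plain strings rather than raising, a caller could decide separately whether each finding is reported as an error, a warning, or an info message.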
I was fighting with this yesterday, looking through the SVG spec and (re-reading it again) indeed it looks as if we would have to refer to the DOM conformance separately. Meaning I was wrong: we indeed do not have a validity requirement at this moment, beyond what, essentially, XML validity requires. (I must admit I was a bit surprised by the way the SVG spec defines all this, the HTML conformance seems to be way clearer.)
Taking into account also that SVG itself is still a bit of a moving target (in CR, without change, for 2 1/2 years...), maybe we can indeed say that we do not require SVG validation for now.
+1 to that (if it is possible). I still wonder whether
|
The issue was discussed in a meeting on 2021-05-21 List of resolutions:
1. Validation of SVG
See github issue #1323.
Dave Cramer: Validation of svg
Ivan Herman: Fundamentally agree
Dave Cramer: Weird case since epubcheck is currently out of sync
Matt Garrish: epubcheck follows the spec, but not entirely bound to it
|
EPUB 3.3 no longer requires that SVG content conform to the SVG content model requirements, only that it is well-formed, that IDs are unique, and that it respects some additional EPUB-specific requirements. This commit:
* introduces a new permissive RelaxNG schema for SVG, checking only the EPUB-specific requirements on the `title` and `foreignObject` content model (see w3c/epub-specs#1323)
* removes checks on the value of the `requiredExtensions` attribute of `foreignObject` (see w3c/epub-specs#1087)
* adapts the main XHTML to SVG schema driver to the new permissive SVG schema
* adds various tests for EPUB-specific requirements
See w3c/epubcheck#1172. The WG should probably talk about how to deal with a world with SVG 1.1 and SVG2