Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for the correct SVGDOM implementation version in Transcoder #68

Closed
carlosame opened this issue Dec 9, 2022 · 7 comments · Fixed by #70
Closed

Check for the correct SVGDOM implementation version in Transcoder #68

carlosame opened this issue Dec 9, 2022 · 7 comments · Fixed by #70
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@carlosame
Copy link
Member

Currently, in TranscoderInput the possibility exists to pass a SVG 1.2 input document contained in a SVGDocument coming from SVGDOMImplementation (1.1) instead of the correct SVG12DOMImplementation (1.2). This is presumably the source of issues like #3.

I'm currently working with the transcoder stuff and perhaps it would be convenient to check for the actual SVG version of the SVGDocument and use the 1.2 factory where convenient (unless KEY_DOM_IMPLEMENTATION was explicitly set to a 1.1 SVGDOMImplementation).

This implies removing the current default value for KEY_DOM_IMPLEMENTATION which would now be null, otherwise I cannot know whether a 1.1 implementation was specified purposely.

I've already implemented it and everything seems fine. Any thoughts @DaveJarvis? Anyone would have objections to this change being committed?

@carlosame carlosame added the enhancement New feature or request label Dec 9, 2022
@carlosame carlosame added this to the 0.2.2 milestone Dec 9, 2022
@carlosame carlosame self-assigned this Dec 9, 2022
@ghost
Copy link

ghost commented Dec 9, 2022

https://github.com/DaveJarvis/keenwrite/blob/main/src/main/java/com/keenwrite/preview/SvgRasterizer.java

The KEY_DOM_IMPLEMENTATION isn't directly used in my project. A few related ideas:

  • Being able to remove BufferedImageTranscoder would be great. This implies that ImageTranscoder would no longer write to standard error when an exception (unhandled or otherwise) is encountered. That is, deprecate and/or remove ImageTranscoder::displayError.
  • Most SVG documents that don't have a version specified should presumably be parsed as version 1.1. If an unknown tag is encountered, perhaps the document can be re-parsed as 1.2. If an unknown tag is still encountered, a key to force ignoring unknown tags and continue parsing/rasterizing could be useful. For myself, I'd rather see a wonky/incomplete/wrong rasterized version of the SVG file returned than no rasterized content at all. Having a way to retrieve a list of errors encountered while rasterizing would be useful (rather than having to trap individual exceptions).

The commit isn't tagged in this issue. Mind adding adding a link to the commit hash?

carlosame added a commit that referenced this issue Dec 9, 2022
Also, try to use a SVG 1.2 implementation when a 1.2 document is found in a DOM
document coming from a version 1.1 implementation.

Fixes #69
Fixes #68
@carlosame
Copy link
Member Author

The KEY_DOM_IMPLEMENTATION isn't directly used in my project. A few related ideas:

Yes but KEY_DOM_IMPLEMENTATION, by default, has a 1.1 factory. My proposal is to remove that default.

  • Being able to remove BufferedImageTranscoder would be great. This implies that ImageTranscoder would no longer write to standard error when an exception (unhandled or otherwise) is encountered. That is, deprecate and/or remove ImageTranscoder::displayError.

Not sure how you would avoid having BufferedImageTranscoder. You mean deprecating UserAgent.displayError? That's how processing by the user agent is supposed to work, I don't see how this could ever be deprecated.

  • Most SVG documents that don't have a version specified should presumably be parsed as version 1.1. If an unknown tag is encountered, perhaps the document can be re-parsed as 1.2.

By unknown tag you mean 1.2? Version 1.2 tags inside 1.1 documents are already tolerated.

If an unknown tag is still encountered, a key to force ignoring unknown tags and continue parsing/rasterizing could be useful.

That should be happening if you do not force validation, but is outside of the scope of this bug.

Having a way to retrieve a list of errors encountered while rasterizing would be useful (rather than having to trap individual exceptions).

That's where you have to subclass UserAgent.

The commit isn't tagged in this issue. Mind adding adding a link to the commit hash?

See PR #70.

carlosame added a commit that referenced this issue Dec 9, 2022
Also, try to use a SVG 1.2 implementation when a 1.2 document is found in a DOM
document coming from a version 1.1 implementation.

Fixes #69
Fixes #68
@ghost
Copy link

ghost commented Dec 9, 2022

Right, BufferedImageTranscoder would need to stay. The default error handler is where the issue resides:

Libraries should not write to standard error or standard output. (Applications use standard output for their own purposes, such as creating CSV files or other structured text formats. A third-party library that dumps its output to stdout would invalidate the application's output. KeenWrite, for example, can write XHTML documents to stdout and so I had to override this behaviour to avoid creating malformed XHTML documents. The DefaultErrorHandler violates the Principle of Least Surprise by writing to stdout [and stderr].)

By unknown tag I meant an unknown XML tag that's neither SVG 1.1 nor SVG 1.2.

The PR looks okay. Could consider .htm filename extensions, as well. Also, I didn't see any case conversion, so hopefully the algorithm allows for .HTML, .Html, .Xhtml, and other variations.

@carlosame
Copy link
Member Author

Libraries should not write to standard error or standard output. (Applications use standard output for their own purposes, such as creating CSV files or other structured text formats. A third-party library that dumps its output to stdout would invalidate the application's output. KeenWrite, for example, can write XHTML documents to stdout and so I had to override this behaviour to avoid creating malformed XHTML documents. The DefaultErrorHandler violates the Principle of Least Surprise by writing to stdout [and stderr].)

The solution is to override that behaviour, as you did. It has always worked this way, and it is likely that quite a few people would become upset if it changed.

By unknown tag I meant an unknown XML tag that's neither SVG 1.1 nor SVG 1.2.

If you find problems with a non-validating parser/transcoder, please open an issue.

The PR looks okay. Could consider .htm filename extensions, as well. Also, I didn't see any case conversion, so hopefully the algorithm allows for .HTML, .Html, .Xhtml, and other variations.

The extensions are only for the testing infrastructure: The library uses KEY_DOCUMENT_ELEMENT (that you must set to html) and KEY_DOCUMENT_ELEMENT_NAMESPACE_URI (also set to the XHTML ns uri).

It should not even need that (I personally find somewhat silly that one must know the document element and the NS URI in advance, just to parse a document from a stream), but it's how it always has worked and we aren't going to change it now.

People hate changes to the old Batik API even when there are good reasons for it.

@ghost
Copy link

ghost commented Dec 9, 2022

The solution is to override that behaviour, as you did. It has always worked this way, and it is likely that quite a few people would become upset if it changed.

That's appeal to tradition, which isn't a great reason to keep poor decisions around. I don't know how many people are using EchoSVG at the moment, but I'd change it and let people override to add writing to standard output if they needed it. My guess is that they'd be in the vast minority, but without a survey, it's anyone's guess.

@carlosame
Copy link
Member Author

The solution is to override that behaviour, as you did. It has always worked this way, and it is likely that quite a few people would become upset if it changed.

That's appeal to tradition, which isn't a great reason to keep poor decisions around. I don't know how many people are using EchoSVG at the moment, but I'd change it and let people override to add writing to standard output if they needed it. My guess is that they'd be in the vast minority, but without a survey, it's anyone's guess.

There is no "appeal to tradition" when we are talking about a software that was written about 20 years ago. With legacy software one talks about backward compatibility.

@ghost
Copy link

ghost commented Dec 9, 2022

It's a toughie, that's for sure. For me, I'd prefer erring on the side of not introducing latent bugs in unsuspecting developers' code. :-) Like there's nothing that tells any developer that their stderr/stdout streams could be corrupted by this library. Maybe consider adding a warning somewhere in the README.md file, then?

(Again, I wouldn't be overly concerned with being 100% backwards compatible with Batik in this particular case, because they got this one wrong, unfortunately.)

carlosame added a commit that referenced this issue Dec 10, 2022
Also, try to use a SVG 1.2 implementation when a 1.2 document is found in a DOM
document coming from a version 1.1 implementation.

Fixes #69
Fixes #68
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant