encoding/xml: accepts various ill-formed XML #68293
Comments
Unfortunately, our experience with encoding/xml is that making it stricter inevitably breaks existing working code.
We could introduce a new API. Or in some cases it might be more appropriate to introduce a
Maybe it would be better to choose a third-party library to handle this? Do we have such an option now?
I’m not aware of an open-source pure-Go third-party XML tokenizer.
It seems the encoding/xml package is broken in various ways. The solution would be to make a new encoding/xml/v2 package, so as not to break backward compatibility. In doing so, the API could also be improved. However, this is a large effort, and I suspect someone other than the Go dev team will have to do it.
That it is. There are several problems with it:
@DemiMarie what I mean is that this would be a great occasion to "scratch your own itch" and write such a Go package for the benefit of the whole community. The Go developers likely have their hands full with other issues.
@bjorndm I don’t have anywhere near enough time for that, sorry. If they are interested, I think @russellhaering might be a good choice, since they have had to deal with the limitations of
My current thoughts:
Sorry, but looking at how the xml package is currently used, your point 4 is not correct. For example, generating and parsing SVG or other XML documents also requires generating and parsing partial XML. For such use cases, parsing is necessarily less strict.
Do you have an example @bjorndm?
Well, excelize and svgo come to mind. https://github.com/qax-os/excelize
Can you provide specific files that need to be parsed?
Excelize parses and generates Microsoft Excel xlsx files.
Do either of those formats use DTDs? If not, they only need support for parsing and generating well-formed top-level XML documents.
These formats can use DTDs. However, in practice, these two libraries and many others use encoding/xml to generate and parse XML fragments, often using xml.Marshal and xml.Unmarshal. The generation of XML fragments is thus an important use case.
Are these fragments ever not well-formed themselves?
These fragments are likely to be well-formed, but they are not complete XML documents, as they do not have the headers.
Well-formedness is sufficient here.
Go version
go version go1.21.11 linux/amd64
Output of go env in your module/workspace:
What did you do?
https://go.dev/play/p/r8y4cgcybkS
What did you see happen?
encoding/xml accepts ill-formed XML.
What did you expect to see?
encoding/xml should reject all ill-formed XML. Except for the last tryUnmarshal() call I linked, the constraints that it fails to check can be checked for without resolving namespaces, and therefore can be checked for even by RawToken().
See:
":" followed by a character that cannot start a Name #68392
Edit: removed #53728 because it is about serialization, not parsing.
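As an illustration of the claim that such constraints can be checked at the RawToken() level without resolving namespaces, here is a rough sketch of a caller-side check. The checkSingleRoot helper is hypothetical and not part of encoding/xml. It covers only a few well-formedness constraints (exactly one root element, matching start and end tags, no character data outside the root) and is not a full conformance check.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// checkSingleRoot is a hypothetical helper, not part of encoding/xml.
// It walks the raw token stream, so namespace prefixes are never resolved.
func checkSingleRoot(doc string) error {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var open []xml.Name // raw names of the currently open elements
	seenRoot := false
	for {
		tok, err := dec.RawToken()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err // the decoder itself rejected the input
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if len(open) == 0 {
				if seenRoot {
					return errors.New("more than one root element")
				}
				seenRoot = true
			}
			open = append(open, t.Name)
		case xml.EndElement:
			// RawToken does not verify tag matching, so do it here.
			if len(open) == 0 || open[len(open)-1] != t.Name {
				return fmt.Errorf("unexpected end tag </%s>", t.Name.Local)
			}
			open = open[:len(open)-1]
		case xml.CharData:
			// Simplification: bytes.TrimSpace trims a broader set of
			// whitespace than the XML grammar allows outside the root.
			if len(open) == 0 && len(bytes.TrimSpace(t)) > 0 {
				return errors.New("character data outside the root element")
			}
		}
	}
	if len(open) != 0 {
		return errors.New("unclosed element at end of input")
	}
	if !seenRoot {
		return errors.New("no root element")
	}
	return nil
}

func main() {
	for _, doc := range []string{"<a><b/></a>", "<a/><b/>", "<a/>junk", "<a></b>"} {
		fmt.Printf("%q: %v\n", doc, checkSingleRoot(doc))
	}
}
```

Because the check lives outside the package, it adds strictness for callers who want it without changing the behavior existing code depends on.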