Roll back token datatype to avoid problematic character references #181
Comments
Thanks for adding this, @wendellpiez. Are you working on this moving forward, or should I learn the dark arts of these regex patterns? :-)
I would like to hear from @david-waltermire-nist what he thinks the right choice is here, on balance, from among the remedies proposed so far, or some other. Having discussed it, I would welcome help implementing a correction in the XSLT M4 pipeline. However, it is also unclear to me where this should be done. The bug we are addressing is not actually in the XSD but in a single consuming implementation (that we know of), and a workaround is feasible for those cases. @david-waltermire-nist would necessarily have to be involved in any integration, since this is the Metaschema support infrastructure. (And it will have a direct impact on tooling he is developing!)
@david-waltermire-nist with AJ's help we have managed to demonstrate:
(It turns out a test for this is easy: just try and make an XSD definition for an element named So I think the solution here (on the XML side) is to roll back the definition to Issue #182 might give us a reason to do otherwise, but it could also be orthogonal. |
Yeah, I hope that helped. I mothballed the project for now, but we can always revive it and add more tests. I integrated .NET code with GitHub Actions on a Windows runner so we can further test issues with this particular XML engine in .NET. Let me know if I can be of more help.
…moving references to upper-Unicode characters that break at least one processor - this is not actually relaxed, since the exclusion tests out as inoperative (the offending characters are not in the sets from which they are being excluded)
Now emitting functional XSDs with the new datatype mappings. Pruning the result XSD so that it does not contain unused simpleType definitions is a bit difficult when their definitions are chained, but we are managing some of it at least. JSON Schema production also looks okay with respect to datatypes in this branch.
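To illustrate why chained definitions make the pruning awkward, here is a minimal sketch in Python. This is not the pipeline's XSLT, only a stand-in for the idea. A simpleType can be dropped only if nothing reaches it, directly or through another simpleType's base, so the set of used types has to be computed transitively. The function name and the sample type names are hypothetical, and the sketch assumes references appear only in `type` and `base` attributes and that local names are unique.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def local(qname: str) -> str:
    """Drop a namespace prefix such as 'xs:' from a QName-like string."""
    return qname.split(":")[-1]


def prune_unused_simple_types(xsd_text: str) -> ET.Element:
    """Remove top-level simpleTypes that no declaration reaches (hypothetical helper)."""
    root = ET.fromstring(xsd_text)
    simple = {st.get("name"): st for st in root.findall(f"{{{XS}}}simpleType")}

    def refs(node):
        # Assumption: only @type and @base carry references
        # (union memberTypes and list itemType are ignored here).
        for el in node.iter():
            for attr in ("type", "base"):
                if el.get(attr):
                    yield local(el.get(attr))

    # Seed with names referenced from everything except top-level simpleTypes.
    pending = []
    for child in root:
        if child.tag != f"{{{XS}}}simpleType":
            pending.extend(refs(child))

    # Follow base chains so that a type used only by another used type survives.
    used = set()
    while pending:
        name = pending.pop()
        if name in simple and name not in used:
            used.add(name)
            pending.extend(refs(simple[name]))

    for name, st in simple.items():
        if name not in used:
            root.remove(st)
    return root
```

With a schema where an element uses a type that restricts a second type, both survive, and an unreferenced third type is removed.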
* Redefined lexical constraint on 'token' datatype - #181 - removing references to upper-Unicode characters that break at least one processor - this is not actually relaxed, since the exclusion tests out as inoperative (the offending characters are not in the sets from which they are being excluded)
* Rolling back #165 addressing usnistgov/OSCAL#956. We now have no upper-Unicode characters in the schema to be concerned about.
* Adding schema-generation unit tests for 'token' datatype
* Adjusted JSON Schema generation to capture latest datatype definitions #195
* Debugging path in datatype integration
* Adding XSD datatype production logic - reads and rewrites JSON definitions in XSD syntax.
* Adjustment to JSON -> XSD rough casting
* XSD datatypes adjusted to align with JSON #195
* Adding new datatypes as aliases for old names #1186
* Touchups to inline documentation (for propagation to tools)
* Replaced with updated mapping plus adjustments to XSD production
* Now doing a better job excluding unneeded datatype (simpleType) definitions
* Reconciled datatype merge to emit functional JSON schemas
* Removing safety backups; small correction to Metaschema Schematron to avoid a runtime error.
Describe the bug
As reported in the OSCAL repo (usnistgov/OSCAL#1127), a regular expression deployed in the schemas for the token datatype definition breaks in certain tools.
Testing suggests that it is the numeric character escaping (entity syntax) in the expression
[\i-[:𐀀-]][\c-[:𐀀-]]*
that the tool is not able to handle. But further testing shows that these character exclusions have no apparent effect on the regex, which suggests we should roll them back.
If the XML-regex syntax here for NCName
[\i-[:]][\c-[:]]*
is too XMLy, an alternative could be
[\w-[:\d]]\w*
In any case, the correction must be made in the Metaschema back end, to propagate into the generated schemas.
This is not a backward-compatibility breaking change.
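For anyone who wants to poke at the alternative pattern outside an XSD processor, here is a rough Python approximation. It is a sketch only: Python's `re` has no `\i`, `\c`, or character-class subtraction, so "`\w` minus digits" is written as `[^\W\d]`, and Python's `\w` is not the same set as XSD's `\w` (it admits `_`, for instance). The names `TOKEN_APPROX` and `looks_like_token` are made up for illustration.

```python
import re

# Approximation of [\w-[:\d]]\w* under Python's rules, NOT XSD's:
# [^\W\d] means "a word character that is not a digit"; ':' is already outside \w.
TOKEN_APPROX = re.compile(r"[^\W\d]\w*")


def looks_like_token(value: str) -> bool:
    """Return True if the whole string matches the approximate pattern."""
    return TOKEN_APPROX.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["abc", "a1", "1a", "a:b", "a-b", ""]:
        print(repr(sample), looks_like_token(sample))
```

One thing this makes visible: `a-b` is rejected, because `\w` does not include the hyphen or the period, whereas NCName permits both after the first character. So the `\w`-based alternative appears to be narrower than the NCName-style pattern rather than equivalent to it.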
Who is the bug affecting?
Any user of tools that choke on the regex as given.
What is affected by this bug?
Can't validate an OSCAL instance with an appropriate XSD.
When does this occur?
Anytime with the tool in question (C# processor under .NET).
As it happens, the oXygen XML Editor's XML Schema Regular Expression builder also does not support entity syntax, and shows the same error.
Expected behavior (i.e. solution)
The token datatype should be validated appropriately (against NCName constraints).