Double-escaped RegEx patterns in output JSON schema causing issues with some RegEx flavors #182

nikitawootten-nist · 2022-02-10T20:24:20Z

Describe the bug

Some RegEx flavors (such as Go's regexp package, https://pkg.go.dev/regexp/syntax, and PHP's PCRE) do not support Unicode characters written in the \u{code} escape syntax. The patterns that validate certain datatypes, such as the token type, may improperly rely on this syntax.

Who is the bug affecting?

Tool developers who use the patterns in generated JSON schemas with some RegEx flavors (such as Go's regexp package or PHP's PCRE).

What is affected by this bug?

The regex present in some output JSON schema patterns is invalid for some RegEx flavors.

When does this occur?

Any time a Unicode escape sequence is placed within a RegEx pattern (such as the one for the token datatype).

Expected behavior (i.e. solution)

If the Unicode character pattern is single-escaped (\u... instead of \\u...), all the JSON parsers I've tested (Go, JS, and Python) interpret it as the Unicode character directly. I have not tested how any of these regex flavors handle literal Unicode characters, but this could be a simple solution to the issue.
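To make the difference concrete, here is a minimal Python sketch of what a JSON parser hands to the regex engine in each case. The pattern and the character U+00B7 are hypothetical stand-ins chosen for illustration, not the actual token pattern from the generated schema.

```python
import json
import re

# Hypothetical schema fragments; the pattern is illustrative only.
# Raw strings keep the backslashes exactly as they would appear in the JSON file.
double_escaped_doc = r'{"pattern": "^[a-z\\u00B7]+$"}'
single_escaped_doc = r'{"pattern": "^[a-z\u00B7]+$"}'

double_pattern = json.loads(double_escaped_doc)["pattern"]
single_pattern = json.loads(single_escaped_doc)["pattern"]

# Double-escaped: the parser yields a backslash, 'u', and four hex digits,
# so the regex engine itself must understand the \uXXXX escape.
print(repr(double_pattern))

# Single-escaped: the parser has already substituted the character U+00B7,
# so the regex engine only sees a literal character.
print(repr(single_pattern))

# Python's re module accepts \uXXXX in str patterns, so both forms match here.
# Engines without that escape would only be able to use the second form.
assert re.fullmatch(double_pattern, "a\u00b7b")
assert re.fullmatch(single_pattern, "a\u00b7b")
```

Because Python's `re` happens to support the `\uXXXX` escape, both patterns work in this sketch; the point is only that the single-escaped form no longer depends on the regex engine supporting it.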

Other Comments

This bug is related to #181


wendellpiez · 2022-02-11T15:54:45Z

Let's start with the simplest thing and see how far it works, or does not -- express the tests in the (Metaschema) backend source not in escaped form, but with literals (or XML character representations of literals). It would be interesting to see how these are handled by default by the JSON serializer built into Saxon, in particular whether and where it escapes the characters in question (presumably into a single-escaped form).

This issue is hard if we work against the Saxon serializer, easy if we work with it. If we have to work against it, an option is to start with this XSLT, linked from the XSLT 3.0 Rec, which presumably could be used to replace the Saxon serialization of JSON this pipeline presently relies on.

But trying the simple thing first could open a way forward to work with it.
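As a rough analogy only (Python's `json` module standing in for the Saxon serializer, whose behavior is not verified here), this sketch shows how a serializer that is handed a literal character can emit either a single-escaped form or the raw character, and never a double-escaped one. The pattern is hypothetical.

```python
import json

# The in-memory pattern holds the literal character U+00B7, not an escape.
schema = {"pattern": "^[a-z\u00b7]+$"}

# Default behavior (ensure_ascii=True): non-ASCII characters are written
# as single-escaped \uXXXX sequences in the JSON text.
ascii_only = json.dumps(schema)

# With ensure_ascii=False the character is written out as-is.
literal = json.dumps(schema, ensure_ascii=False)

print(ascii_only)
print(literal)

# Both serializations parse back to the same pattern string.
assert json.loads(ascii_only) == json.loads(literal) == schema
```

Whether Saxon's JSON serializer behaves the same way by default is exactly the open question raised above.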


wendellpiez · 2022-04-01T19:16:48Z

We should look at this again after #183 is merged.

The new regex patterns may make the issue go away on its own. If not, a solution can probably be ported over from #184.

nikitawootten-nist added the bug label Feb 10, 2022

wendellpiez mentioned this issue Feb 11, 2022

Roll back token datatype to avoid problematic character references #181

Closed

wendellpiez added a commit to wendellpiez/metaschema that referenced this issue Feb 18, 2022

Addressing usnistgov#182 - character escaping in JSON - with updated …

b3ab061

…test Metaschema

wendellpiez mentioned this issue Feb 18, 2022

Adjustment to character escaping in JSON #184

Closed

8 tasks

david-waltermire assigned david-waltermire and wendellpiez Mar 10, 2022

david-waltermire added this to the Metaschema 0.9.0 milestone Mar 10, 2022

david-waltermire linked a pull request Mar 10, 2022 that will close this issue

Relocate schema resources #191

Merged

11 tasks

david-waltermire pushed a commit to wendellpiez/metaschema that referenced this issue Mar 20, 2022

Addressing usnistgov#182 - character escaping in JSON - with updated …

fcc166b

…test Metaschema

david-waltermire linked a pull request Mar 29, 2022 that will close this issue

Metaschema / XSLT implementation alignment #197

Merged

8 tasks

david-waltermire linked a pull request Apr 15, 2022 that will close this issue

Redefined lexical constraint on 'token' datatype #183

Merged

8 tasks

david-waltermire closed this as completed in #183 Apr 15, 2022

aj-stein-nist mentioned this issue May 12, 2022

Error in datetime-regex usnistgov/OSCAL#1260

Closed
