From 4a8ffc47db44de918a221e476a7311db63998610 Mon Sep 17 00:00:00 2001 From: David Waltermire Date: Thu, 14 Jul 2022 15:48:02 -0400 Subject: [PATCH 1/6] Metaschema / XSLT implementation alignment (#197) * Relocate schema resources (#191) - Moved schema resources out of XSLT implementation - Relocated XSD and datatype XML schemas. Removed junk and generated files. Many datatype adjustments - Tweaks to schema data types to address Unicode issues. Resolves usnistgov/OSCAL#1127. Resolves usnistgov/OSCAL#956. - Adjusted type names of Metaschema types to be purely hyphenated. - Removed extra character ranges that are causing C# problems. Resolves usnistgov/OSCAL#1127. - Adjusted URI data type names to be more consistent between XML and JSON. Many metaschema.xsd adjustments - Alignment of data types used with new Metaschema datatype names. - Additional repairs of tests, including naming consistency. - adding new unit test for a valid json-value key using a label - many fixes to broken tests - Removed duplicate uuid test scenario. - Many fixes to make unit test Metaschemas valid. - Added JSON schema schema. - Some schema generation refactoring to support new data-driven test harness. * Fixed URI of choice unit test. * Update metaschema-datatypes.xsd Adjusting whitespace handling for URI types. * adding leading and trailing whitespace checks to all XML schema datatype derived types * Adjustments to debug charstrings. * Update metaschema-datatypes.json Minor adjustments to data types for comments on #1260. * refactored metaschema schema to support external constraint definitions * Added constraint extensibility configuration. * added formal-name and description to definition references * XSLT M4 Metaschema pipeline updates (#214) * Composition unit tests now valid to updated schemas; added (missing) tests. 
* Adding XSpec for schema generation; nominal schema target examples * Adding XSpec tests (testing schema generation as transforms) and initial set of targets for testing * Major reorganization and archiving (some temporary, prior to defenestration); new tests; readme documents in each folder for schema-generation unit tests to help trace efforts * More details in json-value-key readme; updated top-level JSON schema generation xspec * Metaschema Schematron rule intercepting a json-key setting with no BY_KEY in the grouping logic * Cleaned up extra JSON Schema file * Updated metaschema model wrt json-key, json-value-key flag-ref (no longer flag-name); other Metaschema touchups * Rewiring and simplifying XSD production pipeline with datatypes now acquired from the normative metaschema /schema subdirectory - removed the namespace fixup step, no longer needed. * Updated anthology ('menagerie') metaschema example, with cleanup * In XSD generation, now emitting datatype definitions only for datatypes actually needed by a schema * Updating schema TODO notes on synching unit tests; bit of cleanup * Addressing #212 - XML to JSON converter no longer chokes on edge cases (array items of SINGLETON_OR_ARRAY groups) * Updating top-level composition unit tests * .gitignore covering HTML reports from test runs * Removing outdated testing results * Adding back support for old datatype names as indicated in metaschema.xsd (cf line 1252) * Restoring deprecated datatypes per #195 * Bit of cleanup; utility maintenance * Removing 'INFO' level comment in XSD * Tweakage to align XSD out (dropping comment; cleaning up namespaces) * Patching hole in JSON datatype assignment also * Emitting cleaner namespaces in XSD; slight refactoring of JSON Schema * Adjusting JSON Schema type definitions to produce valid constraints over Metaschema atomic types * Extending atomic data type acquisition to collect one level of indirect references * Cleaning up obsolete datatype support * Bit more cleanup; 
updated readme * Adjusting handling of warnings and exception messages in pipelines * Removing outdated artifacts from schema generation unit testing.md Co-authored-by: Wendell Piez --- schema/json/metaschema-datatypes.json | 114 ++++++++++ schema/xml/metaschema-datatypes.xsd | 241 +++++++++++++++++++++ schema/xml/metaschema-markup-line.xsd | 11 + schema/xml/metaschema-markup-multiline.xsd | 102 +++++++++ schema/xml/metaschema-prose-base.xsd | 76 +++++++ schema/xml/metaschema-prose-module.xsd | 5 + 6 files changed, 549 insertions(+) create mode 100644 schema/json/metaschema-datatypes.json create mode 100644 schema/xml/metaschema-datatypes.xsd create mode 100644 schema/xml/metaschema-markup-line.xsd create mode 100644 schema/xml/metaschema-markup-multiline.xsd create mode 100644 schema/xml/metaschema-prose-base.xsd create mode 100644 schema/xml/metaschema-prose-module.xsd diff --git a/schema/json/metaschema-datatypes.json b/schema/json/metaschema-datatypes.json new file mode 100644 index 000000000..180f86333 --- /dev/null +++ b/schema/json/metaschema-datatypes.json @@ -0,0 +1,114 @@ +{ + "$schema" : "http://json-schema.org/draft-07/schema#", + "$id" : "http://csrc.nist.gov/ns/oscal/1.0/metaschema-datatypes-schema.json", + "$comment" : "OSCAL Unified Model of Models", + "type" : "object", + "definitions" : { + "Base64Datatype": { + "type": "string", + "pattern": "^[0-9A-Fa-f]+$", + "contentEncoding": "base64" + }, + "BooleanDatatype": { + "type": "boolean" + }, + "DateDatatype": { + "type": "string", + "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))(Z|[+-][0-9]{2}:[0-9]{2})?$" + }, + "DateWithTimezoneDatatype": { + "type": "string", + "pattern": 
"^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))(Z|[+-][0-9]{2}:[0-9]{2})$" + }, + "DateTimeDatatype": { + "type": "string", + "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?$" + }, + "DateTimeWithTimezoneDatatype": { + "type": "string", + "format": "date-time", + "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})$" + }, + "DayTimeDurationDatatype": { + "type": "string", + "format": "duration", + "pattern": "^[-+]?P([-+]?[0-9]+D)?(T([-+]?[0-9]+H)?([-+]?[0-9]+M)?([-+]?[0-9]+([.,][0-9]{0,9})?S)?)?$" + }, + "DecimalDatatype": { + "type": "number", + "pattern": "^(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)$" + }, + "EmailAddressDatatype": { + "type": "string", + "format": "email", + "pattern": "^.+@.+$" + }, + "HostnameDatatype": { + "allOf": [ + {"$ref": "#/definitions/StringDatatype"}, + {"format": "idn-hostname"} + ] + }, + "IntegerDatatype": { + "type": "integer" + }, + "IPV4AddressDatatype": { + "type": "string", + "format": "ipv4", + "pattern": "^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$" + }, + "IPV6AddressDatatype": { + "type": "string", + "format": "ipv6", + "pattern": 
"^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|[fF][eE]80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::([fF]{4}(:0{1,4}){0,1}:){0,1}((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3,3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3,3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]))$" + }, + "MarkupLineDatatype": { + "type": "string", + "pattern": "^[^\n]+$" + }, + "MarkupMultilineDatatype": { + "type": "string" + }, + "NonNegativeIntegerDatatype": { + "allOf": [ + {"$ref": "#/definitions/IntegerDatatype"}, + {"minimum": 0, + "type": "number"} + ] + }, + "PositiveIntegerDatatype": { + "allOf": [ + {"$ref": "#/definitions/IntegerDatatype"}, + {"minimum": 1, + "type": "number"} + ] + }, + "StringDatatype": { + "type": "string", + "pattern": "^\\S(.*\\S)?$" + }, + "TokenDatatype": { + "type": "string", + "pattern": "^(\\p{L}|_)(\\p{L}|\\p{N}|[.\\-_])*$" + }, + "URIDatatype": { + "type": "string", + "format": "uri", + "pattern": "^[a-zA-Z][a-zA-Z0-9+\\-.]+:.+$" + }, + "URIReferenceDatatype": { + "type": "string", + "format": "uri-reference" + }, + "UUIDDatatype": { + "type": "string", + "description": "A type 4 ('random' or 'pseudorandom') or type 5 UUID per RFC 4122.", + "pattern": "^[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[45][0-9A-Fa-f]{3}-[89ABab][0-9A-Fa-f]{3}-[0-9A-Fa-f]{12}$" + }, + "YearMonthDurationDatatype": { + "type": "string", + "format": "duration", + "pattern": "^[-+]?P([-+]?[0-9]+Y)?([-+]?[0-9]+M)?([-+]?[0-9]+W)?([-+]?[0-9]+D)?$" + } + } +} diff --git a/schema/xml/metaschema-datatypes.xsd b/schema/xml/metaschema-datatypes.xsd new file mode 100644 
index 000000000..a1f8e099a --- /dev/null +++ b/schema/xml/metaschema-datatypes.xsd @@ -0,0 +1,241 @@ + + + + + + + + A trimmed string, at least one character with no + leading or trailing whitespace. + + + + + + + + + + A trimmed string, at least one character with no + leading or trailing whitespace. + + + + + + + + + + + + + + The xs:date with a required timezone. + + + + + + + + + + + + + + + The xs:dateTime with a required timezone. + + + + + + + + + + + + + + + + + A trimmed string, at least one character with no + leading or trailing whitespace. + + + + + + + + An email address + + + + + Need a better pattern. + + + + + + + + A host name + + + + + + + + + + + A trimmed string, at least one character with no + leading or trailing whitespace. + + + + + + + + The ip-v4-address type specifies an IPv4 address in + dot decimal notation. + + + + + + + + + The ip-v6-address type specifies an IPv6 address + represented in 8 hextets separated by colons. + This is based on the pattern provided here: + https://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses + with some customizations. + + + + + + + + + + + + A trimmed string, at least one character with no + leading or trailing whitespace. + + + + + + + + + + A trimmed string, at least one character with no + leading or trailing whitespace. + + + + + + + + A string, but not empty and not whitespace-only + (whitespace is U+9, U+10, U+32 or [ \n\t]+ ) + + + + The OSCAL 'string' datatype restricts the XSD type by prohibiting leading + and trailing whitespace, and something (not only whitespace) is required. + + + + + A trimmed string, at least one character with no + leading or trailing whitespace. + + + + + + + + + A string token following the rules of XML "no + colon" names, with no whitespace. (XML names are single alphabetic + characters followed by alphanumeric characters, periods, underscores or dashes.) + + + + + + + + A single token may not contain whitespace. 
+ + + + + + + + + A URI + + + + + Requires a scheme with colon per RFC 3986. + + + + + + + + A URI reference, such as a relative URL + + + + + + A trimmed URI, at least one character with no + leading or trailing whitespace. + + + + + + + + A type 4 ('random' or 'pseudorandom') or type 5 UUID per RFC + 4122. + + + + + A sequence of 8-4-4-4-12 hex digits, with extra + constraints in the 13th and 17-18th places for version 4 and 5 + + + + + + + diff --git a/schema/xml/metaschema-markup-line.xsd b/schema/xml/metaschema-markup-line.xsd new file mode 100644 index 000000000..7d8d48bd2 --- /dev/null +++ b/schema/xml/metaschema-markup-line.xsd @@ -0,0 +1,11 @@ + + + + + + + + + + + diff --git a/schema/xml/metaschema-markup-multiline.xsd b/schema/xml/metaschema-markup-multiline.xsd new file mode 100644 index 000000000..559a3915f --- /dev/null +++ b/schema/xml/metaschema-markup-multiline.xsd @@ -0,0 +1,102 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + The content model is the same as inlineType, but line endings need + to be preserved, since this is preformatted. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/schema/xml/metaschema-prose-base.xsd b/schema/xml/metaschema-prose-base.xsd new file mode 100644 index 000000000..320daac4b --- /dev/null +++ b/schema/xml/metaschema-prose-base.xsd @@ -0,0 +1,76 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + An insert can be used to identify a placeholder for dynamically inserting text related to a specific object, which is referenced by the object's identifier using an id-ref. This insert mechanism allows the selection of which text value from the object to dynamically include based on the application's display requirements. + + + + The type of object to include from (e.g., parameter, control, component, role, etc.) 
+ + + + + The identity of the object to insert a value for. The identity will be selected from the index of objects of the specified type. The specific value to include is based on the application's display requirements, which will likely use a specific data element associated with the type (e.g., title, identifier, value, etc.) that is appropriate for the application. + + + + + + + diff --git a/schema/xml/metaschema-prose-module.xsd b/schema/xml/metaschema-prose-module.xsd new file mode 100644 index 000000000..e653c0537 --- /dev/null +++ b/schema/xml/metaschema-prose-module.xsd @@ -0,0 +1,5 @@ + + + + + From 068f85ff364088418fbaf7a8244f155bdd706fa6 Mon Sep 17 00:00:00 2001 From: David Waltermire Date: Wed, 24 Aug 2022 12:47:17 -0400 Subject: [PATCH 2/6] Updated date-time regular expressions per usnistgov/metaschema#224. Fixes usnistgov/metaschema#224. (#229) --- schema/json/metaschema-datatypes.json | 4 ++-- schema/xml/metaschema-datatypes.xsd | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/schema/json/metaschema-datatypes.json b/schema/json/metaschema-datatypes.json index 180f86333..b93a29199 100644 --- a/schema/json/metaschema-datatypes.json +++ b/schema/json/metaschema-datatypes.json @@ -22,12 +22,12 @@ }, "DateTimeDatatype": { "type": "string", - "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?$" + "pattern": 
"^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]*[1-9])?(Z|(-((0[0-9]|1[0-2]):00|0[39]:30)|\\+((0[0-9]|1[0-4]):00|(0[34569]|10):30|(0[58]|12):45)))?$" }, "DateTimeWithTimezoneDatatype": { "type": "string", "format": "date-time", - "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})$" + "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]*[1-9])?(Z|(-((0[0-9]|1[0-2]):00|0[39]:30)|\\+((0[0-9]|1[0-4]):00|(0[34569]|10):30|(0[58]|12):45)))$" }, "DayTimeDurationDatatype": { "type": "string", diff --git a/schema/xml/metaschema-datatypes.xsd b/schema/xml/metaschema-datatypes.xsd index a1f8e099a..59a52e627 100644 --- a/schema/xml/metaschema-datatypes.xsd +++ b/schema/xml/metaschema-datatypes.xsd @@ -41,7 +41,7 @@ - + @@ -50,7 +50,7 @@ The xs:dateTime with a required timezone. - + From 1ba604d8d54e42e08ed43a1873f3a4b9c96451e4 Mon Sep 17 00:00:00 2001 From: David Waltermire Date: Tue, 20 Sep 2022 08:47:36 -0400 Subject: [PATCH 3/6] Refactored inline markup types to simplify. 
--- schema/xml/metaschema-markup-line.xsd | 7 +++-- schema/xml/metaschema-markup-multiline.xsd | 27 ++++++---------- schema/xml/metaschema-prose-base.xsd | 36 +++++++++------------- 3 files changed, 29 insertions(+), 41 deletions(-) diff --git a/schema/xml/metaschema-markup-line.xsd b/schema/xml/metaschema-markup-line.xsd index 7d8d48bd2..c430944f9 100644 --- a/schema/xml/metaschema-markup-line.xsd +++ b/schema/xml/metaschema-markup-line.xsd @@ -3,9 +3,10 @@ + - - - + + + diff --git a/schema/xml/metaschema-markup-multiline.xsd b/schema/xml/metaschema-markup-multiline.xsd index 559a3915f..33d35f539 100644 --- a/schema/xml/metaschema-markup-multiline.xsd +++ b/schema/xml/metaschema-markup-multiline.xsd @@ -11,13 +11,13 @@ - - - - - - - + + + + + + + @@ -35,18 +35,11 @@ td th: phrase inline markup, a, insert, img (phrase+img) --> - - - - - - - - + - The content model is the same as inlineType, but line endings need + The content model is the same as inlineMarkupType, but line endings need to be preserved, since this is preformatted. @@ -85,7 +78,7 @@ - + diff --git a/schema/xml/metaschema-prose-base.xsd b/schema/xml/metaschema-prose-base.xsd index 320daac4b..2c24ceb31 100644 --- a/schema/xml/metaschema-prose-base.xsd +++ b/schema/xml/metaschema-prose-base.xsd @@ -1,36 +1,31 @@ - + + + + + + - - - - - - - - - - - - - - + + + + + + + + + - - - - - @@ -39,7 +34,6 @@ - From 633a22d9150b3de07e48f42ac12738122ce8d431 Mon Sep 17 00:00:00 2001 From: David Waltermire Date: Thu, 9 Mar 2023 10:47:25 -0500 Subject: [PATCH 4/6] - Relocated data types into their own file for easier maintenance. - Updated data types to ensure consistency between specification and implementation. - Identify locations of data type schemas in data type specifications. - Discuss use of regular expression dialects used in XML and JSON schema. 
#235 --- schema/json/metaschema-datatypes.json | 76 ++++++++++---- schema/xml/metaschema-datatypes.xsd | 137 ++++++++++++-------------- 2 files changed, 119 insertions(+), 94 deletions(-) diff --git a/schema/json/metaschema-datatypes.json b/schema/json/metaschema-datatypes.json index b93a29199..def9145fb 100644 --- a/schema/json/metaschema-datatypes.json +++ b/schema/json/metaschema-datatypes.json @@ -5,110 +5,144 @@ "type" : "object", "definitions" : { "Base64Datatype": { + "description": "Binary data encoded using the Base 64 encoding algorithm as defined by RFC4648.", "type": "string", - "pattern": "^[0-9A-Fa-f]+$", + "pattern": "^[0-9A-Za-z+/]+$", "contentEncoding": "base64" }, "BooleanDatatype": { + "description": "A binary value that is either: true or false.", "type": "boolean" }, "DateDatatype": { + "description": "A string representing a 24-hour period with an optional timezone.", "type": "string", "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))(Z|[+-][0-9]{2}:[0-9]{2})?$" }, "DateWithTimezoneDatatype": { + "description": "A string representing a 24-hour period with a required timezone.", "type": "string", "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))(Z|[+-][0-9]{2}:[0-9]{2})$" }, "DateTimeDatatype": { + "description": "A string representing a point in time with an optional timezone.", "type": "string", "pattern": 
"^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]*[1-9])?(Z|(-((0[0-9]|1[0-2]):00|0[39]:30)|\\+((0[0-9]|1[0-4]):00|(0[34569]|10):30|(0[58]|12):45)))?$" }, "DateTimeWithTimezoneDatatype": { + "description": "A string representing a point in time with a required timezone.", "type": "string", "format": "date-time", "pattern": "^(((2000|2400|2800|(19|2[0-9](0[48]|[2468][048]|[13579][26])))-02-29)|(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))|(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))|(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30)))T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]*[1-9])?(Z|(-((0[0-9]|1[0-2]):00|0[39]:30)|\\+((0[0-9]|1[0-4]):00|(0[34569]|10):30|(0[58]|12):45)))$" }, "DayTimeDurationDatatype": { + "description": "An amount of time quantified in days, hours, minutes, and seconds.", "type": "string", "format": "duration", - "pattern": "^[-+]?P([-+]?[0-9]+D)?(T([-+]?[0-9]+H)?([-+]?[0-9]+M)?([-+]?[0-9]+([.,][0-9]{0,9})?S)?)?$" + "pattern": "^-?P([0-9]+D(T(([0-9]+H([0-9]+M)?(([0-9]+|[0-9]+(\\.[0-9]+)?)S)?)|([0-9]+M(([0-9]+|[0-9]+(\\.[0-9]+)?)S)?)|([0-9]+|[0-9]+(\\.[0-9]+)?)S))?)|T(([0-9]+H([0-9]+M)?(([0-9]+|[0-9]+(\\.[0-9]+)?)S)?)|([0-9]+M(([0-9]+|[0-9]+(\\.[0-9]+)?)S)?)|([0-9]+|[0-9]+(\\.[0-9]+)?)S)$" }, "DecimalDatatype": { + "description": "A real number expressed using a whole and optional fractional part separated by a period.", "type": "number", "pattern": "^(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)$" }, "EmailAddressDatatype": { - "type": "string", - "format": "email", - "pattern": "^.+@.+$" + "description": "An email address string formatted according to RFC 6531.", + "allOf": [ + {"$ref": "#/definitions/StringDatatype"}, + { + "type": "string", + "format": "email", + 
"pattern": "^.+@.+$" + } + ] }, "HostnameDatatype": { - "allOf": [ - {"$ref": "#/definitions/StringDatatype"}, - {"format": "idn-hostname"} - ] + "description": "An internationalized Internet host name string formatted according to section 2.3.2.3 of RFC5890.", + "allOf": [ + {"$ref": "#/definitions/StringDatatype"}, + { + "type": "string", + "format": "idn-hostname" + } + ] }, "IntegerDatatype": { + "description": "A whole number value.", "type": "integer" }, "IPV4AddressDatatype": { + "description": "An Internet Protocol version 4 address represented using dotted-quad syntax as defined in section 3.2 of RFC2673.", "type": "string", "format": "ipv4", "pattern": "^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$" }, "IPV6AddressDatatype": { + "description": "An Internet Protocol version 6 address represented using the syntax defined in section 2.2 of RFC3513.", "type": "string", "format": "ipv6", "pattern": "^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|[fF][eE]80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::([fF]{4}(:0{1,4}){0,1}:){0,1}((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3,3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3,3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]))$" }, "MarkupLineDatatype": { + "description": "", "type": "string", "pattern": "^[^\n]+$" }, "MarkupMultilineDatatype": { + "description": "", "type": "string" }, "NonNegativeIntegerDatatype": { - "allOf": [ - {"$ref": "#/definitions/IntegerDatatype"}, - {"minimum": 0, - "type": "number"} - ] + 
"description": "An integer value that is equal to or greater than 0.", + "allOf": [ + {"$ref": "#/definitions/IntegerDatatype"}, + { + "type": "number", + "minimum": 0 + } + ] }, "PositiveIntegerDatatype": { - "allOf": [ - {"$ref": "#/definitions/IntegerDatatype"}, - {"minimum": 1, - "type": "number"} - ] + "description": "An integer value that is greater than 0.", + "allOf": [ + {"$ref": "#/definitions/IntegerDatatype"}, + { + "type": "number", + "minimum": 1 + } + ] }, "StringDatatype": { + "description": "A non-empty string with leading and trailing whitespace disallowed. Whitespace is: U+9, U+10, U+32 or [ \n\t]+", "type": "string", "pattern": "^\\S(.*\\S)?$" }, "TokenDatatype": { + "description": "A non-colonized name as defined by XML Schema Part 2: Datatypes Second Edition. https://www.w3.org/TR/xmlschema11-2/#NCName.", "type": "string", "pattern": "^(\\p{L}|_)(\\p{L}|\\p{N}|[.\\-_])*$" }, "URIDatatype": { + "description": "A universal resource identifier (URI) formatted according to RFC3986.", "type": "string", "format": "uri", "pattern": "^[a-zA-Z][a-zA-Z0-9+\\-.]+:.+$" }, "URIReferenceDatatype": { + "description": "A URI Reference, either a URI or a relative-reference, formatted according to section 4.1 of RFC3986.", "type": "string", "format": "uri-reference" }, "UUIDDatatype": { - "type": "string", "description": "A type 4 ('random' or 'pseudorandom') or type 5 UUID per RFC 4122.", + "type": "string", "pattern": "^[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[45][0-9A-Fa-f]{3}-[89ABab][0-9A-Fa-f]{3}-[0-9A-Fa-f]{12}$" }, "YearMonthDurationDatatype": { + "description": "An amount of time quantified in years and months based on ISO-8601 durations (see also RFC3339 appendix A).", "type": "string", "format": "duration", - "pattern": "^[-+]?P([-+]?[0-9]+Y)?([-+]?[0-9]+M)?([-+]?[0-9]+W)?([-+]?[0-9]+D)?$" + "pattern": "^-?P([0-9]+Y([0-9]+M)?)|[0-9]+M$" } } } diff --git a/schema/xml/metaschema-datatypes.xsd b/schema/xml/metaschema-datatypes.xsd index 59a52e627..38d5b6d41 
100644 --- a/schema/xml/metaschema-datatypes.xsd +++ b/schema/xml/metaschema-datatypes.xsd @@ -3,10 +3,14 @@ elementFormDefault="qualified"> + + Binary data encoded using the Base 64 encoding algorithm + as defined by RFC4648. + - + - A trimmed string, at least one character with no + A string with at least one character and no leading or trailing whitespace. @@ -14,17 +18,18 @@ + + A binary value that is either: true (or 1) or false (or 0). + - - - A trimmed string, at least one character with no - leading or trailing whitespace. - - + + + A string representing a 24-hour period with an optional timezone. + @@ -32,7 +37,7 @@ - The xs:date with a required timezone. + A string representing a 24-hour period with a required timezone. @@ -40,6 +45,9 @@ + + A string representing a point in time with an optional timezone. + @@ -47,7 +55,7 @@ - The xs:dateTime with a required timezone. + A string representing a point in time with a required timezone. @@ -55,31 +63,30 @@ + + An amount of time quantified in days, hours, minutes, and seconds. + - + + + A real number expressed using a whole and optional fractional part separated by a period. + - - - A trimmed string, at least one character with no - leading or trailing whitespace. - - + - An email address + An email address string formatted according to RFC 6531. - - - Need a better pattern. - + + @@ -94,20 +101,18 @@ + + A whole number value. + - - - A trimmed string, at least one character with no - leading or trailing whitespace. - - + - The ip-v4-address type specifies an IPv4 address in - dot decimal notation. + An Internet Protocol version 4 address represented using + dotted-quad syntax as defined in section 3.2 of RFC2673. @@ -116,45 +121,40 @@ - The ip-v6-address type specifies an IPv6 address - represented in 8 hextets separated by colons. + An Internet Protocol version 6 address represented using + the syntax defined in section 2.2 of RFC3513. 
This is based on the pattern provided here: https://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses with some customizations. - - + + + An integer value that is equal to or greater than 0. + - - - A trimmed string, at least one character with no - leading or trailing whitespace. - - + + + An integer value that is greater than 0. + - - - A trimmed string, at least one character with no - leading or trailing whitespace. - - + - A string, but not empty and not whitespace-only - (whitespace is U+9, U+10, U+32 or [ \n\t]+ ) + A non-empty string of unicode characters with leading and trailing whitespace + disallowed. Whitespace is: U+9, U+10, U+32 or [ \n\t]+ @@ -162,27 +162,20 @@ and trailing whitespace, and something (not only whitespace) is required. - - - A trimmed string, at least one character with no - leading or trailing whitespace. - - + - A string token following the rules of XML "no - colon" names, with no whitespace. (XML names are single alphabetic - characters followed by alphanumeric characters, periods, underscores or dashes.) - + A non-empty, non-colonized name as defined by XML Schema Part 2: Datatypes + Second Edition (https://www.w3.org/TR/xmlschema11-2/#NCName), with leading and trailing + whitespace disallowed. - - + + @@ -195,7 +188,7 @@ - A URI + A universal resource identifier (URI) formatted according to RFC3986. @@ -208,16 +201,10 @@ - A URI reference, such as a relative URL - + A URI Reference, either a URI or a relative-reference, formatted according to section 4.1 of RFC3986. - - - A trimmed URI, at least one character with no - leading or trailing whitespace. - - + @@ -227,8 +214,7 @@ 4122. 
+ + A sequence of 8-4-4-4-12 hex digits, with extra constraints in the 13th and 17-18th places for version 4 and 5 @@ -237,5 +223,10 @@ - + + + + + + From aa8bad2616237dd8bb2d5a7a4d1a3105211982f4 Mon Sep 17 00:00:00 2001 From: David Waltermire Date: Thu, 9 Mar 2023 10:54:42 -0500 Subject: [PATCH 5/6] Fixed a bug preventing valid base64 data from validating in JSON. This was previously fixed for XML. --- schema/json/metaschema-datatypes.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/schema/json/metaschema-datatypes.json b/schema/json/metaschema-datatypes.json index def9145fb..133f56eb0 100644 --- a/schema/json/metaschema-datatypes.json +++ b/schema/json/metaschema-datatypes.json @@ -7,7 +7,7 @@ "Base64Datatype": { "description": "Binary data encoded using the Base 64 encoding algorithm as defined by RFC4648.", "type": "string", - "pattern": "^[0-9A-Za-z+/]+$", + "pattern": "^[0-9A-Za-z+/]+={0,2}$", "contentEncoding": "base64" }, "BooleanDatatype": { From 43689458f3defa65a8d5d2553fa9007274467681 Mon Sep 17 00:00:00 2001 From: David Waltermire Date: Fri, 10 Mar 2023 09:57:51 -0500 Subject: [PATCH 6/6] Added documentation around whitespace restrictions to make clear how the data type implementations restrict leading and trailing whitespace. --- schema/xml/metaschema-datatypes.xsd | 51 +++++++++++++++++++++++++---- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/schema/xml/metaschema-datatypes.xsd b/schema/xml/metaschema-datatypes.xsd index 38d5b6d41..722ff2cbc 100644 --- a/schema/xml/metaschema-datatypes.xsd +++ b/schema/xml/metaschema-datatypes.xsd @@ -73,10 +73,17 @@ - A real number expressed using a whole and optional fractional part separated by a period. + A real number expressed using a whole and optional fractional part + separated by a period. - + + + This pattern ensures that leading and trailing whitespace is + disallowed. This helps to even the user experience between implementations + related to whitespace. 
+ + @@ -105,7 +112,13 @@ A whole number value. - + + + This pattern ensures that leading and trailing whitespace is + disallowed. This helps to even the user experience between implementations + related to whitespace. + + @@ -138,7 +151,13 @@ An integer value that is equal to or greater than 0. - + + + This pattern ensures that leading and trailing whitespace is + disallowed. This helps to even the user experience between implementations + related to whitespace. + + @@ -147,7 +166,13 @@ An integer value that is greater than 0. - + + + This pattern ensures that leading and trailing whitespace is + disallowed. This helps to even the user experience between implementations + related to whitespace. + + @@ -162,7 +187,13 @@ and trailing whitespace, and something (not only whitespace) is required. - + + + This pattern ensures that leading and trailing whitespace is + disallowed. This helps to even the user experience between implementations + related to whitespace. + + @@ -204,7 +235,13 @@ A URI Reference, either a URI or a relative-reference, formatted according to section 4.1 of RFC3986. - + + + This pattern ensures that leading and trailing whitespace is + disallowed. This helps to even the user experience between implementations + related to whitespace. + +
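Reviewer note: the effect of the pattern changes in this series can be checked with a small script. What follows is a minimal sketch, not part of the patches. It assumes Python's `re` module is a close enough stand-in for the ECMA-262 dialect that JSON Schema's `pattern` keyword uses, which holds for these simple patterns but is not guaranteed in general. The names `matches` and `base64_results` are hypothetical helpers. The patterns themselves are copied from the `Base64Datatype` hunks in patches 1, 4 and 5 and from `StringDatatype` in `metaschema-datatypes.json`, with the JSON string escaping removed.

```python
import re

# Base64Datatype patterns as they appear in the JSON schema hunks of this series.
BASE64_PATTERNS = {
    "patch 1": r"^[0-9A-Fa-f]+$",         # hex digits only
    "patch 4": r"^[0-9A-Za-z+/]+$",       # base64 alphabet, no padding allowed
    "patch 5": r"^[0-9A-Za-z+/]+={0,2}$",  # base64 alphabet plus optional padding
}

# StringDatatype pattern: at least one character, no leading or trailing whitespace.
STRING_PATTERN = r"^\S(.*\S)?$"


def matches(pattern: str, value: str) -> bool:
    """Hypothetical helper: unanchored search, which is how JSON Schema applies "pattern"."""
    return re.search(pattern, value) is not None


def base64_results(value: str) -> dict:
    """Report which revision of the Base64Datatype pattern accepts the value."""
    return {name: matches(p, value) for name, p in BASE64_PATTERNS.items()}


if __name__ == "__main__":
    # "SGVsbG8=" is the padded base64 encoding of "Hello".
    print(base64_results("SGVsbG8="))
    for sample in ["abc", " abc", "abc ", "a b"]:
        print(repr(sample), matches(STRING_PATTERN, sample))
```

Under these assumptions, only the patch 5 pattern accepts a padded value such as `SGVsbG8=`, and the string pattern rejects values with leading or trailing spaces while allowing interior ones. The XSD patterns are not exercised here, since XML Schema uses its own regular expression dialect.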