follow-up edits from PR #641
There were still some outstanding comments/edits from the discussion in PR #641:

* Add explanatory text (#641 (comment))
* Removed definition of "collation" - it was not used in the text and it is not necessary to introduce the term (#641 (comment))
* Add "unicode normalization" as a term (#641 (comment))
* Add "unicode case folding" as a term (#641 (comment))
* Correct examples (#641 (comment))
cportele committed Jan 31, 2022
1 parent c8d2ae3 commit 687a99e
Showing 4 changed files with 31 additions and 17 deletions.
15 changes: 10 additions & 5 deletions cql2/standard/clause_4_terms_and_definitions.adoc
@@ -7,11 +7,6 @@ This document also uses terms defined in the OGC Standard for Modular specificat

For the purposes of this document, the following additional terms and definitions apply.

[[collation-def]]
==== collation
a set of rules that indicate how to compare and sort character string data
the process of ordering units of textual information https://www.unicode.org/glossary/#collation[Glossary of Unicode Terms]

[[collection-def]]
==== collection
a body of **resources** that belong or are used together; an aggregate, set, or group of related **resources** (http://docs.opengeospatial.org/DRAFTS/20-024.html#terms_and_definitions[OGC 20-024, OGC API - Common - Part 2: Collections]).
@@ -32,6 +27,16 @@ a token that represents a property of a resource that can be used in a **filter
==== resource
entity that might be identified (<<iso15836-2,Dublin Core Metadata Initiative - DCMI Metadata Terms>>)

[[case-folding-def]]
==== unicode case folding; case folding
process of making two texts which differ only in case identical for comparison purposes (https://www.w3.org/TR/charmod-norm/#definitionCaseFolding[W3C Character Model for the World Wide Web: String Matching])

NOTE: Case folding is meant for the purpose of case-insensitive string matching.

[[normalization-def]]
==== unicode normalization; normalization
process of removing alternate representations of equivalent sequences from textual data, to convert the data into a form that can be binary-compared for equivalence (https://www.unicode.org/glossary/#normalization[Glossary of Unicode Terms])
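The two definitions above can be illustrated with a minimal Python sketch. It assumes Python's built-in `str.casefold()` as a stand-in for Unicode case folding and `unicodedata.normalize` with form NFC as one possible normalization form; the helper names `fold_case` and `normalize_nfc` are hypothetical and are not part of CQL2.

```python
import unicodedata


def fold_case(text: str) -> str:
    """Apply full Unicode case folding (hypothetical helper, not a CQL2 name)."""
    return text.casefold()


def normalize_nfc(text: str) -> str:
    """Normalize to NFC; the choice of NFC is an assumption made for this sketch."""
    return unicodedata.normalize("NFC", text)


# Case folding: two texts that differ only in case become identical.
assert fold_case("Straße") == fold_case("STRASSE")

# Normalization: a precomposed "é" and "e" followed by a combining acute accent
# are different code point sequences until they are normalized.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
assert composed != decomposed
assert normalize_nfc(composed) == normalize_nfc(decomposed)
```

After normalization the two spellings can be compared with plain string equality, which is the "binary-compared for equivalence" property the definition refers to.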

=== Symbols

* **&#x2229;** intersection, operation on two or more sets
27 changes: 18 additions & 9 deletions cql2/standard/clause_7_enhanced.adoc
@@ -117,7 +117,15 @@ category NOT IN (1,2,3,4)

include::requirements/requirements_class_accent-case-insensitive-comparison.adoc[]

The following requirements class adds support for case-insensitive string comparisons.
The following requirements class adds support for case- and accent-insensitive string comparisons.

Both capabilities are useful when operating on data that has not been normalized, or that has been normalized to values other than the expected ones. They are provided as string-valued functions that normalize a string with respect to case (`CASEI`) or accents (`ACCENTI`). Examples:

* The `CASEI` function is useful when a property is set to "PLANET", "Planet", or "planet" and a query should match any of them without having to enumerate all the variations.

* The `ACCENTI` function is useful when accents (or, more generally, diacritics not available in ASCII) were dropped when indexing a property. This may be useful, for example, to support users that are not familiar with accents or that do not know how to type them on their keyboard. For example, "papa" would also match "papá". Note that accent-insensitive comparisons can match values with a different meaning. For example, in Spanish "papa" is potato and "papá" is father, and an accent-insensitive comparison with "papá" will match both. This may also be intentional, for example when some of the data is known to have been processed in ASCII.

Implementations of these functions can be complex, but in many cases the underlying datastore will provide a capability that these CQL2 functions can be mapped to.
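As a rough illustration of what such a mapping could look like, the following Python sketch approximates the two functions. It is an assumption-laden stand-in, not the behavior required by the standard: `casei` uses `str.casefold()`, and `accenti` strips combining marks after NFD decomposition, which does not cover characters without a canonical decomposition (for example "ø").

```python
import unicodedata


def casei(value: str) -> str:
    """Approximate CASEI with full Unicode case folding (an assumption)."""
    return value.casefold()


def accenti(value: str) -> str:
    """Approximate ACCENTI by removing combining marks after decomposition."""
    decomposed = unicodedata.normalize("NFD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


# All case variants of the same word fold to one value.
assert casei("PLANET") == casei("Planet") == casei("planet")

# Accent-insensitive comparison treats "papa" and "papá" as equal.
assert accenti("papa") == accenti("papá")
```

In practice a datastore collation or a built-in database function would usually take the place of these helpers.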

include::requirements/accent-case-insensitive-comparison/REQ_casei-builtin-function.adoc[]

@@ -132,11 +140,12 @@ CASEI(road_class) IN (CASEI('Οδος'),CASEI('Straße'))
[source,JSON]
----
{
"in": [
{ "function": { "name": "casei", "arguments": ["road_class"] } },
"op": "in",
"args": [
{ "function": { "name": "casei", "arguments": [{ "property": "road_class" }] } },
[
{ "function": { "name": "casei", "arguments": ["Οδος"] } },
{ "function": { "name": "casei", "arguments": ["Straße"] } }
{ "function": { "name": "casei", "args": ["Οδος"] } },
{ "function": { "name": "casei", "args": ["Straße"] } }
]
]
}
@@ -152,22 +161,22 @@ include::requirements/accent-case-insensitive-comparison/REQ_accenti-builtin-fun
====
----
etat_vol = ACCENTI('débárquér')
ACCENTI(etat_vol) = ACCENTI('débárquér')
----
[source,JSON]
----
{
"op": "=",
"args": [
{ "property": "etat_vol" },
{ "function": { "name": "accenti", "args": ["débárquér"] } },
{ "function": { "name": "accenti", "arguments": [{ "property": "etat_vol" }] } },
{ "function": { "name": "accenti", "arguments": ["débárquér"] } }
]
}
----
====

The CASEI() and ACCENTI() functions return a string-typed representation of the input expression that is guaranteed to be equal to any other case- or accent-insensitive representation of that string. In order to ensure correct comparisons, these functions should be applied to both sides of an expression. So, for example, the only durable case-insensitive equality comparison would be `CASEI(some_property) = CASEI('Straße')`. An expression such as `CASEI(some_property) = 'strasse'` might work but is not guaranteed to work across implementations or between versions of the same implementation.
The `CASEI()` and `ACCENTI()` functions return a string-typed representation of the input expression that is guaranteed to be equal to any other case- or accent-insensitive representation of that string. In order to ensure correct comparisons, these functions should be applied to both sides of an expression. So, for example, the only durable case-insensitive equality comparison would be `CASEI(some_property) = CASEI('Straße')`. An expression such as `CASEI(some_property) = 'strasse'` might work but is not guaranteed to work across implementations or between versions of the same implementation.
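The point about applying the function to both sides can be sketched in Python with two hypothetical implementations of `CASEI`: one based on full case folding and one based on simple lowercasing. Neither is claimed to be what any particular CQL2 implementation does; they only show why a hard-coded literal such as `'strasse'` is fragile.

```python
def casei_fold(value: str) -> str:
    """Hypothetical implementation A: full Unicode case folding."""
    return value.casefold()


def casei_lower(value: str) -> str:
    """Hypothetical implementation B: simple lowercasing."""
    return value.lower()


def equals_casei(impl, stored: str, literal: str) -> bool:
    """Compare with the function applied to both sides, as recommended above."""
    return impl(stored) == impl(literal)


stored = "Straße"

# Applying the function to both sides holds under either implementation.
for impl in (casei_fold, casei_lower):
    assert equals_casei(impl, stored, "STRAßE")

# Comparing against a hard-coded literal depends on the implementation.
assert casei_fold(stored) == "strasse"
assert casei_lower(stored) != "strasse"
```

The internal representation returned by the function differs between the two implementations, so only comparisons between two results of the same function are stable.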

[[basic-spatial-operators]]
=== Requirements Class "Basic Spatial Operators"
2 changes: 1 addition & 1 deletion cql2/standard/schema/examples/example26.json
@@ -1,7 +1,7 @@
{
"op": "in",
"args": [
{ "function": { "name": "casei", "args": ["road_class"] } },
{ "function": { "name": "casei", "arguments": [{ "property": "road_class" }] } },
[
{ "function": { "name": "casei", "args": ["Οδος"] } },
{ "function": { "name": "casei", "args": ["Straße"] } }
4 changes: 2 additions & 2 deletions cql2/standard/schema/examples/example27.json
@@ -1,7 +1,7 @@
{
"op": "=",
"args": [
{ "property": "etat_vol" },
{ "function": { "name": "accenti", "args": ["débárquér"] } }
{ "function": { "name": "accenti", "arguments": [{ "property": "etat_vol" }] } },
{ "function": { "name": "accenti", "args": ["débárquér"] } }
]
}
