Issue-630: Changes to define the "CASEI" function for supporting case-insensitive string comparisons. #641

Merged
merged 16 commits into from
Jan 30, 2022
5 changes: 5 additions & 0 deletions cql2/standard/clause_4_terms_and_definitions.adoc
@@ -7,6 +7,11 @@ This document also uses terms defined in the OGC Standard for Modular specificat

For the purposes of this document, the following additional terms and definitions apply.

[[collation-def]]
==== collation
a set of rules that indicate how to compare and sort character string data
the process of ordering units of textual information https://www.unicode.org/glossary/#collation[Glossary of Unicode Terms]
Comment on lines +10 to +13
Contributor

The term “collation” does not seem to be used in the remainder of the document. Should it be removed from the terms & definitions?

Member

Meeting 2022-01-17: Use the term in the description and keep the definition.


[[collection-def]]
==== collection
a body of **resources** that belong or are used together; an aggregate, set, or group of related **resources** (http://docs.opengeospatial.org/DRAFTS/20-024.html#terms_and_definitions[OGC 20-024, OGC API - Common - Part 2: Collections]).
84 changes: 66 additions & 18 deletions cql2/standard/clause_7_enhanced.adoc
@@ -29,7 +29,7 @@ include::requirements/advanced-comparison-operators/REQ_like-predicate.adoc[]

include::recommendations/advanced-comparison-operators/PER_like-predicate.adoc[]

[[example_8_4]]
[[example_7_1]]
.Example of a LIKE predicate
====
----
@@ -52,7 +52,7 @@ include::requirements/advanced-comparison-operators/REQ_between-predicate.adoc[]

include::recommendations/advanced-comparison-operators/PER_between-predicate.adoc[]

[[example_8_5]]
[[example_7_2]]
.Examples of a BETWEEN predicate
====
----
@@ -76,7 +76,7 @@ include::requirements/advanced-comparison-operators/REQ_in-predicate.adoc[]

include::recommendations/advanced-comparison-operators/PER_in-predicate.adoc[]

[[example_8_6]]
[[example_7_3]]
.Examples of an IN predicate
====
----
@@ -112,14 +112,62 @@ category NOT IN (1,2,3,4)
----
====

[[case-insensitive-comparison]]
=== Requirements Class "Case-insensitive Comparison"
[[accent-case-insensitive-comparison]]
=== Requirements Class "Accent and Case-insensitive Comparison"

include::requirements/requirements_class_case-insensitive-comparison.adoc[]
include::requirements/requirements_class_accent-case-insensitive-comparison.adoc[]

This requirements class adds support for case-insensitive string comparisons.
The following requirements class adds support for case-insensitive string comparisons.

#TODO: add requirement for UPPER() and LOWER(). However, there is still the unresolved question how we exactly define UPPER() and LOWER(). Or whether we should define a unicode case folding function instead (see http://unicode.org/reports/tr21/tr21-5.html#Caseless_Matching[Unicode] or https://www.w3.org/TR/charmod-norm/#definitionCaseFolding[W3C]).#
include::requirements/accent-case-insensitive-comparison/REQ_casei-builtin-function.adoc[]

[[example_7_4_casei]]
.Example case-insensitive comparison
====

----
CASEI(road_class) IN (CASEI('Οδος'),CASEI('Straße'))
----

[source,JSON]
----
{
"in": [
{ "function": { "name": "casei", "arguments": ["road_class"] } },
[
{ "function": { "name": "casei", "arguments": ["Οδος"] } },
{ "function": { "name": "casei", "arguments": ["Straße"] } }
]
]
}
----
====
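
As an illustration of what the predicate in the example above is meant to select, here is a minimal Python sketch. It assumes that `CASEI` can be approximated by Python's `str.casefold()`, which applies Unicode full case folding. The `casei` helper, the `matches` function and the sample features are hypothetical and are not part of this PR.

```python
def casei(value: str) -> str:
    """Approximate CASEI with Unicode full case folding (an assumption)."""
    return value.casefold()


# Folded right-hand side of:
#   CASEI(road_class) IN (CASEI('Οδος'), CASEI('Straße'))
ALLOWED = {casei("Οδος"), casei("Straße")}


def matches(feature: dict) -> bool:
    """Return True when the feature's road_class passes the predicate."""
    return casei(feature["road_class"]) in ALLOWED


# Hypothetical sample features with different spellings of the same values.
features = [
    {"id": 1, "road_class": "STRASSE"},
    {"id": 2, "road_class": "Οδος".upper()},  # all-caps spelling of the Greek literal
    {"id": 3, "road_class": "straße"},
    {"id": 4, "road_class": "Avenue"},
]

selected = [f["id"] for f in features if matches(f)]
print(selected)
```

Features 1 to 3 are selected because their folded values coincide with one of the folded literals; feature 4 is not.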

This requirements class adds support for accent-insensitive string comparisons.

include::requirements/accent-case-insensitive-comparison/REQ_accenti-builtin-function.adoc[]

[[example_7_4_accenti]]
.Example accent-insensitive comparison
====

----
etat_vol = ACCENTI('débárquér')
Contributor

@pvretano I know you just merged this, but shouldn't this be

ACCENTI(etat_vol) = ACCENTI('débárquér')

as it is for CASEI? Same for the JSON example.

----

[source,JSON]
----
{
"op": "=",
"args": [
{ "property": "etat_vol" },
{ "function": { "name": "accenti", "args": ["débárquér"] } }
]
}
----
====

The CASEI() and ACCENTI() functions return a string-typed representation of the input expression that is guaranteed to be equal to any other case-insensitive or accent-insensitive representation of that string. To ensure correct comparisons, these functions should be applied to both sides of an expression. For example, the only durable case-insensitive equality comparison is `CASEI(some_property) = CASEI('Straße')`. An expression such as `CASEI(some_property) = 'strasse'` might work but is not guaranteed to work across implementations or between versions of the same implementation.
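
The fragility of one-sided comparisons can be illustrated with a small Python sketch. It assumes two hypothetical ways a server might produce its case-insensitive representation: Unicode full case folding (`str.casefold()`) and plain lowercasing (`str.lower()`). Lowercasing would not satisfy the full case folding requirement and is used here only to make the difference visible; neither function is claimed to be what any particular implementation does.

```python
def casei_full_folding(value: str) -> str:
    # Hypothetical implementation A: Unicode full case folding.
    return value.casefold()


def casei_lowercase(value: str) -> str:
    # Hypothetical implementation B: simple lowercasing, shown for contrast only.
    return value.lower()


IMPLEMENTATIONS = (casei_full_folding, casei_lowercase)
stored = "Straße"  # value of some_property

# One-sided: CASEI(some_property) = 'strasse'
one_sided = {impl.__name__: impl(stored) == "strasse" for impl in IMPLEMENTATIONS}

# Two-sided: CASEI(some_property) = CASEI('Straße')
two_sided = {impl.__name__: impl(stored) == impl("Straße") for impl in IMPLEMENTATIONS}

print(one_sided)
print(two_sided)
```

The one-sided form only matches when the literal happens to agree with the internal representation chosen by the implementation, whereas the two-sided form matches under both because the same function is applied to each operand.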

[[basic-spatial-operators]]
=== Requirements Class "Basic Spatial Operators"
@@ -138,7 +186,7 @@ include::recommendations/basic-spatial-operators/PER_spatial-predicates.adoc[]

CQL2 uses Well-Known Text (WKT) or GeoJSON to encode geometry literals. Since WKT and GeoJSON do not provide a capability to specify the CRS of a geometry literal, the server has to determine the CRS of the geometry literals in a filter expression through another mechanism. For example, a query parameter `filter-crs` is used in <<OGCFeat-3,OGC API - Features - Part 3: Filtering>> to pass the CRS information to the server.

[[example_9_1]]
[[example_7_5]]
.Example spatial predicate
====
----
@@ -168,7 +216,7 @@ S_INTERSECTS(geometry,POLYGON((36.319836 32.288087,36.320041 32.288032,36.320210
----
====

[[example_9_2]]
[[example_7_6]]
.Example for the filter-crs query parameter
====
----
@@ -226,7 +274,7 @@ image::images/within.png[alt=Within,width=70%]

NOTE: If geometry *_a_* `S_CONTAINS` geometry *_b_*, then geometry *_b_* is `S_WITHIN` geometry *_a_*.

[[example_9_3]]
[[example_7_7]]
.Example of a spatial relationship between a property and a literal geometry.
====
----
@@ -330,7 +378,7 @@ include::requirements/temporal-operators/REQ_temporal-operators.adoc[]

include::recommendations/temporal-operators/PER_temporal-predicates.adoc[]

[[example_9_4]]
[[example_7_8]]
.Examples of temporal predicate using T_INTERSECTS
====
----
@@ -349,7 +397,7 @@ T_INTERSECTS(event_date, INTERVAL("1969-07-16T05:32:00Z", "1969-07-24T16:50:35Z"
----
====

[[example_9_5]]
[[example_7_9]]
.Examples of temporal relationships using a property and a temporal literal.
====
----
@@ -387,7 +435,7 @@ the requirements class <<rc_functions,Custom Functions>>.
Support for the BNF rule `arithmeticExpression` is added by
the requirements class <<rc_arithmetic,Arithmetic Expressions>>.

[[example_9_6]]
[[example_7_10]]
.Evaluate if the value of an array property contains the specified subset of values.
====
----
@@ -415,7 +463,7 @@ This requirements class adds support for properties on the right side of predica

include::requirements/property-property/REQ_withdraw-permissions.adoc[]

[[example_9_7]]
[[example_7_11]]
.Example of a spatial relationship between two literal geometries.
====
----
@@ -458,7 +506,7 @@ S_CROSSES(LINESTRING(43.72992 -79.2998, 43.73005 -79.2991, 43.73006 -79.2984,
----
====

[[example_9_8]]
[[example_7_12]]
.Examples of temporal relationships using temporal literals.
====
----
@@ -490,7 +538,7 @@ include::requirements/functions/REQ_functions.adoc[]
NOTE: Support for the BNF rule `arithmeticExpression` is added by
the requirements class <<rc_arithmetic,Arithmetic Expressions>>.

[[example_9_9]]
[[example_7_13]]
.Example of a spatial relationship between a property and a function that returns a geometry value.
====
It should be noted that the function "Buffer()" in this example is not part of CQL2 but is an example of a function that an implementation may offer that returns a geometry value.
@@ -534,7 +582,7 @@ include::requirements/arithmetic/REQ_arithmetic.adoc[]

NOTE: Support for the BNF rule `function` is added by the requirements class <<rc_functions,Custom Functions>>.

[[example_9_11]]
[[example_7_14]]
.Predicate with an arithmetic expression finding all vehicles that are too tall to pass under a bridge.
====
----
@@ -0,0 +1,11 @@
[[req_accent-case-insensitive-comparison_accenti-builtin-function]]
[width="90%",cols="2,6a"]
|===
^|*Requirement {counter:req-id}* |*/req/accent-case-insensitive-comparison/accenti-builtin-function*
^|A |The server SHALL support a built-in function named `ACCENTI`.
^|B |The function SHALL accept one argument that can be a character string literal, the name of a property that evaluates to a character string literal or a function that returns a character string literal (see rules `characterLiteral`, `propertyName`, `function`).
^|C |The function SHALL return a character string literal.
^|D |The function SHALL implement https://www.w3.org/TR/charmod-norm/#unicodeNormalization[unicode normalization] described in the implementation guidelines of https://www.unicode.org/versions/Unicode14.0.0[The Unicode Standard, Version 14.0] (see https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf[clause 5.6 Normalization]).
Contributor

Consider adding the term “Unicode normalization” to the terms & definitions.

Unicode normalization
normalization
process of removing alternate representations of equivalent sequences from textual data, to convert the data into a form that can be binary-compared for equivalence

Source: https://www.unicode.org/glossary/#normalization

Member

Meeting 2022-01-17: Agreed.

|===

NOTE: The references in D need to be verified. There also needs to be some discussion about NFC (canonically composed form) versus NFD (canonically decomposed form). I think the correct thing is to say that ACCENTI() must do NFC ... but I need to verify.
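
To make the NFC/NFD question in the note concrete, the following Python sketch uses the standard `unicodedata` module. It shows that the two encodings of "é" differ as code point sequences but compare equal once both are normalized to the same form, and that normalization on its own leaves the accent in place. The `strip_accents` helper is an assumption about how accent-insensitivity might be layered on top of normalization (decompose, drop combining marks, recompose); the requirement text as written does not spell this step out.

```python
import unicodedata

composed = "\u00e9"     # 'é' as a single precomposed code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# The raw strings differ, but either normalization form makes them equal.
differ_raw = composed != decomposed
nfc_equal = unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)
nfd_equal = unicodedata.normalize("NFD", composed) == unicodedata.normalize("NFD", decomposed)

# Normalization alone does not remove the accent.
still_accented = unicodedata.normalize("NFC", decomposed) != "e"


def strip_accents(value: str) -> str:
    """Hypothetical accent folding: NFD, drop combining marks, then NFC."""
    parts = unicodedata.normalize("NFD", value)
    kept = "".join(ch for ch in parts if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


print(differ_raw, nfc_equal, nfd_equal, still_accented)
print(strip_accents("débárquér"))
```

Whichever form is eventually mandated, both operands of a comparison need to go through the same form; the choice between NFC and NFD mainly affects the string that the function returns.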
@@ -0,0 +1,9 @@
[[req_accent-case-insensitive-comparison_casei-builtin-function]]
[width="90%",cols="2,6a"]
|===
^|*Requirement {counter:req-id}* |*/req/accent-case-insensitive-comparison/casei-builtin-function*
^|A |The server SHALL support a built-in function named `CASEI`.
^|B |The function SHALL accept one argument that can be a character string literal, the name of a property that evaluates to a character string literal or a function that returns a character string literal (see rules `characterLiteral`, `propertyName`, `function`).
^|C |The function SHALL return a character string literal.
^|D |The function SHALL implement the https://www.w3.org/TR/charmod-norm/#definitionCaseFolding[full case folding] algorithm defined in the implementation guidelines of https://www.unicode.org/versions/Unicode14.0.0[The Unicode Standard, Version 14.0] (see https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf[clause 5.18 Case Mappings, sub-clause Caseless Matching]).
Contributor

Consider adding the term “Unicode case folding” to the terms & definitions:

Unicode case folding
case folding
process of making two texts which differ only in case identical for comparison purposes

Note: Case folding is meant for the purpose of string matching.

Source: https://www.w3.org/TR/charmod-norm/#definitionCaseFolding

Consider adding and using the term “Unicode case-insensitive matching” as well.

Member

Meeting 2022-01-17: Agreed.

|===
16 changes: 9 additions & 7 deletions cql2/standard/schema/cql2.bnf
@@ -202,9 +202,7 @@ arithmeticOperand = numericLiteral
#=============================================================================#
# Definition of a PROPERTYNAME
#=============================================================================#
propertyName = identifier | doubleQuote identifier doubleQuote
| "UPPER" leftParen propertyName rightParen
| "LOWER" leftParen propertyName rightParen;
propertyName = identifier | doubleQuote identifier doubleQuote;

identifier = identifierStart [ {colon | period | identifierPart} ];

@@ -217,8 +215,8 @@ identifierPart = alpha | digit | dollar | underscore;
# The functions offered by an implementation are provided at `/functions`
#=============================================================================#
function = identifier leftParen {argumentList} rightParen
| "UPPER" leftParen function rightParen
| "LOWER" leftParen function rightParen;
| "CASEI" leftParen characterExpression rightParen
| "ACCENTI" leftParen characterExpression rightParen;

argumentList = argument [ { comma argument } ];

@@ -232,12 +230,16 @@ argument = characterLiteral
| arithmeticExpression
| arrayExpression;

characterExpression = characterLiteral
| propertyName
| function;

#=============================================================================#
# Definition of CHARACTER literals
#=============================================================================#
characterLiteral = characterStringLiteral
| "UPPER" leftParent characterLiteral rightParen
| "LOWER" leftParent characterLiteral rightParen;
| bitStringLiteral
| hexStringLiteral;

characterStringLiteral = quote [ {character} ] quote;
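
The revised `function` and `characterExpression` rules in the BNF hunk above allow the two built-ins to be nested, for example `CASEI(ACCENTI('...'))`. The following Python sketch is a simplified recursive-descent evaluator for that subset only. It is not the project's parser: the tokenizer is reduced to unquoted ASCII identifiers and single-quoted literals, custom functions with argument lists are left out, and the `casei` and `accenti` helpers are assumed approximations (full case folding via `str.casefold()`, and decomposition plus removal of combining marks).

```python
import re
import unicodedata

# Simplified tokens: identifier, single-quoted literal ('' escapes a quote), parenthesis.
TOKEN = re.compile(
    r"\s*(?:(?P<name>[A-Za-z_][A-Za-z0-9_$:.]*)"
    r"|'(?P<lit>(?:[^']|'')*)'"
    r"|(?P<paren>[()]))"
)


def casei(value: str) -> str:
    # Assumption: Unicode full case folding.
    return value.casefold()


def accenti(value: str) -> str:
    # Assumption: decompose, drop combining marks, recompose.
    parts = unicodedata.normalize("NFD", value)
    kept = "".join(ch for ch in parts if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


BUILTINS = {"CASEI": casei, "ACCENTI": accenti}


def evaluate(text: str, feature: dict) -> str:
    """Evaluate one characterExpression against a feature's properties."""
    value, pos = _character_expression(text, 0, feature)
    if text[pos:].strip():
        raise ValueError(f"unexpected trailing input at offset {pos}")
    return value


def _character_expression(text: str, pos: int, feature: dict):
    token = TOKEN.match(text, pos)
    if token is None:
        raise ValueError(f"cannot parse input at offset {pos}")
    if token.group("lit") is not None:  # characterLiteral
        return token.group("lit").replace("''", "'"), token.end()
    name = token.group("name")
    if name is None:
        raise ValueError(f"unexpected parenthesis at offset {pos}")
    following = TOKEN.match(text, token.end())
    if following is not None and following.group("paren") == "(":  # function
        if name not in BUILTINS:
            raise ValueError(f"unsupported function {name!r}")
        argument, pos = _character_expression(text, following.end(), feature)
        closing = TOKEN.match(text, pos)
        if closing is None or closing.group("paren") != ")":
            raise ValueError(f"expected ')' at offset {pos}")
        return BUILTINS[name](argument), closing.end()
    return feature[name], token.end()  # propertyName


print(evaluate("CASEI(ACCENTI('Débárquér'))", {}))
```

Because `characterExpression` refers back to `function`, a single recursive call is enough to handle arbitrary nesting of the two built-ins.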
