Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting parameters #22

Closed
uvoges opened this issue Jan 24, 2020 · 27 comments
Closed

Sorting parameters #22

uvoges opened this issue Jan 24, 2020 · 27 comments
Assignees

Comments

@uvoges
Copy link

uvoges commented Jan 24, 2020

A sortKey would be important to have the possibility to page through sorted results.
We defined something in https://portal.opengeospatial.org/files/?artifact_id=73994 (chapter 7.2.6)

@uvoges
Copy link
Author

uvoges commented Jan 24, 2020

..better tbd in API-Common !?

@tomkralidis
Copy link
Contributor

pygeoapi implemented with the following definition. Not as detailed as the OpenSearch-EO, but might be generic enough for core capability

@cportele
Copy link
Member

Meeting 2020-02-24: We have several extensions / conformance classes for presenting the results (e.g, sorting, projecting properties). These should be bundled in one part.

@pvretano pvretano changed the title Query parameters Sorting parameters Mar 4, 2020
@tomkralidis
Copy link
Contributor

Meeting 2020-05-04

  • better off in OGC API - Common
  • need to consider case-senstivity

@tomkralidis
Copy link
Contributor

For reference/information, current state of sort in the STAC API specification.

@cholmes
Copy link
Member

cholmes commented Nov 2, 2020

Any word on this? We're aiming to more fully align with OGC API's in our next version. Any tweaks desired to our sorting extension?

@pvretano
Copy link
Contributor

pvretano commented Nov 3, 2020

@cholmes the current thinking is Parameter sortby. Can you point me to the STAC sorting extension?
NOTE: This is still NOT a reviewable draft outside of the SWG. Now that tb16 is winding down the current plan is to have a reviewable draft by the end of the month. :)

@pvretano
Copy link
Contributor

pvretano commented Nov 3, 2020

@cholmes Doh! Check that. Just saw the link in the previous comment!

@uvoges
Copy link
Author

uvoges commented Nov 3, 2020

In OpenSearch-EO (13-026r9) we defined per sortKey (in addition to the name + order (desc/asc)):

  • CaseSensitive: optional, boolean, default ‘0‘ (meaning ‘false’)

  • MissingValue: optional, (Default is ‘highValue’). If the sortKey is not present within the record (for example if a sort by publicationDate is requested, and a record has no publicationDate), it will behave in accordance with this value. One of:
    o 'abort': Supply a diagnostic saying that the sort could not be performed.
    o 'highValue': The server should sort this as if it were the highest possible value.
    o 'lowValue': The server should sort this as if it were the lowest possible value.
    o 'omit': remove the record from the results.
    o A fixed value: The server should sort the record as if this value were supplied.

@m-mohr
Copy link
Contributor

m-mohr commented Nov 30, 2020

I just saw that you plan to use some kind of name:asc|desc for sorting. STAC has discarded that as we use the double colon in field names quite a lot. Another separator would likely be easier to implement for us. As we couldn't really figure out a better separator (can explain that a bit more if required), we decided to use the - for desc.

@cholmes
Copy link
Member

cholmes commented Nov 30, 2020

Agreed, another separator would be great. I do like the simplicity of + and - in STAC, it fits to me with the terse CQL style. But if there are really good reasons we can align to asc and desc. But a different separator would be good, as there are already a lot of catalogs in use that use ':' in their property names, so it'd be tough to get everyone to switch - we'd probably just keep our own sort extension, even though we'd really, really like to align. From looking online for rest patterns:

https://www.moesif.com/blog/technical/api-design/REST-API-Design-Filtering-Sorting-and-Pagination/ has:

GET /users?sort_by=asc(email) and GET /users?sort_by=desc(email)

GET /users?sort_by=+email and GET /users?sort_by=-email

GET /users?sort_by=email.asc and GET /users?sort_by=email.desc

GET /users?sort_by=email&order_by=asc and GET /users?sort_by=email&order_by=desc

@pvretano
Copy link
Contributor

pvretano commented Dec 1, 2020

I surveyed how a number of back-end databases do sorting and this is what I found ...

SQL: ORDER BY age desc, posts asc
ELASTIC SEARCH: "sort": [{"age": "desc"},{"posts": "asc"}]
MONGODB: sort({are : -1, posts : 1})

I also took a look at OpenSearch EO which uses a more complex form composed of a space-separated list of comma-separated list elements. Each list element is composed of the sort key and then a bunch of optional sort facet modifiers of which sort order is one component. The details can be found here: https://portal.ogc.org/index.php?m=projects&a=view&project_id=442&tab=2&artifact_id=83291. Here is an example:

GET /items?sortkey=email,,0,, senddate=,,1,,&...

So it looks like we have as many sortby formulations as people writing specifications. I actually like the simplicity of the STAC approach except of one thing which may or may not be a problem. I envision that future extensions of the sorting capability might allow expressions (including arithmetic expressions) as the sort key and so the use of +/- might cause issues. So, I propose we use a space separator. Some examples ...

GET /items?sortby=email,senddate&...
GET /items?sortby=email desc,senddate&...
GET /items?sortby=email desc,senddate asc&...
GET /items?sortby=email desc,senddate+5 asc&...

Comments?

@cholmes
Copy link
Member

cholmes commented Dec 1, 2020

What would a sort capability with an expression look like? I see why they'd be in the filter, but don't see the use case to have it in the sortby.

If we do really think that you'd have math expressions in a sortby then I think I can get behind space as a seperator. It's actually what Google has in their API guide - https://cloud.google.com/apis/design/design_patterns#sorting_order

@mhogeweg
Copy link
Contributor

mhogeweg commented Dec 1, 2020

for sure expression-sort would be in an extension of an extension of an extension... but that doesn't preclude us from allowing it to be done.

while senddate+5 doesn't change the sorting order, sorting based on an addition of two independent properties of the item as included in the metadata could work. But before starting specifying the algebra that would be supported, I would like to see a real need for this. some sort of weighted sort? this field twice as important as that field? 2A/B?

@mhogeweg
Copy link
Contributor

mhogeweg commented Dec 1, 2020

I would also distinguish ranking from sorting and not try to implement some client-defined ranking via fancy sorting expressions. Ranking represents how well the item matched the criteria. The sort can then be applied to that result set or to a single page in the result set, etc

@pvretano
Copy link
Contributor

pvretano commented Dec 1, 2020

@cholmes it would not be just math expressions but expressions in general of which arithemtic expressions would be a subset.

@mhogeweg all we are doing here is trying to figure out whether the form of the sortby expression should be:

"...&sortby=-email,+sendate&..."

or

 "...&sortby=email desc,senddate asc&...".

I am arguing for the latter since I can envision a possible future extension to sorting that might allow general expressions (including arithmetic expressions) beyond just a property name as the sort key. In SQL ORDER BY, complex expressions are often used to (for example) group rows into buckets for the purpose of sorting (usually using a CASE statement) and so I figure we should probably try to avoid anything that might torpedo any such future extension. Should such a future extension ever be proposed, I can see how something like -email (not a good example) might be ambiguous and might be interpreted as the negative of email rather than indicating a descending sort order. Just to be clear however, I am only using the furture posibility of expressions in a sortby as justification for going with the later form (i.e. desc, asc) rather than the former (i.e. +/-). Expressions in the sortby are WAY out of scope for the core and I am not even suggesting that such a thing would ever be proposed ... but it might.
BTW, yes, the senddate+5 was a very bad example of an expression in a sortby but I was using my phone to respond at the time and typing out a complex sortby expression would have been really painful! ;)

So @mhogeweg which do you prefer? +/- or asc/desc?

@cportele
Copy link
Member

cportele commented Dec 1, 2020

I like the intuitive and compact style used by STAC, but if we want to keep the door open to go beyond property names then the SQL approach would be my preference, too.

@m-mohr
Copy link
Contributor

m-mohr commented Dec 1, 2020

Although I've never seen more than pure sorting (i.e. name and sort order), isn't +/- actually the "mathematical way" to sort asc/desc and be fully aligned with a future "expression" extension? So I don't see why this would be conflicting with that and for that use case I'd say +/- is the natural way forward without a need for asc/desc. I'd also be fine with some kind of asc/desc keyword and a suitable separator, but I don't really get most of the arguments against +/- at the moment.

Here are some reasons against some of the separation characters:

  • space: Is translated to %20 in GET requests and that makes the expression look bad
  • double colon: Used in STAC in field names
  • comma: Currently often used to separate multiple sort keys, but could work this way name,asc;age,desc
  • semicolon: name;asc,age;desc feels counter-intuitive as ; is "stronger" that a comma
  • dot: Is used in STAC to query for properties in (child-)objects, e.g. properties.datetime
  • all other non-encoded chars like ~ felt weird, too.

On issue that may always come up is that basically all characters are allowed in JSON keys and thus could be present in the sort field names. This can lead to bugs in implementations. Therefore, it could make sense to reserve the first character (+/-) always for sorting. That's not yet part of STAC (+ is optional), but I could see that getting required, making parsing much easier. But I'm not thinking about extensions, so could be an issue there.

@pvretano
Copy link
Contributor

pvretano commented Dec 1, 2020

@m-mohr assuming that expressions where allowed in a sortby, how would one distinguish between negation of an expression and indicating a descending sort?

sortby=-myProp+<some other expression>,...

In this case, do I mean the negative of myProp plus the rest of the expression (whatever that happens to be) OR do I mean myProp plus the expression in descending sort order? Is this not ambiguous?
Like @cportele I like to compact +/- notation but I also don't want to preclude a future extension that includes expressions. I will raise the issue in the SWG once again because I really want STAC and OAPIR to be aligned.

B.T.W. here is a complex sorting expression in SQL ...

ORDER BY attr1,
CASE WHEN ATTR1 IN (value1, value2)
       THEN TO_CHAR(creationDate,'YYYYMMDDHH24MISS') 
            || to_char(modifiedDate,'YYYYMMDDHH24MISS') || author
     WHEN ATTR1 = value3
       THEN attr2 || CASE WHEN ATTR2 = value4
                            THEN to_char(oldPriority,'99999')
                                 || to_char(modifiedDate,'YYYYMMDDHH24MISS')
                                 || to_char(creationDate,'YYYYMMDDHH24MISS') 
                                 || author
                          WHEN ATTR2 = value5
                            THEN to_char(creationDate,'YYYYMMDDHH24MISS')
                                 || to_char(modifiedDate,'YYYYMMDDHH24MISS')
                          END
     END

@m-mohr
Copy link
Contributor

m-mohr commented Dec 1, 2020

Sorting is always ascending and users need make sure to revert order with - / *-1 for descending, i.e. you don't need to distinguish.

@pvretano
Copy link
Contributor

pvretano commented Dec 7, 2020

07-DEC-2020: (1) Check with Uwe to make sure that the proposed STAC-based approach is OK. (2) Once (1) has been checked, I will update the document accordingly.

@uvoges
Copy link
Author

uvoges commented Dec 8, 2020

I am dispassionate there. For me something like "name,asc;age,desc" is intuitive but another approach is fine with me too.
Case-sensitivity: I guess the order of string-based elements is fix if we know/define the characterSet, right ?

@pvretano
Copy link
Contributor

pvretano commented Dec 8, 2020

Thanks @uvoges. One more thing off of the TODO list. ;) About the order of string-based elements, yes, if the characterSet is known then we know the sort order. The Content-Type header in the response will usually give you the charset information.

There is one unfortunate point though; as far as I know, HTTP does not mandate a default charset so we may need to add a requirement in the specification to say that if a sort is requested then the Content-Type of the response must include the character set.

@pvretano
Copy link
Contributor

pvretano commented Feb 8, 2021

SWG MEETING 08-FEB-2021: Closing. We have adopted the STAC +/- syntax for specifying sort direction.

@pvretano pvretano closed this as completed Feb 8, 2021
@pvgenuchten
Copy link
Contributor

pvgenuchten commented Feb 21, 2021

applications are expected to understand ?sortby=+title? Because in common url parsing + arrives at the application decoded as a space. It needs url-encoding as ?sortby=%2Btitle to operate fine.

instead i would suggest to use title (asc) and -title (desc) as suggested in opengeospatial/ogcapi-features#157 (comment)

@m-mohr
Copy link
Contributor

m-mohr commented Feb 22, 2021

+ can be omitted in STAC

@pvretano
Copy link
Contributor

@m-mohr and in records ... the regex is [+-]?[A-Za-z_][A-Za-z0-9]* which I think is correct. The default is + (see: https://docs.ogc.org/DRAFTS/20-004.html#sorting-parameter-sortby).
@pvgenuchten we have already debated this issue and decided to align with STAC. I don't think it's a big deal requiring the + to be URL encoded. By this point, most (all) clients (especially machine clients) know they have to URL encode. However, if you feel strongly about the issue, please reopen it and we can put it back into the discussion queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants