Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove infix operators in filter syntax? #172

Closed
ethanresnick opened this issue Jun 16, 2018 · 6 comments
Closed

Remove infix operators in filter syntax? #172

ethanresnick opened this issue Jun 16, 2018 · 6 comments

Comments

@ethanresnick
Copy link
Owner

This is an open question for current json-api users:

Would it be worth it to remove the distinction in the current filter syntax in which binary operators get their operator "infixed" (i.e., placed in the middle, like (x,gte,4)) but other operators get their operators placed at the beginning (e.g., (and,(...),(...)))? Instead, the operator would always be the first item.

The advantages of this are:

  • It becomes easy for a binary operator to add extra, optional operands down the line. E.g., imagine you want to extend the lte operator to support "chained comparisons". So, instead of saying (and,(0,lte,minDistance),(minDistance,lte,maxDistance)), you want to support something like (lte,0,minDistance,maxDistance). In the current setup, supporting this would change where the parser is expecting lte operator to be, because it's no longer a binary a binary operator. So, either you couldn't change it (and would create a new operator instead), or the change would break existing users, or you'd have to complicate the parsing and serialization logic further to support both forms (i.e., (x,lte,y) and (lte,x,y)).

  • Related to the above, the parser and serializer (in progress) already have to special case binary operators. In this simplification, those special cases could be removed, and operators also wouldn't have to be defined with whether they're binary (though perhaps they still could be defined with their arity to enable automatic validation of the length of the arguments list).

  • Forcing the infixing of binary operators was intended to offer better usability, and it probably does on the whole. However, the forced infixing sometimes leads to really unreadable results. E.g., there's an operator that takes a center point and a radius and returns a circle value that can be used as an operand to the geoWithin operator (to find resource objects inside that circle). Because this operator is binary, it has to be infixed, so you end up with ([0,0],toGeoCircle,100). That's pretty cryptic. (The first arg is the center and the second is the radius.) It'd probably be clearer to have (geoCircle,[0,0],100), as that reads more like a standard function call.

The disadvantages are:

  • This is a break from the existing format. Big mitigating factor: people could continue to use the current version of the parser package while still updating the json-api lib to the latest version — which is very nice!

  • Most binary operators, which are the most common case, probably get a bit less readable, if you accept that that people find it easier to read (a,gte,b) than (gte,a,b).

@ethanresnick
Copy link
Owner Author

ethanresnick commented Jun 17, 2018

Hearing no objections, I'm gonna go ahead and make this change, and publish a new major version of @json-api/query-parser soon. If you want the old behavior, you can explicitly install v2 of the query parser library (npm i @json-api/[email protected]) and then configure json-api to use that as your parser:

import { parseSort, parseFilter } from "@json-api/query-parser";

// Use old parsers
const API = new API.controllers.API(registry, {
  sortParser(legalSortOperators, rawQuery) {
    const paramValue = getQueryParamValue("sort", rawQuery);
    return paramValue && parseSort(paramValue, legalSortOperators);
  },
  filterParser(legalFilterOperators, rawQuery) {
    const paramValue = getQueryParamValue("filter", rawQuery);
    return paramValue && parseFilter(paramValue, legalFilterOperators);
  }
});

Note: the code above depends on a getQueryParamValue utility function, which is defined as:

function getQueryParamValue(paramName: string, queryString: string | undefined) {
  if(typeof queryString === 'undefined') { return undefined; }

  return queryString.split("&").reduce((acc: string | undefined, paramString) => {
      const [rawKey, rawValue] = splitSingleQueryParamString(paramString);
      return rawKey === paramName ? rawValue : acc;
  }, undefined);
};

function splitSingleQueryParamString(paramString: string) {
  const bracketEqualsPos = paramString.indexOf("]=");
  const delimiterPos =
    bracketEqualsPos === -1 ? paramString.indexOf("=") : bracketEqualsPos + 1;

  // returning [undecoded key, undecoded value]
  return delimiterPos === -1
    ? [paramString, ""]
    : [paramString.slice(0, delimiterPos), paramString.slice(delimiterPos + 1)];
}

@ethanresnick
Copy link
Owner Author

ethanresnick commented Jun 18, 2018

Thinking about this more, I'm realizing that the problem with the current filter syntax is, more generally, that the special cases around where/if the operator is listed, which are so nice for usability when writing the filter constraints by hand, can lead to pretty problematic bugs when generating the filter constraints programatically.

E.g., imagine an n-ary operator that takes a one or two args, some of which can be identifiers/field references, as in (nary-op,fieldA,fieldB). This works great — unless fieldA happens to share the name of a valid binary operator, in which case the parsing suddenly switches from parsing as a use of nary-op to parsing as a use of the fieldA operator, and there's no way to indicate that that's what you want. If more operators are added over time, this expression may work at development time, but then start apparently-randomly failing in the distant future. More over, if the reference to fieldA is controlled by user input, this might be exploitable in some contexts.

The eq shorthand actually suffers from the same problem. (field,`value`) works great — unless field happens to match the name of a (non-binary) operator too, in which case this'll parse as an invocation of that operator.

All this makes me more convinced than ever that the operator needs to be in a consistent spot.

The obvious answer is to just always put the operator first, as I proposed in the OP, but, after sitting with that, I really don't like how it makes the by-far most common cases (binary operators) harder to read. At least to me, (zip,in,[90210]) is way, way more intuitive than (in,zip,[90210]).

So, my latest thinking is that:

  1. It should always be legal to write a filter constraint/field expression with the operator as the first item, but also, there has to be a colon before the operator. People who don't mind reading expressions where the operator comes first, or who prefer to use only one syntactic form (e.g., when generating constraints automatically), can use this form throughout. It looks like..... (:in,zip,[90210]) or (:and,(:gte,distance,10),(:eq,status,`published`))

  2. Because there's a colon in front of the operator in the form above, it's possible to safely keep the eq shorthand. So, (status,`published`) is still short for (:eq,status,`published`), but the colon removes the bug/security risk that existed before. I.e., the reference to status, which might have come from user input, can no longer be mistaken for a unary operator, because it doesn't have a colon.

  3. In the same way that the colon makes it possible to add a 2-item special form (for the eq shorthand), it also enables us to have a 3-item special form for infixing. So, (salary,gte,100) can be totally unambiguously resolved. I.e., the parser can say: it doesn't have a colon at the start so it can't be the form from point 1; it isn't two items long so it can't be the form from point 2; so it must be our infix form. So, the only question is: do we want this form to require a colon in front of the operator (which would be consistent with form 1, but a big break and noisy), or do we want to require it not be there (the current syntax), or make it optional? I'm inclined to make it required for now, as that can always be relaxed later (without introducing security issues), and I like that requiring it leads to a world where, from the user's perspective, the : acts as an operator sigil that let's them "move the operator around".

Overall, I think I like this approach, and plan to implement it now. Some breaking change was necessary to resolve the security issues of the current syntax, and some change beyond "always put the operator first" was needed if we wanted to be able to safely keep the eq shorthand. The fact that this colon-prefixing approach lets us keep the eq shorthand and the infix notation makes the extra colons more than worth it imo. I also think I could grow to like the way it makes the operator distinctive visually from other identifiers. Users can still stick with the old parser lib, per the comment above, if they can't afford breaking changes.

@ethanresnick
Copy link
Owner Author

Note: my comment above has been heavily edited and describes the resolution to this issue.

ethanresnick added a commit to ethanresnick/json-api-custom-query that referenced this issue Jun 18, 2018
See
ethanresnick/json-api#172 (comment)
66 for an overview of the changes
ethanresnick added a commit that referenced this issue Jun 18, 2018
@ethanresnick
Copy link
Owner Author

This has been addressed in 3.0.0-rc.6. See 0c0b35f

@ethanresnick
Copy link
Owner Author

@carlbennettnz FYI, there's now a filter/sort param serialization function in the @json-api/querystring package.

@carlbennettnz
Copy link
Contributor

Awesome, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants