How to: Avoid Pitfalls

pkoppstein edited this page Aug 2, 2023 · 29 revisions

Keywords

The fact that jq has keywords such as if and end has various implications, some of which may not be obvious. In particular:

  • in jq 1.6 and earlier, keywords cannot be used in the abbreviated syntax for specifying key-value pairs, e.g. {foo} for {"foo": .foo}
  • in jq 1.6 and earlier, keywords cannot be used to form $-variable names

The full list of reserved keywords is currently:

and as break catch def elif else end foreach if import include label module or reduce then try

(The list of keywords for any particular version of jq can be derived from the lexer.l file, the “master” version of which is https://github.com/stedolan/jq/blob/master/src/lexer.l)
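
A workaround that does not depend on the jq version is to write the key out in full rather than use the abbreviated form. The following is a minimal sketch; the input object is made up for illustration.

```shell
# The abbreviated form {if} is rejected by jq 1.6 and earlier.
# Quoting the key and using subscript syntax avoids the keyword entirely.
jq -nc '{"if": 1, "end": 2} | {"if": .["if"], "end": .["end"]}'
# prints {"if":1,"end":2}
```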

nan, NaN, inf, Inf, infinite and null

nan is a jq value representing IEEE NaN, but it prints as null.

NaN is recognized in JSON text and is also understood to represent IEEE NaN.

Use isnan to test whether a jq value is identical to IEEE NaN.

Here are some illustrative examples:

$ echo NaN | jq .
null

$ echo nan | jq .
parse error: Invalid literal at line 2, column 0

$ echo NaN | jq isnan
true

$ jq -n 'nan | isnan'
true

Similar comments apply to the jq value infinite, and the admissible values inf and Inf:

$ echo Inf | jq isinfinite
true

$ echo inf | jq isinfinite
true

$ jq -n 'infinite | isinfinite'
true
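
Because nan prints as null, a printed null does not show whether the underlying value was NaN or a genuine null. The sketch below (the input array is illustrative) tells the two apart; the type check is there because isnan expects a number.

```shell
# nan and null are different values, even though both print as null.
jq -n 'nan == null'
# prints false

# Guard isnan with a type check so that non-numbers are handled too.
jq -nc '[nan, null, 1] | map(if type == "number" then isnan else false end)'
# prints [true,false,false]
```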

foo.bar vs .foo.bar

foo.bar is short for foo | .bar and means: call foo and then get the value at the "bar" key of the output(s) of foo.

.foo.bar is short for .foo | .bar and means: get the value at the "foo" key of . and then get the value at the "bar" key of that.

One character, big difference.
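
To see both forms side by side, here is a small sketch in which a user-defined function named foo and an input key named "foo" coexist. The function body and the input object are invented for the example.

```shell
# foo.bar calls the function foo; .foo.bar reads the "foo" key of the input.
jq -nc '
  def foo: {"bar": "from function"};
  {"foo": {"bar": "from input"}} | [foo.bar, .foo.bar]'
# prints ["from function","from input"]
```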

Cartesian Products

jq is geared to produce Cartesian products at the drop of a hat. For example, the expression (1,2) | (3,4) produces four results:

3
4
3
4

To see why:

$ jq -n '(1,2) as $i | (3,4) |  "\($i),\(.)"' 
"1,3"
"1,4"
"2,3"
"2,4"

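The same multiplication happens whenever two generators meet in one expression, not only across a pipe. The following sketch uses illustrative expressions and counts or sorts the results, so that it does not depend on the order in which jq emits them.

```shell
# Two generators inside an object constructor: 2 x 2 = 4 objects.
jq -nc '[ {a: (1,2), b: (3,4)} ] | length'
# prints 4

# Two generators as operands of "+": again 4 results.
jq -nc '[ (1,2) + (10,20) ] | sort'
# prints [11,12,21,22]
```
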
Generator Expressions in Assignment Right-Hand Sides

Generator expressions in assignment RHS expressions are likely to surprise users. Compare (.a,.b) = (1,2) to (.a,.b) |= (.+1,.*2).
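
The contrast can be made concrete with a small sketch. The input objects are made up for illustration, and the result quoted for |= assumes jq 1.6 or later, where only the first value produced by the right-hand side is used.

```shell
# With "=", each value produced by the right-hand side yields a separate result,
# and every path on the left receives that same value.
jq -nc 'null | (.a,.b) = (1,2)'
# prints {"a":1,"b":1}
# then   {"a":2,"b":2}

# With "|=", the right-hand side runs once per path, with the old value as input,
# and a single result is produced. In jq 1.6 and later that result is {"a":2,"b":3}.
jq -nc '{"a":1,"b":2} | (.a,.b) |= (.+1, .*2)'
```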

Backtracking (empty) in Assignment RHS Expressions and Reductions

.a=empty and .a|=empty behave differently:

null | .a = empty     #=> the empty stream 
null | .a |= empty    #=> null

In reductions, care should be exercised when including empty in the body. For example, one might reasonably expect that:

reduce 1 as $x (2; empty)

would produce 2, but in fact it produces null in most versions of jq, including jq 1.5 and earlier, as well as the current “master” version as of 2018.
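
If the intent is simply to skip an item, one option is to have the body emit . unchanged instead of empty. This is a sketch of that idea, not the only possible approach; the stream and the skip condition are illustrative.

```shell
# Sum 1, 2, 3 while skipping 2: the skipped step returns the accumulator as is.
jq -n 'reduce (1,2,3) as $x (0; if $x == 2 then . else . + $x end)'
# prints 4
```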

WARNING: Expressions of the form A | .[] |= E where A is an array and E can evaluate to empty should in general be avoided. Their behavior is inconsistent between versions of jq, and jq version 1.6 will often evaluate them incorrectly. For example, using jq 1.6:

jq -n '[0,1,2] | .[] |= if . == 0 then empty else . end'

yields:

[1,2,null]
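
When the goal is to drop array elements, formulations that avoid |= with empty altogether are less fragile. Two possibilities, sketched for the same input:

```shell
# Keep only the elements that pass the test.
jq -nc '[0,1,2] | map(select(. != 0))'
# prints [1,2]

# Or delete the paths of the unwanted elements.
jq -nc '[0,1,2] | del(.[] | select(. == 0))'
# prints [1,2]
```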

Multi-arity Functions and Comma/Semi-colon Confusability

foo(a,b) is NOT the same as foo(a;b). If foo/1 and foo/2 are both defined, then if you write foo(a,b) intending to call the two-argument function, you'll silently get the wrong behavior.

For example, foo(1,2) is a call to foo/1 with a single argument consisting of the expression 1,2, while foo(1;2) is a call to foo/2 with two arguments: the expressions 1 and 2.

One character, big difference.
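
The following sketch defines both arities of a hypothetical foo so that the two calls can be compared directly. The function bodies are invented purely to make the arguments visible.

```shell
jq -nc '
  def foo(f): [f];                   # foo/1 collects everything its one argument emits
  def foo(f; g): {a: [f], b: [g]};   # foo/2 keeps its two arguments apart
  foo(1,2), foo(1;2)'
# prints [1,2]
# then   {"a":[1],"b":[2]}
```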

index/1 is byte-oriented but match/1 is codepoint-oriented

Given strings as input, the index family of filters (index, rindex, indices) returns byte-oriented offsets. For codepoint-oriented offsets, one can use the array-oriented versions of these filters, or match/1 or match/2, or the definition of myindex given below.

For example:

$ jq -cn '"aéb" | [., index("b")]'
["aéb",3]
$ jq -cn '"aéb" | [., (explode|index("b"|explode))]'
["aéb",2]
$ jq -cn '"a\u00e9b" | [., index("b")]'
["aéb",3]
$ jq -cn '"a\u00e9b" | match("b").offset'
2

# codepoint-oriented version of `index/1` for strings
# e.g. ("”#a" | myindex("#a")) yields 1
def myindex($string):
  ($string|length) as $sl
  | if $sl > length
    then null
    else
      explode as $x
      | ($string|explode) as $s
      | first(range(0; 1 + length - $sl) as $i
              | select($x[$i: $sl+$i] == $s) | $i) // null
    end;

If A and B are arrays, B|.[A] is the same as B|indices(A)

If A and B are JSON arrays, then B|.[A] asks for the sorted array of ALL the indices, $i, such that .[0:$i] + A is an initial subarray of B. This has implications for B|index(A) as well.

Examples:

jq -nc '[0,1,2,3,4,1,2] | .[[1,2]]'
[1,5]

jq -nc '[0,1,2,3,4,1,2] | index([1,2])'
1
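
One consequence is that searching for an element that is itself an array needs an extra level of brackets. A brief sketch with a made-up input:

```shell
# index([1,2]) looks for the consecutive elements 1, 2, which do not occur here.
jq -nc '[[1,2],3] | index([1,2])'
# prints null

# index([[1,2]]) looks for the single element [1,2].
jq -nc '[[1,2],3] | index([[1,2]])'
# prints 0
```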

Overriding Operator Definitions

Overriding operator definitions is possible but probably ill-advised, if only because compile-time constant folding can make the results surprising. Consider, for example, what happens when we override + as follows:

def myplus($a;$b): _plus($a;$b);

def _plus($a;$b): [ myplus($a;$b) ];

We might expect that the expression 1+2 would now evaluate to [3] but, because the constant-folding occurs before the new definition becomes effective, it will instead evaluate to 3.
