Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of non-integer numeric indices (fix #2815) #2821

Merged
merged 1 commit into from
Aug 27, 2023

Conversation

nicowilliams
Copy link
Contributor

  • require slice indices to be integers when writing
  • output null when using slices with non-integer numeric indices to read
  • require indices to be integers when writing

@nicowilliams nicowilliams added this to the 1.7 release milestone Aug 4, 2023
Copy link
Member

@emanuele6 emanuele6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that .[1.5] returns null. I would prefer an error, and it just rounded down to .[1].
I prefer rounding down the most.

src/jv_aux.c Outdated
if (dstart < 0) dstart = 0;
if (dstart > len) dstart = len;

int start = (int)dstart;
Copy link
Member

@emanuele6 emanuele6 Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like for the modulo operator, this is unspecified behaviour if dstart is an integer according to jv_is_integer(), but it is bigger/smaller than INT_MAX/INT_MIN.

I think we should add some jv_int_value() and jv_intmax_value() functions that round and clamp numbers consistently, and use those everywhere we need an integer instead of (int)jv_number_value().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@nicowilliams
Copy link
Contributor Author

I don't like that .[1.5] returns null. I would prefer an error, and it just rounded down to .[1].
I prefer rounding down the most.

I'm confused. You would prefer an error, but you prefer rounding down. Which is it?

@emanuele6
Copy link
Member

emanuele6 commented Aug 4, 2023

I prefer rounding down.
But I think an error would still be better than returning null.

I would prefer an error, and it just rounded down to .[1].

Oops, I meant to write "I would prefer an error, or that it just rounded down to .[1]".

@nicowilliams
Copy link
Contributor Author

@emanuele6 yeah, I too would prefer either an error or rounding down. I'll pick one!

@itchyny
Copy link
Contributor

itchyny commented Aug 4, 2023

+1 for rounding down

@nicowilliams nicowilliams force-pushed the fix_2815 branch 2 times, most recently from abdda81 to 28c8a82 Compare August 17, 2023 15:52
@nicowilliams
Copy link
Contributor Author

Oh, I have to do the rounding down thing.

src/jv_aux.c Outdated
if (dstart < INT_MIN) dstart = INT_MIN;
if (dstart > INT_MAX) dstart = INT_MAX;
if (dend < INT_MIN) dend = INT_MIN;
if (dend > INT_MAX) dend = INT_MAX;
Copy link
Member

@wader wader Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to mix a double and INT_MIN/MAX? guaranteed to fit without losing "integer" precision?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. If sizeof(int) is 8 then... no. I'll fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a fix for this.

@wader
Copy link
Member

wader commented Aug 17, 2023

Think i prefer rounding down also

@nicowilliams
Copy link
Contributor Author

Rounding down is what I went with.

@nicowilliams
Copy link
Contributor Author

Note that for slices we were already rounding the start down and the end up (because it's exclusive) before (in 1.6), and we still are.

@nicowilliams
Copy link
Contributor Author

Is this ready?

idx += jv_array_length(jv_copy(t));
v = jv_array_get(t, idx);
if (!jv_is_valid(v)) {
jv_free(v);
v = jv_null();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously .[nan] emits null due to the jv_is_integer checking but now we rely on undefined behavior (on my M1 macOS, [1] | .[nan] emits 1 in this branch, previously null). We should check isnan first.

src/jv_aux.c Outdated
if (dstart < 0) dstart += len;
if (dend < 0) dend += len;
if (dstart < 0) dstart = 0;
if (dstart > len) dstart = len;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you clamp indices into [0, len], you don't need to compare against integer min/max. I think we can simplify as follows.

if (dstart < 0) dstart += len;
if (dstart < 0 || isnan(dstart))
  start = 0;
else if (dstart > len)
  start = len;
else
  start = (int)dstart;

if (dend < 0) dend += len;
if (dend > len || isnan(dend))
  end = len;
else if (dend < start)
  end = start;
else if ((int)dend == dend)
  end = (int)dend;
else
  end = (int)dend + 1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (dstart < 0 || isnan(dstart))
  start = 0;

won't do if we want NaN indices to yield null as in 1.6.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for slicing,

 $ jq -n '[1,2,3] | .[nan:1]'
[
  1
]

Copy link
Contributor

@itchyny itchyny Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior might depend on an undefined behavior though. My comment above deals nan as null i.e. .[nan:N] == .[:N] and .[N:nan] == .[N:].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see jq 1.6 trip an assertion when the start index of an array slice is a NaN:

: ; jq-1.6 -cn '[1,2,3] | .[nan:1]'
jq-1.6: src/jv_aux.c:47: parse_slice: Assertion `0 <= start && start <= end && end <= len' failed.
Abort

But not for the end index:

: ; jq-1.6 -cn '[1,2,3] | .[1:nan]'
[]

So slicing with NaNs was UB in 1.6.

src/jv_aux.c Outdated
static int parse_slice(jv j, jv slice, int* pstart, int* pend) {
enum slice_for { FOR_READ, FOR_WRITE };

static jv parse_slice(jv j, jv slice, int* pstart, int* pend, enum slice_for slice_for) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like slice_for is unused.

emanuele6
emanuele6 previously approved these changes Aug 24, 2023
src/jv_aux.c Show resolved Hide resolved
@emanuele6 emanuele6 dismissed their stale review August 24, 2023 12:01

as itchyny said, lower bound checking before casting is missing

@nicowilliams
Copy link
Contributor Author

as itchyny said, lower bound checking before casting is missing

How about now?


if (isnan(dend)) dend = len;
if (dend < 0) dend += len;
if (dend < 0) dend = start;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operator looks misaligned.

@emanuele6 emanuele6 merged commit 0e067ef into jqlang:master Aug 27, 2023
28 checks passed
@emanuele6
Copy link
Member

Thank you!

@nicowilliams
Copy link
Contributor Author

As this was the last issue/PR open with 1.7 as its milestone, I've force-pushed a new jq-1.7rc2 tag, and that should make a new release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants