Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): add nulls_first=False argument to order_by #9385

Merged
merged 34 commits into from
Jun 26, 2024

Conversation

ncclementi
Copy link
Contributor

@ncclementi ncclementi commented Jun 14, 2024

This is work in progress

TODO

  • Sort out (pun intended) how to deal with backends that don't expose option for ordering nulls.
    - [x] mssql
    - [x] mysql
    EDIT: They get handled by sqlglot and follow the expected behavior

  • Add test in test_api.py to check that this t.order_by(ibis.desc("b", nulls_first=True)) is equal than t.order_by(t["b"].desc(nulls_first=True))

  • fix window tests errors (Thanks @gforsyth for your the "diving into deep into the weeds" session)

  • Decide how and deprecate

    @classmethod
    def __coerce__(cls, value, T=None, S=None):
    if isinstance(value, tuple):
    key, asc = value
    else:
    key, asc = value, True
    asc = _is_ascending[asc]
    key = super().__coerce__(key, T=T, S=S)
    if isinstance(key, cls):
    return key
    else:
    return cls(key, asc)

Options are:

  • deprecate passing a tuple completely
  • support 2-tuples and 3-tuples (no deprecation)
  • don't support 3-tuples, leave 2-tuples for legacy reasons.

NOTE: I can't find any tests using this, not clear to me what do we want to do here.
EDIT: see #9413

  • Figure out why the doctests and plenty others are failing,I think it's because if we don't say desc() or asc() is taking the nulls_first default value set in SortKeys which is True (I'm not sure yet, need to investigate) EDIT: doctest and other tests using order_by that didn't trigger desc or asc rely on defaul nulls_first in SortKeys. Before this value wasn't provided and SQLGlot handel this by triggering nulls_first=False. So to be consistent we set it to False in the class default.

I know why they are failing, it's unclear what path we want to take here. Need to discuss

https://github.com/ncclementi/ibis/blob/78f26dada233a3da3a5e145fc32d590e058f1a4e/ibis/expr/operations/sortkeys.py#L28-L37

  • Discuss/fix: Now we have a lot of snapshot tests failing, because of the fix in the window_merge_frames rewrite, where we are now passing nulls_first see

    ibis/ibis/expr/rewrites.py

    Lines 310 to 314 in a17ff2e

    order_keys = {}
    for sort_key in window.orderings + _.order_by:
    order_keys[sort_key.expr] = sort_key.ascending, sort_key.nulls_first
    order_by = (ops.SortKey(k, *v) for k, v in order_keys.items())

  • Add tests that orders by more than one column, has nulls, and with different value of nulls_first.

@ncclementi ncclementi force-pushed the nulls-sort-order branch 2 times, most recently from a17ff2e to 0b216b9 Compare June 20, 2024 15:30
@cpcloud
Copy link
Member

cpcloud commented Jun 20, 2024

Fixing up the snapshots now.

@ncclementi ncclementi marked this pull request as ready for review June 20, 2024 20:08
@ncclementi
Copy link
Contributor Author

Need to add tests that check for order by more than one column, has nulls, and with different value of nulls_first.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

We can add the tests for multiple nulls_first specifications here or in a follow-up, up to you.

@cpcloud cpcloud added this to the 9.2 milestone Jun 24, 2024
@cpcloud cpcloud changed the title refactor: standarize nulls last always when sorting feat(api): add nulls_first=False argument to order_by Jun 24, 2024
@cpcloud cpcloud added feature Features or general enhancements ux User experience related issues internals Issues or PRs related to ibis's internal APIs labels Jun 24, 2024
@cpcloud cpcloud force-pushed the nulls-sort-order branch from dd1be07 to e34476b Compare June 24, 2024 12:23
@cpcloud cpcloud added the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Jun 24, 2024
@ibis-docs-bot ibis-docs-bot bot removed the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Jun 24, 2024
result = con.execute(expr).reset_index(drop=True)
expected = pd.DataFrame(expected)

tm.assert_frame_equal(result.replace({np.nan: None}), expected)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do this locally .replace({np.nan: None}) I get this

>           tm.assert_frame_equal(result.replace({np.nan: None}), expected)
E           AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="col1") are different
E           
E           Attribute "dtype" are different
E           [left]:  object
E           [right]: float64

only in this test. I don't see it in the other ones.

Copy link
Member

@cpcloud cpcloud Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the expected that needs the astype, and perhaps the result (not sure). I'll push up a fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what's not clear to me why in the other tests we needed it. Locally for me, they all passed without it.

@cpcloud
Copy link
Member

cpcloud commented Jun 26, 2024

After losing a few brain cells trying to understand how to set collation and sorting in MS SQL, we gave up and decided to xfail the multi key sort column null ordering tests for backends that for some reason known only to the original developers that strings should be compared to each other ignoring case by default.

And it should be outrageously difficult to change that behavior.

@cpcloud cpcloud added the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Jun 26, 2024
@ibis-docs-bot ibis-docs-bot bot removed the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Jun 26, 2024
@cpcloud cpcloud merged commit ce9011e into ibis-project:main Jun 26, 2024
82 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements internals Issues or PRs related to ibis's internal APIs ux User experience related issues
Projects
None yet
2 participants