feat: Add option to disable FILTER when using distinct
#10567
Comments
Thank you for reporting this and for providing the code to reproduce. We will need to look deeper at how we can achieve this. The query this produces behind the scenes looks like it is filtering out nulls, as you describe:

```sql
SELECT
  "t1"."person_id",
  FIRST("t1"."contacted_at") FILTER(WHERE "t1"."contacted_at" IS NOT NULL) AS "contacted_at",
  FIRST("t1"."updated_at") FILTER(WHERE "t1"."updated_at" IS NOT NULL) AS "updated_at"
FROM (
  SELECT
    *
  FROM "ibis_pandas_memtable_yqklspibhbeqzabymmjjnn35bi" AS "t0"
  ORDER BY
    "t0"."updated_at" DESC
) AS "t1"
GROUP BY
  1
```

In the meantime, would something like this work for you?

```python
import ibis
from ibis import _
# ibis.row_number() is 0-based, so == 0 keeps the latest row per person_id
t1.filter(
    ibis.row_number().over(group_by=_.person_id, order_by=_.updated_at.desc()) == 0
)
```
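To see why the window-function workaround keeps NULLs, here is a minimal sketch of the same pattern in plain SQL. It runs on Python's built-in `sqlite3` module rather than DuckDB (an assumption made so the example needs no extra dependencies; it also assumes a SQLite build with window-function support, 3.25 or newer). The table name `updates` and the sample rows are hypothetical. Note that SQL's `ROW_NUMBER()` starts at 1, whereas `ibis.row_number()` starts at 0, which is why the ibis snippet compares with `0`.

```python
import sqlite3

# Hypothetical sample data: person 1's latest update has no contacted_at value.
rows = [
    (1, "2024-01-01", "2024-01-01"),
    (1, None, "2024-02-01"),
    (2, "2024-01-15", "2024-01-15"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE updates (person_id INTEGER, contacted_at TEXT, updated_at TEXT)")
con.executemany("INSERT INTO updates VALUES (?, ?, ?)", rows)

# Rank rows within each person_id by updated_at (newest first) and keep rank 1.
# The ordering lives in the OVER clause, so no subquery ORDER BY is relied on,
# and the whole row is kept, NULLs included.
latest = con.execute(
    """
    SELECT person_id, contacted_at, updated_at
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY person_id ORDER BY updated_at DESC
               ) AS rn
        FROM updates
    )
    WHERE rn = 1
    ORDER BY person_id
    """
).fetchall()

print(latest)
# → [(1, None, '2024-02-01'), (2, '2024-01-15', '2024-01-15')]
```

Person 1's newest row is returned with its `NULL` intact instead of borrowing `contacted_at` from an older row.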
There's a secondary problem here, which is your assumption that a subquery with an … This is a common pitfall with SQL. The solution is that you'll have to construct the distinct query manually while specifying an …
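The ordering pitfall can be illustrated without a database: an aggregate that simply takes the first value it sees depends on the order in which rows arrive, and SQL does not promise to carry a subquery's ordering into the outer query. Below is a minimal plain-Python sketch; the helpers `naive_first` and `first_by` are hypothetical stand-ins for an unordered and an explicitly ordered aggregate, and the shuffle merely simulates an engine that ignores the inner ordering.

```python
import random

# Hypothetical rows: (person_id, updated_at)
rows = [(1, "2024-01-01"), (1, "2024-03-01"), (1, "2024-02-01")]


def naive_first(values):
    """Return whatever value arrives first; the result depends on input order."""
    return next(iter(values))


def first_by(values, key, reverse=False):
    """Return the first row under an explicit ordering, independent of input order."""
    return sorted(values, key=key, reverse=reverse)[0]


results_naive = set()
results_ordered = set()
for seed in range(20):
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # simulate an ignored subquery ORDER BY
    results_naive.add(naive_first(shuffled))
    results_ordered.add(first_by(shuffled, key=lambda r: r[1], reverse=True))

print(results_ordered)
# → {(1, '2024-03-01')}
```

`results_naive` can end up holding several different rows, while `results_ordered` always holds only the newest one, because the ordering travels with the aggregate itself.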
Thanks for the solution and the clarification! That is exactly what I needed. I was trying something similar before, but couldn't find documentation on … I have read quite a few ibis docs over the months and have learned a lot, but I don't fully understand this area yet. Maybe I should change this feature request, then. What I would find beneficial is, in essence, a full equivalent of pandas `df.drop_duplicates()`. The docs of … It'd be nice to have something like `t.distinct(on="person_id", order_by=[_.updated_at.desc()], filter_nulls=False)`.
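For reference, the behaviour being requested corresponds to pandas' `sort_values(...)` followed by `drop_duplicates(subset=..., keep="first")`: order the rows, then keep the first whole row per key, NULLs included. Here is a minimal plain-Python sketch of those semantics. The function name `distinct_on` and its parameters are hypothetical and are not part of the ibis API.

```python
from typing import Any, Callable, Hashable, Iterable

Row = dict[str, Any]


def distinct_on(
    rows: Iterable[Row],
    on: str,
    order_key: Callable[[Row], Any],
    descending: bool = False,
) -> list[Row]:
    """Keep the first whole row per value of `on`, after sorting by `order_key`.

    No values are filtered, so a None in the chosen row is preserved.
    """
    seen: set[Hashable] = set()
    result: list[Row] = []
    for row in sorted(rows, key=order_key, reverse=descending):
        if row[on] not in seen:
            seen.add(row[on])
            result.append(row)
    return result


updates = [
    {"person_id": 1, "contacted_at": "2024-01-01", "updated_at": "2024-01-01"},
    {"person_id": 1, "contacted_at": None, "updated_at": "2024-02-01"},
    {"person_id": 2, "contacted_at": "2024-01-15", "updated_at": "2024-01-15"},
]

latest = distinct_on(
    updates, on="person_id", order_key=lambda r: r["updated_at"], descending=True
)
for row in latest:
    print(row)
```

Person 1's result is the 2024-02-01 row with `contacted_at` left as `None`, which is the "exact first row" behaviour described in the original request.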
Ah, sorry, I found the correct docs: https://ibis-project.org/reference/expression-generic#ibis.expr.types.generic.Column.first
I can't find those docs by searching for "first", though. Maybe that's because it has the exact same name as the other method called `first`?
Is your feature request related to a problem?
`.distinct(on=[...])` applies a filter on NULLs by default.
Output:
What is the motivation behind your request?
I have a table of "updates", and I want to get the exact first row when ordering by the "updated_at" column. I don't want values from other rows in the result.
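The difference between what the generated SQL returns and what is wanted here can be sketched in plain Python. `first_non_null_per_column` mimics applying `FIRST(col) FILTER (WHERE col IS NOT NULL)` to each column separately, assuming the rows are already ordered newest first; `first_row` keeps the whole first row. Both helper names and the sample data are hypothetical.

```python
from typing import Optional

# Hypothetical updates for a single person, already ordered by updated_at DESC.
rows: list[dict[str, Optional[str]]] = [
    {"contacted_at": None, "updated_at": "2024-02-01"},
    {"contacted_at": "2024-01-01", "updated_at": "2024-01-01"},
]


def first_non_null_per_column(rows):
    """Mimic FIRST(col) FILTER (WHERE col IS NOT NULL), applied column by column."""
    return {
        col: next((r[col] for r in rows if r[col] is not None), None)
        for col in rows[0].keys()
    }


def first_row(rows):
    """Keep the first row as-is, NULLs included."""
    return dict(rows[0])


print(first_non_null_per_column(rows))
# → {'contacted_at': '2024-01-01', 'updated_at': '2024-02-01'}
print(first_row(rows))
# → {'contacted_at': None, 'updated_at': '2024-02-01'}
```

The per-column version stitches together values from two different rows, which is exactly the mixing the request wants to avoid.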
Describe the solution you'd like
`.distinct(on=[...], filter_nulls=False)`
What version of ibis are you running?
9.5.0
What backend(s) are you using, if any?
DuckDB