Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After Join generated extra ORDER BY section with phantom field #4633

Open
2 tasks done
annashmatko opened this issue Jun 18, 2024 · 3 comments
Open
2 tasks done

After Join generated extra ORDER BY section with phantom field #4633

annashmatko opened this issue Jun 18, 2024 · 3 comments
Assignees
Labels
bug Invalid compiler output or panic

Comments

@annashmatko
Copy link

What happened?

In non generic targets the extra ORDER BY section is added in the end of the output.
In my input I don't specify sorting and I don't need it.

And there is a tracks.name field from the tracks table in this section, which is not presented in FROM section.
It produces the error:
image

Reproduced in the website playground.

PRQL input

prql target:sql.postgres

from tracks
group media_type_id(
  sort name
  take 1
)
join media_types (== media_type_id)
select {
  tracks.track_id,
  media_types.name
}

SQL output

WITH table_0 AS (
  SELECT
    DISTINCT ON (media_type_id) track_id,
    media_type_id,
    name
  FROM
    tracks
  ORDER BY
    media_type_id,
    name
)
SELECT
  table_0.track_id,
  media_types.name
FROM
  table_0
  JOIN media_types ON table_0.media_type_id = media_types.media_type_id
ORDER BY
  table_0.media_type_id,
  tracks.name

Expected SQL output

WITH table_0 AS (
  SELECT
    DISTINCT ON (media_type_id) track_id,
    media_type_id,
    name
  FROM
    tracks
  ORDER BY
    media_type_id,
    name
)
SELECT
  table_0.track_id,
  media_types.name
FROM
  table_0
  JOIN media_types ON table_0.media_type_id = media_types.media_type_id

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

No response

@annashmatko annashmatko added the bug Invalid compiler output or panic label Jun 18, 2024
@max-sixty
Copy link
Member

Yes, it seems to retain the sorting even though it's not needed in the DISTINCT ON case... Thanks for the report.

@snth
Copy link
Member

snth commented Jul 10, 2024

I did some playing around with this and it seems to be caused by the combination of:

  1. postgres or duckdb dialects
  2. take 1 which specialises to DISTINCT ON for those dialects.

If you remove the dialect or change it to take 2 then it switches to the ROW_NUMBER() window function based approach which is more general and doesn't have that ORDER BY issue.

Is there any benefit to keep that DISTINCT ON specialisation or could we just use the more general algorithm?

Btw, in the general case, the intermediate variable _expr_0 leaks into the result set, e.g.

prql target:sql.postgres

from tracks
group media_type_id(
  sort name
  take 2
)
join media_types (== media_type_id)

Extra columns can often just be ignored but it might cause an issue for some people.

@annashmatko
Copy link
Author

I use a ClickHouse dialect in my job.

In a clickhouse db I have a table where each row is a new state of some object.
And a use case actually is to receive the last state.

So I group by object_id, order by update_date_time desc and take 1.
This method gives me the last row in each group.
I cannot use take 2 in this example.

Extra columns can often just be ignored but it might cause an issue for some people.

For instance, if I later UNION tables I get 'mismatch column error' as one of a tabe has extra hidden columns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Invalid compiler output or panic
Projects
None yet
Development

No branches or pull requests

4 participants