After Join generated extra ORDER BY section with phantom field #4633

annashmatko · 2024-06-18T14:52:16Z

What happened?

With non-generic targets, an extra ORDER BY clause is appended to the end of the output.
My input doesn't ask for the final result to be sorted, and I don't need it to be.

This ORDER BY clause also references tracks.name, a field of the tracks table, which is not present in the FROM clause.
It produces an error:

Reproduced in the website playground.

PRQL input

prql target:sql.postgres

from tracks
group media_type_id(
  sort name
  take 1
)
join media_types (== media_type_id)
select {
  tracks.track_id,
  media_types.name
}

SQL output

WITH table_0 AS (
  SELECT
    DISTINCT ON (media_type_id) track_id,
    media_type_id,
    name
  FROM
    tracks
  ORDER BY
    media_type_id,
    name
)
SELECT
  table_0.track_id,
  media_types.name
FROM
  table_0
  JOIN media_types ON table_0.media_type_id = media_types.media_type_id
ORDER BY
  table_0.media_type_id,
  tracks.name

Expected SQL output

WITH table_0 AS (
  SELECT
    DISTINCT ON (media_type_id) track_id,
    media_type_id,
    name
  FROM
    tracks
  ORDER BY
    media_type_id,
    name
)
SELECT
  table_0.track_id,
  media_types.name
FROM
  table_0
  JOIN media_types ON table_0.media_type_id = media_types.media_type_id

MVCE confirmation

Minimal example
New issue

Anything else?

No response

max-sixty · 2024-06-21T16:50:48Z

Yes, it seems to retain the sorting even though it's not needed in the DISTINCT ON case... Thanks for the report.

snth · 2024-07-10T12:43:17Z

I did some playing around with this, and it seems to be caused by the combination of:

- the postgres or duckdb dialects, and
- take 1, which specialises to DISTINCT ON for those dialects.

If you remove the dialect or change it to take 2, the compiler switches to the ROW_NUMBER() window-function approach, which is more general and doesn't have this ORDER BY issue.
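For illustration, the ROW_NUMBER() approach can be sketched in plain SQL. This is not the exact SQL that PRQL emits; it is a hand-written equivalent of the query in the report, run through Python's built-in sqlite3 module (SQLite has no DISTINCT ON, and window functions need SQLite 3.25 or later). The sample rows and the row_num column name are hypothetical. It keeps the first track by name per media_type_id, joins media_types, and needs no outer ORDER BY.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    # Hypothetical sample rows shaped like the tracks/media_types tables in the report.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE media_types (media_type_id INTEGER, name TEXT);
        CREATE TABLE tracks (track_id INTEGER, name TEXT, media_type_id INTEGER);
        INSERT INTO media_types VALUES (1, 'MPEG audio file'), (2, 'AAC audio file');
        INSERT INTO tracks VALUES
          (1, 'Zeta', 1), (2, 'Alpha', 1), (3, 'Beta', 2), (4, 'Gamma', 2);
        """
    )
    return conn


def first_track_per_media_type(conn: sqlite3.Connection) -> list:
    # Number the tracks within each media_type_id by name, keep row 1 of each
    # group, then join. The final SELECT projects only the wanted columns and
    # has no ORDER BY, so nothing refers back to the inner "tracks" table.
    query = """
        WITH table_0 AS (
          SELECT
            track_id,
            media_type_id,
            name,
            ROW_NUMBER() OVER (PARTITION BY media_type_id ORDER BY name) AS row_num
          FROM tracks
        )
        SELECT table_0.track_id, media_types.name
        FROM table_0
        JOIN media_types ON table_0.media_type_id = media_types.media_type_id
        WHERE table_0.row_num = 1
    """
    return conn.execute(query).fetchall()


# Row order is unspecified without ORDER BY, so sort before printing.
print(sorted(first_track_per_media_type(make_sample_db())))
# [(2, 'MPEG audio file'), (3, 'AAC audio file')]
```

The per-group ordering lives inside the window's OVER clause, so it never needs to surface as an outer ORDER BY.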

Is there any benefit to keeping that DISTINCT ON specialisation, or could we just use the more general algorithm?

Btw, in the general case, the intermediate variable _expr_0 leaks into the result set, e.g.

prql target:sql.postgres

from tracks
group media_type_id(
  sort name
  take 2
)
join media_types (== media_type_id)

Extra columns can often just be ignored, but they might cause issues for some people.
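A small sketch of how such a leak looks: when the outer query selects everything from a ROW_NUMBER() subquery, the helper column comes along. The example uses SQLite through Python's sqlite3; the helper column is named _expr_0 only to mirror the name reported here, and the table and query shape are hypothetical rather than PRQL's literal output.

```python
import sqlite3

# Hypothetical ranked subquery, shaped loosely like the general ROW_NUMBER()
# approach; _expr_0 mirrors the helper-column name reported in this thread.
RANKED = """
    WITH table_0 AS (
      SELECT
        track_id,
        media_type_id,
        ROW_NUMBER() OVER (PARTITION BY media_type_id ORDER BY track_id) AS _expr_0
      FROM tracks
    )
"""


def result_columns(conn: sqlite3.Connection, projection: str) -> list:
    # Build the outer SELECT (illustration only: projection is trusted input)
    # and read the column names SQLite reports, even when no rows match.
    cur = conn.execute(RANKED + f"SELECT {projection} FROM table_0 WHERE _expr_0 <= 2")
    return [col[0] for col in cur.description]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (track_id INTEGER, media_type_id INTEGER)")

print(result_columns(conn, "*"))                        # ['track_id', 'media_type_id', '_expr_0']
print(result_columns(conn, "track_id, media_type_id"))  # ['track_id', 'media_type_id']
```

Projecting the needed columns explicitly in the final SELECT keeps the helper column out of the result relation.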

annashmatko · 2024-07-10T13:15:53Z

I use the ClickHouse dialect at work.

In a ClickHouse database I have a table where each row is a new state of some object,
and the actual use case is to retrieve the latest state.

So I group by object_id, sort by update_date_time descending, and take 1.
This gives me the latest row in each group.
I can't use take 2 in this case.
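The "latest state per object" pattern described above (group by object_id, sort by update_date_time descending, take 1) can be sketched with ROW_NUMBER(). This runs on SQLite via Python's sqlite3 rather than ClickHouse, and the table name object_states, the state column and the sample rows are hypothetical; only object_id and update_date_time come from the comment.

```python
import sqlite3


def latest_states(conn: sqlite3.Connection) -> list:
    # Rank each object's rows by update_date_time, newest first, and keep
    # rank 1 (the "take 1" step). The outer ORDER BY only makes the demo
    # output deterministic; it is not needed for correctness.
    query = """
        SELECT object_id, update_date_time, state
        FROM (
          SELECT
            object_id,
            update_date_time,
            state,
            ROW_NUMBER() OVER (
              PARTITION BY object_id ORDER BY update_date_time DESC
            ) AS rn
          FROM object_states
        ) AS ranked
        WHERE rn = 1
        ORDER BY object_id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_states (object_id INTEGER, update_date_time TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO object_states VALUES (?, ?, ?)",
    [
        (1, "2024-07-01 10:00", "created"),
        (1, "2024-07-02 09:30", "shipped"),
        (2, "2024-07-01 12:00", "created"),
    ],
)
print(latest_states(conn))
# [(1, '2024-07-02 09:30', 'shipped'), (2, '2024-07-01 12:00', 'created')]
```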

> Extra columns can often just be ignored but it might cause an issue for some people.

For instance, if I later UNION tables, I get a 'mismatch column' error because one of the tables has extra hidden columns.
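A minimal sketch of why a leaked helper column breaks a UNION: the operands must have the same number of columns. It uses SQLite via Python's sqlite3 (ClickHouse's error text will differ), and the object_states table and the can_union helper are hypothetical.

```python
import sqlite3


def can_union(conn: sqlite3.Connection, left_sql: str, right_sql: str) -> bool:
    """Return True if the two SELECTs can be combined with UNION ALL."""
    try:
        conn.execute(f"{left_sql} UNION ALL {right_sql}")
        return True
    except sqlite3.OperationalError:
        # SQLite rejects UNION operands with different column counts.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_states (object_id INTEGER, state TEXT)")

# Hypothetical relation that leaks a helper rank column, as happens with
# SELECT * over a ROW_NUMBER() subquery.
leaky = """
    SELECT * FROM (
      SELECT object_id, state,
             ROW_NUMBER() OVER (PARTITION BY object_id ORDER BY state) AS _expr_0
      FROM object_states
    ) AS t
"""
clean = "SELECT object_id, state FROM object_states"

print(can_union(conn, leaky, clean))  # False: 3 columns vs 2
print(can_union(conn, clean, clean))  # True
```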

annashmatko added the bug label ("Invalid compiler output or panic") Jun 18, 2024

snth mentioned this issue Jul 11, 2024

"Leaking" of _expr_* columns into result relations #4719 (open)

aljazerzen self-assigned this Jul 11, 2024
