Fix failure when matching empty dictionary #10873

Merged
martint merged 1 commit into trinodb:master from parquet-dictionary-predicate on Aug 19, 2022

martint
Member

@martint martint commented Feb 1, 2022

Fixes #9424

Release note:

# Iceberg, Hive, Delta
* Fix failure when reading Parquet data that contains only null values. ({issue}`9424`)

@findepi
Member

findepi commented Mar 25, 2022

The CI didn't run here.

@guerremdq
Member

Any plan to merge this soon?

@xtto

xtto commented May 12, 2022

Hi, with that fix applied, Trino returns 0 rows when we apply an IS NULL predicate to a column that is always NULL within the whole partition.

SELECT *
FROM test_tbl
WHERE a_partition = '01'
AND always_null_column IS NULL

 always_null_column | a_partition
--------------------+-------------
(0 rows)

SELECT *
FROM test_tbl
WHERE a_partition = '01'
AND always_null_column IS NOT NULL

 always_null_column | a_partition
--------------------+-------------
(0 rows)

SELECT *
FROM test_tbl
WHERE a_partition = '01'
-- AND always_null_column IS NULL

 always_null_column | a_partition
--------------------+-------------
 NULL               | 01
 NULL               | 01
 NULL               | 01
 NULL               | 01
 NULL               | 01
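
One way a result like the one above can arise is if row groups are pruned by testing the predicate only against a column's dictionary entries. The following is a minimal sketch under that assumption, not Trino's actual code: `ColumnChunk`, `naive_may_match`, and `null_aware_may_match` are hypothetical names introduced only for illustration. An all-null column has an empty dictionary, so a check that ignores nulls concludes that nothing can match `IS NULL`.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

Predicate = Callable[[Optional[str]], bool]


@dataclass(frozen=True)
class ColumnChunk:
    """Hypothetical summary of one column chunk (not a Trino or Parquet type)."""
    dictionary: FrozenSet[str]  # distinct non-null values; empty when every value is null
    null_count: int             # number of null values in the chunk


def is_null(value: Optional[str]) -> bool:
    return value is None


def is_not_null(value: Optional[str]) -> bool:
    return value is not None


def naive_may_match(chunk: ColumnChunk, predicate: Predicate) -> bool:
    """Consults only the dictionary, so nulls are invisible to the check."""
    return any(predicate(v) for v in chunk.dictionary)


def null_aware_may_match(chunk: ColumnChunk, predicate: Predicate) -> bool:
    """Also tests the predicate against NULL when the chunk contains nulls."""
    if chunk.null_count > 0 and predicate(None):
        return True
    return any(predicate(v) for v in chunk.dictionary)


# A chunk shaped like the always-null column in the report: five nulls, no values.
all_null = ColumnChunk(dictionary=frozenset(), null_count=5)

naive_keeps_chunk = naive_may_match(all_null, is_null)            # chunk is skipped
null_aware_keeps_chunk = null_aware_may_match(all_null, is_null)  # chunk is read
```

In this model the naive check skips the chunk for both `IS NULL` and `IS NOT NULL`, which matches the two empty results above, while the null-aware check keeps the chunk for `IS NULL` only.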

@martint martint force-pushed the parquet-dictionary-predicate branch 3 times, most recently from b27528c to 4e5df34 Compare July 15, 2022 19:04
@martint
Member Author

martint commented Jul 19, 2022

@kantonczak, can you try with the latest updates to this PR?

@Gingernaut

Gingernaut commented Aug 17, 2022

+1 on this issue and fix; it is preventing me from migrating from Presto to Trino.

@rushton
Contributor

rushton commented Aug 17, 2022

@martint, the issue can be reproduced by writing a Parquet file with pyarrow:

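import pyarrow
from pyarrow import parquet

# Build a one-column table whose only column, "x", contains nothing but nulls.
table = pyarrow.Table.from_arrays([pyarrow.array([None, None, None, None])], names=["x"])

# Write the table out as a Parquet file.
with open("foo.parquet", "wb") as fp:
    parquet.write_table(table, fp)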

I will send the Parquet file in the Trino Slack.

Repro query:

CREATE TABLE test_null_parquet (x VARCHAR) WITH (format='PARQUET', external_location='s3a://.../foo.parquet');

SELECT * FROM test_null_parquet WHERE x IS NULL;

Query 20220817_185451_00007_yq9np, FAILED, 1 node
Splits: 5 total, 0 done (0.00%)
1.76 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20220817_185451_00007_yq9np failed: Error opening Hive split s3a://.../nick/foo/myparquet.parquet (offset=0, length=408): cannot use empty rangeList
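
Given the PR title, an error like this plausibly comes from building a set of value ranges out of an empty dictionary. The sketch below models that under stated assumptions: `RangeSet`, `naive_domain`, and `domain_from_dictionary` are hypothetical names and do not reproduce Trino's classes. It shows a range set that rejects an empty list, and a guarded builder that represents an empty dictionary as "no ranges, nulls allowed if present" instead of failing.

```python
from typing import List, Optional, Tuple

Range = Tuple[str, str]  # inclusive (low, high) bounds


class RangeSet:
    """Hypothetical stand-in for a set of value ranges that refuses to be empty."""

    def __init__(self, ranges: List[Range]):
        if not ranges:
            raise ValueError("cannot use empty rangeList")
        self.ranges = list(ranges)


def naive_domain(values: List[str]) -> RangeSet:
    """Builds one single-value range per dictionary entry; fails on an empty dictionary."""
    return RangeSet([(v, v) for v in sorted(values)])


def domain_from_dictionary(
    values: List[str], has_nulls: bool
) -> Tuple[Optional[RangeSet], bool]:
    """Returns (ranges or None, null_allowed), handling the empty dictionary explicitly."""
    if not values:
        # No non-null values at all: only NULL can match, and only if nulls exist.
        return None, has_nulls
    return naive_domain(values), has_nulls


# The all-null column from the repro has no dictionary entries but does have nulls.
ranges, null_allowed = domain_from_dictionary([], has_nulls=True)
```

With the guard in place, the all-null case yields no ranges and `null_allowed` set to true, so a predicate such as `x IS NULL` can still be evaluated instead of raising.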

@martint martint force-pushed the parquet-dictionary-predicate branch 2 times, most recently from 77a4294 to d4f36e2 Compare August 17, 2022 21:04
@martint martint force-pushed the parquet-dictionary-predicate branch 2 times, most recently from c10a162 to 6a7594c Compare August 18, 2022 19:30
Member

@raunaqmorarka raunaqmorarka left a comment

lgtm, minor comments

@martint martint force-pushed the parquet-dictionary-predicate branch from 6a7594c to 9122fe8 Compare August 19, 2022 16:26
@martint martint merged commit 8494caa into trinodb:master Aug 19, 2022
@github-actions github-actions bot added this to the 394 milestone Aug 19, 2022
@martint martint deleted the parquet-dictionary-predicate branch August 22, 2022 17:47
Development

Successfully merging this pull request may close these issues.

'cannot use empty rangeList' error after upgrade from 359 to 362
8 participants