IN (...) clauses appear to be ignored in merge commands with S3 - extra partitions scanned
#2726
Labels: bug
Environment
Delta-rs version: 0.18.2
Binding: Python
Environment:
Bug
What happened:
It appears that when performing a merge operation with partition_column IN ('value_1') as the initial predicate, the IN clause is ignored as if it doesn't exist, whether it contains a single value or multiple values, and no errors are raised.
A separate, but potentially related, issue is that when performing a merge operation with partition_column = 'value_1' as the predicate, I sometimes see additional partitions being queried from S3. The exact additional partition retrieved is non-deterministic, but there's always an extra one in the example setup I have. I set up the example to debug the IN clause behaviour described above, and spotted this along the way. Performing the same operation a second time queries only the exact partitions specified by the clauses in the predicate.
What you expected to happen:
How to reproduce it:
MRE can be found here: https://github.com/MuneebBaderoen/delta-rs-in-predicate-mre
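The MRE has the full setup. As a rough sketch of the call shape being described (not copied from the MRE), the merge looks something like the following. The id join key, the source/target aliases, and the helper names in_predicate and upsert are assumptions made for illustration; deltalake is imported lazily so the predicate helper can be run on its own.

```python
from typing import Any, Dict, Optional, Sequence


def in_predicate(column: str, values: Sequence[str], alias: str = "target") -> str:
    """Build a SQL fragment such as: target.partition_day IN ('05', '06')."""
    if not values:
        raise ValueError("IN (...) needs at least one value")
    # Double any single quotes so the values stay valid SQL string literals.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"{alias}.{column} IN ({quoted})"


def upsert(
    table_uri: str,
    source: Any,
    partition_filter: str,
    storage_options: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Merge `source` (e.g. a pyarrow Table) into the Delta table at `table_uri`."""
    from deltalake import DeltaTable  # lazy import; needs the deltalake package

    table = DeltaTable(table_uri, storage_options=storage_options)
    return (
        table.merge(
            source=source,
            # Partition filter first, then the (hypothetical) join key.
            predicate=f"{partition_filter} AND target.id = source.id",
            source_alias="source",
            target_alias="target",
        )
        .when_matched_update_all()
        .when_not_matched_insert_all()
        .execute()
    )


day_filter = in_predicate("partition_day", ["05", "06"])
print(day_filter)  # target.partition_day IN ('05', '06')
```

Swapping the filter for an equality such as target.partition_day = '05' gives the second case described above, so the S3 requests for the two forms can be compared side by side.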
More details:
Slack thread: https://delta-users.slack.com/archives/C013LCAEB98/p1722382232123799
The impact of this behaviour is that attempting to upsert data across two partitions (for example at the boundary of days, or the boundary of months) dramatically increases the volume of data downloaded from S3.
On the boundary between days, the implementation I have would upsert data for partition_day IN ('05', '06'), but this clause would be ignored, resulting in all data for all partitions in the month being downloaded. This is visible in the localstack logs in the MRE provided.
On the boundary between months, the implementation I have would attempt to upsert data for partition_month IN ('07', '08') partition_day IN ('31', '01'), but both IN clauses would be ignored, resulting in all data for all partitions in the year being downloaded. This is visible in the localstack logs in the MRE provided.
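To illustrate where the two-value IN clauses come from, here is a minimal sketch of deriving the partition values from a batch of rows that straddles a boundary. The partition_values helper and the example timestamps are hypothetical, and the zero-padded string format is assumed from the predicates quoted above.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def partition_values(timestamps: Iterable[datetime]) -> Dict[str, List[str]]:
    """Collect the distinct zero-padded month and day values covered by a batch."""
    batch = list(timestamps)  # materialise once so the input can be any iterable
    return {
        "partition_month": sorted({f"{ts.month:02d}" for ts in batch}),
        "partition_day": sorted({f"{ts.day:02d}" for ts in batch}),
    }


# Hypothetical batch spanning the end of July and the start of August.
batch = [datetime(2024, 7, 31, 23, 59), datetime(2024, 8, 1, 0, 1)]
print(partition_values(batch))
# {'partition_month': ['07', '08'], 'partition_day': ['01', '31']}
```

The values come out sorted, so the days appear as ('01', '31') rather than ('31', '01'); the order of values inside an IN list does not change which rows match.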