GH-40428: [Python][CI] Fix dataset partition filter tests with pandas nightly (#40429)

### Rationale for this change

Debugging the failure suggests it is caused by a pandas change: a filter operation now sometimes preserves a RangeIndex instead of returning an integer (int64) index. The conversion to Arrow depends on the index type: by default a RangeIndex is stored as metadata only, whereas an integer index becomes a column.

Therefore, make the tests more robust by ensuring there is always at least one non-partition column in the DataFrame, so that whether the result is empty no longer depends on the index.

* GitHub Issue: #40428

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
jorisvandenbossche authored Mar 13, 2024
1 parent 9381647 commit 788200a
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions python/pyarrow/tests/parquet/test_dataset.py
@@ -107,9 +107,9 @@ def test_filters_equivalency(tempdir):
     df = pd.DataFrame({
         'integer': np.array(integer_keys, dtype='i4').repeat(15),
         'string': np.tile(np.tile(np.array(string_keys, dtype=object), 5), 2),
-        'boolean': np.tile(np.tile(np.array(boolean_keys, dtype='bool'), 5),
-                           3),
-    }, columns=['integer', 'string', 'boolean'])
+        'boolean': np.tile(np.tile(np.array(boolean_keys, dtype='bool'), 5), 3),
+        'values': np.arange(30),
+    })

     _generate_partition_directories(local, base_path, partition_spec, df)

@@ -312,9 +312,9 @@ def test_filters_inclusive_set(tempdir):
     df = pd.DataFrame({
         'integer': np.array(integer_keys, dtype='i4').repeat(15),
         'string': np.tile(np.tile(np.array(string_keys, dtype=object), 5), 2),
-        'boolean': np.tile(np.tile(np.array(boolean_keys, dtype='bool'), 5),
-                           3),
-    }, columns=['integer', 'string', 'boolean'])
+        'boolean': np.tile(np.tile(np.array(boolean_keys, dtype='bool'), 5), 3),
+        'values': np.arange(30),
+    })

     _generate_partition_directories(local, base_path, partition_spec, df)
