This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

parquet_read panics with index_out_of_bounds #351

Closed
vincev opened this issue Aug 27, 2021 · 1 comment · Fixed by #352
Labels
bug Something isn't working

Comments

vincev commented Aug 27, 2021

The parquet_read example panics when reading a Parquet file generated with pandas.

To reproduce the problem, I ran the following script (gen.py):

import pandas as pd
import numpy as np
import pyarrow as pa

print(f"Pandas version:  {pd.__version__}")
print(f"Numpy version:   {np.__version__}")
print(f"Pyarrow version: {pa.__version__}")

df = pd.DataFrame({'id': np.arange(0, 1000, dtype=np.int64)})
df.to_parquet('test.parquet', index=False, version='2.0')
print(f"Wrote {len(df)} rows")

This produces the following output on my machine:

> python gen.py
Pandas version:  1.3.2
Numpy version:   1.21.2
Pyarrow version: 5.0.0
Wrote 1000 rows

Then I ran the parquet_read example on the generated file:

> RUST_BACKTRACE=1 cargo run --release --example parquet_read test.parquet 0 0
    Finished release [optimized] target(s) in 0.12s
     Running `target/release/examples/parquet_read test.parquet 0 0`
thread 'main' panicked at 'index out of bounds: the len is 1000 but the index is 1000', /home/vincev/arrow2/src/io/parquet/read/primitive/basic.rs:67:40
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
   1: core::panicking::panic_fmt
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/panicking.rs:92:14
   2: core::panicking::panic_bounds_check
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/panicking.rs:69:5
   3: arrow2::io::parquet::read::primitive::basic::extend_from_page
   4: arrow2::io::parquet::read::primitive::iter_to_array
   5: arrow2::io::parquet::read::page_iter_to_array
   6: parquet_read::main
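
The panic text is Rust's standard bounds-check failure: for a buffer of length n the valid indices are 0 through n-1, so "the len is 1000 but the index is 1000" means that code reached from extend_from_page (per the backtrace) indexed exactly one element past the end of a 1000-element buffer, which also matches the 1000 rows written. The Python sketch below is a hypothetical illustration only, not arrow2's code: take_checked is a made-up helper showing how a gather over decoded indices can validate each index and report the same condition as a recoverable error instead of a panic.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take_checked(values: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Gather values[i] for each i in indices, validating every index first.

    Hypothetical helper for illustration; it is not part of arrow2.
    Valid indices for a buffer of length n are 0..n-1, so i == n is the
    off-by-one that triggers Rust's 'index out of bounds' panic.
    """
    n = len(values)
    out: List[T] = []
    for i in indices:
        # Reject negatives too: Python would silently wrap them around.
        if not 0 <= i < n:
            # Same wording as Rust's panic, raised as a catchable error.
            raise IndexError(f"index out of bounds: the len is {n} but the index is {i}")
        out.append(values[i])
    return out
```

With values = list(range(1000)), take_checked(values, [0, 999]) returns [0, 999], while take_checked(values, [1000]) raises IndexError carrying the same message as the panic above.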

If I change the generator to produce 500 rows instead, the file is read without errors.
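
Since 500 rows read fine and 1000 rows panic, the smallest failing row count can be narrowed with a binary search. This is a minimal sketch under the assumption that failure is monotonic in the row count (every count above the threshold also fails), which may not hold if the bug depends on page or encoding boundaries. first_failing and reads_ok are hypothetical names: reads_ok stands for a callable that would regenerate test.parquet with n rows and report whether parquet_read succeeds.

```python
from typing import Callable


def first_failing(lo_ok: int, hi_fail: int, reads_ok: Callable[[int], bool]) -> int:
    """Return the smallest n in (lo_ok, hi_fail] for which reads_ok(n) is False.

    Assumes reads_ok(lo_ok) is True, reads_ok(hi_fail) is False, and that
    failure is monotonic in n. Invariant: lo_ok passes, hi_fail fails.
    """
    while hi_fail - lo_ok > 1:
        mid = (lo_ok + hi_fail) // 2
        if reads_ok(mid):
            lo_ok = mid  # threshold is above mid
        else:
            hi_fail = mid  # mid already fails; threshold is at or below it
    return hi_fail
```

With a stub such as reads_ok = lambda n: n < 731, first_failing(500, 1000, reads_ok) returns 731. Against the real reproduction, each probe costs one write with gen.py and one run of parquet_read, and the 500..1000 interval needs about nine probes.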

I am using the latest master:

> git rev-parse HEAD
77650672233bd7bbb9839a2a616f11ebffa15807

@jorgecarleitao (Owner)

Thanks! Looking into it.

jorgecarleitao transferred this issue from jorgecarleitao/parquet2 Aug 27, 2021
jorgecarleitao added the bug Something isn't working label Aug 27, 2021
jorgecarleitao self-assigned this Aug 27, 2021
jorgecarleitao changed the title parquet_read panics with index_out_of_bounds parquet_read panics with index_out_of_bounds Sep 7, 2021