ScalarValue::to_array_of_size panics computing statistics for nested parquet file #2653

Closed
tustvold opened this issue May 30, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@tustvold
Contributor

Describe the bug

let ctx = SessionContext::new();

let mut options = ParquetReadOptions::default()
    .parquet_pruning(true)
    .to_listing_options(2);

// Enable statistics collection (this is what triggers the panic)
options.collect_stat = true;

ctx.register_listing_table("patient", "/home/raphael/Downloads/part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet", options, None).await.unwrap();

let df = ctx.sql("SELECT patient.meta FROM patient LIMIT 10").await.unwrap();
df.show().await.unwrap();

Where part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet is the parquet file provided by @kesavkolla in #2439

Panics with

called `Result::unwrap()` on an `Err` value: ArrowError(ComputeError("concat requires input of at least one array"))
thread 'physical_plan::file_format::parquet::tests::temp' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ComputeError("concat requires input of at least one array"))', datafusion/common/src/scalar.rs:1206:18
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::result::unwrap_failed
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1785:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1078:23
   4: datafusion_common::scalar::ScalarValue::to_array_of_size
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1198:22
   5: datafusion_common::scalar::ScalarValue::to_array_of_size::{{closure}}
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1253:45
   6: core::iter::adapters::map::map_fold::{{closure}}
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/adapters/map.rs:84:28
   7: core::iter::traits::iterator::Iterator::fold
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:2362:21
   8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/adapters/map.rs:124:9
   9: core::iter::traits::iterator::Iterator::for_each
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:779:9
  10: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_extend.rs:40:17
  11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_from_iter_nested.rs:62:9
  12: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_from_iter.rs:33:9
  13: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/mod.rs:2554:9
  14: core::iter::traits::iterator::Iterator::collect
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:1784:9
  15: datafusion_common::scalar::ScalarValue::to_array_of_size
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1248:48
  16: datafusion_common::scalar::ScalarValue::to_array
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:658:9
  17: datafusion::datasource::get_statistics_with_limit::{{closure}}
             at ./src/datasource/mod.rs:75:56
  18: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  19: datafusion::datasource::listing::table::ListingTable::list_files_for_scan::{{closure}}
             at ./src/datasource/listing/table.rs:394:67
  20: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  21: <datafusion::datasource::listing::table::ListingTable as datafusion::datasource::datasource::TableProvider>::scan::{{closure}}
             at ./src/datasource/listing/table.rs:310:53
  22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  23: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
  24: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
             at ./src/physical_plan/planner.rs:392:64
  25: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  26: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
  27: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
             at ./src/physical_plan/planner.rs:623:84
  28: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  29: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
  30: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}

Setting options.collect_stat = false eliminates the panic.
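The error in the panic comes from Arrow's `concat` kernel, which rejects an empty list of input arrays. It only receives arrays, so with zero inputs it has nothing to take the output type from. The backtrace appears to show `to_array_of_size` recursing into a nested value (the closure frame at scalar.rs:1253) before reaching that call. The std-only Rust sketch below models the `concat` contract with a hypothetical `Column` type and shows one defensive pattern: if there is nothing to concatenate, build an empty column of a type the caller already knows. `Column`, `ComputeError`, `concat` and `concat_or_empty` are illustrative stand-ins, not Arrow or DataFusion APIs, and this is not the fix that landed in DataFusion.

```rust
// Hypothetical, simplified stand-in for an Arrow array: a typed column of i64 values.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    data_type: &'static str,
    values: Vec<i64>,
}

// Hypothetical error type mirroring the ComputeError seen in the panic.
#[derive(Debug, PartialEq)]
enum ComputeError {
    EmptyInput,
}

/// Models the contract of Arrow's `concat`: the output type is taken from the
/// first input, so an empty slice of inputs is rejected.
fn concat(inputs: &[&Column]) -> Result<Column, ComputeError> {
    let first = inputs.first().ok_or(ComputeError::EmptyInput)?;
    let mut values = Vec::new();
    for c in inputs {
        values.extend_from_slice(&c.values);
    }
    Ok(Column { data_type: first.data_type, values })
}

/// Defensive caller (hypothetical): when there is nothing to concatenate,
/// return an empty column of the type the caller already knows.
fn concat_or_empty(inputs: &[&Column], data_type: &'static str) -> Column {
    if inputs.is_empty() {
        return Column { data_type, values: Vec::new() };
    }
    // Non-empty input cannot hit the EmptyInput error.
    concat(inputs).expect("non-empty input")
}

fn main() {
    let a = Column { data_type: "Int64", values: vec![1, 2] };
    let b = Column { data_type: "Int64", values: vec![3] };

    // Normal case: two inputs are joined.
    let joined = concat(&[&a, &b]).unwrap();
    println!("{:?}", joined.values); // prints [1, 2, 3]

    // The failure mode from the panic message: zero inputs.
    assert_eq!(concat(&[]), Err(ComputeError::EmptyInput));

    // The guarded version returns an empty, correctly typed column instead.
    let empty = concat_or_empty(&[], "Int64");
    println!("{} {}", empty.data_type, empty.values.len()); // prints Int64 0
}
```

The design choice here is to make the caller, which knows the expected type, handle the empty case. The kernel itself cannot, because its signature carries no type information.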

Expected behavior

The above should not panic.

Additional context

Follow-on to #2453, which is fixed by #2631.

@tustvold tustvold added the bug Something isn't working label May 30, 2022
@HuSen8891
Contributor

I think pull request #2671 already fixes this.

@tustvold
Contributor Author

tustvold commented Jun 3, 2022

Huzzah, can confirm 🎉

@tustvold tustvold closed this as completed Jun 3, 2022