Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overflow in repeat_arrs_from_indices #13237

Closed
demetribu opened this issue Nov 3, 2024 · 6 comments · Fixed by #13441
Closed

Overflow in repeat_arrs_from_indices #13237

demetribu opened this issue Nov 3, 2024 · 6 comments · Fixed by #13441
Labels
bug Something isn't working

Comments

@demetribu
Copy link
Contributor

demetribu commented Nov 3, 2024

Describe the bug

An error occurs when performing an inner join on two large unnest(range(...)) datasets in DataFusion version 42.2.0. The issue persists regardless of join type preference, as setting datafusion.optimizer.prefer_hash_join = false does not affect the outcome. Notably, this query executes successfully on version 41.0 without issues.

To Reproduce

./datafusion-cli/target/debug/datafusion-cli -m 512m
DataFusion CLI v42.2.0
set datafusion.optimizer.prefer_hash_join = false;
0 row(s) fetched. 
Elapsed 0.003 seconds.

set datafusion.execution.sort_spill_reservation_bytes = 104857;
0 row(s) fetched. 
Elapsed 0.004 seconds.

set datafusion.execution.sort_in_place_threshold_bytes = 104857;
0 row(s) fetched. 
Elapsed 0.002 seconds.

select * from (select unnest(range(0, 100000)) id) t inner join (select unnest(range(0, 100000)) id) t1 on t.id = t1.id;


thread 'tokio-runtime-worker' panicked at 

.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-53.2.0/src/take.rs:773:13:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
External error: External error: Join Error
caused by
External error: task 38 panicked with message "attempt to add with overflow"

Full Log Output (click to expand)
thread 'sql::joins::test_smj_spill' panicked at .cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-53.2.0/src/take.rs:773:13:
attempt to add with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:662:5
   1: core::panicking::panic_fmt
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panicking.rs:74:14
   2: core::panicking::panic_const::panic_const_add_overflow
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panicking.rs:181:21
   3: <i32 as core::ops::arith::AddAssign>::add_assign
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/arith.rs:759:51
   4: arrow_select::take::take_value_indices_from_list
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-53.2.0/src/take.rs:773:13
   5: arrow_select::take::take_list
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-53.2.0/src/take.rs:570:9
   6: arrow_select::take::take_impl
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-53.2.0/src/take.rs:209:25
   7: arrow_select::take::take
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-53.2.0/src/take.rs:91:5
   8: datafusion_physical_plan::unnest::repeat_arrs_from_indices::{{closure}}
             at datafusion/datafusion/physical-plan/src/unnest.rs:901:23
   9: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/adapters/map.rs:95:28
  10: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/traits/iterator.rs:2405:21
  11: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/adapters/map.rs:121:9
  12: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/adapters/mod.rs:191:9
  13: core::iter::traits::iterator::Iterator::try_for_each
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/traits/iterator.rs:2467:9
  14: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/adapters/mod.rs:174:14
  15: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/vec/spec_from_iter_nested.rs:24:32
  16: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/vec/spec_from_iter.rs:33:9
  17: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/vec/mod.rs:2985:9
  18: core::iter::traits::iterator::Iterator::collect
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/traits/iterator.rs:2000:9
  19: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/result.rs:1958:51
  20: core::iter::adapters::try_process
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/adapters/mod.rs:160:17
  21: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/result.rs:1958:9
  22: core::iter::traits::iterator::Iterator::collect
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/traits/iterator.rs:2000:9
  23: datafusion_physical_plan::unnest::repeat_arrs_from_indices
             at datafusion/datafusion/physical-plan/src/unnest.rs:899:5
  24: datafusion_physical_plan::unnest::list_unnest_at_level
             at datafusion/datafusion/physical-plan/src/unnest.rs:461:15
  25: datafusion_physical_plan::unnest::build_batch
             at datafusion/datafusion/physical-plan/src/unnest.rs:558:47
  26: datafusion_physical_plan::unnest::UnnestStream::poll_next_impl::{{closure}}
             at datafusion/datafusion/physical-plan/src/unnest.rs:289:34
  27: core::task::poll::Poll<T>::map
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/task/poll.rs:54:43
  28: datafusion_physical_plan::unnest::UnnestStream::poll_next_impl
             at datafusion/datafusion/physical-plan/src/unnest.rs:284:9
  29: <datafusion_physical_plan::unnest::UnnestStream as futures_core::stream::Stream>::poll_next
             at datafusion/datafusion/physical-plan/src/unnest.rs:273:9
  30: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-core-0.3.31/src/stream.rs:130:9
  31: futures_util::stream::stream::StreamExt::poll_next_unpin
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.31/src/stream/stream/mod.rs:1638:9
  32: <datafusion_physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next
             at datafusion/datafusion/physical-plan/src/projection.rs:334:20
  33: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-core-0.3.31/src/stream.rs:130:9
  34: futures_util::stream::stream::StreamExt::poll_next_unpin
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.31/src/stream/stream/mod.rs:1638:9
  35: <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.31/src/stream/stream/next.rs:32:9
  36: datafusion_physical_plan::repartition::RepartitionExec::pull_from_input::{{closure}}
             at datafusion/datafusion/physical-plan/src/repartition/mod.rs:801:40
  37: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/core.rs:331:17
  38: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/loom/std/unsafe_cell.rs:16:9
  39: tokio::runtime::task::core::Core<T,S>::poll
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/core.rs:320:13
  40: tokio::runtime::task::harness::poll_future::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/harness.rs:499:19
  41: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272:9
  42: std::panicking::try::do_call
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554:40
  43: ___rust_try
  44: std::panicking::try
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518:19
  45: std::panic::catch_unwind
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345:14
  46: tokio::runtime::task::harness::poll_future
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/harness.rs:487:18
  47: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/harness.rs:209:27
  48: tokio::runtime::task::harness::Harness<T,S>::poll
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/harness.rs:154:15
  49: tokio::runtime::task::raw::poll
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/raw.rs:271:5
  50: tokio::runtime::task::raw::RawTask::poll
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/raw.rs:201:18
  51: tokio::runtime::task::LocalNotified<S>::run
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/task/mod.rs:435:9
  52: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:770:25
  53: tokio::runtime::coop::with_budget
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/coop.rs:107:5
  54: tokio::runtime::coop::budget
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/coop.rs:73:5
  55: tokio::runtime::scheduler::current_thread::Context::run_task::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:364:43
  56: tokio::runtime::scheduler::current_thread::Context::enter
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:428:19
  57: tokio::runtime::scheduler::current_thread::Context::run_task
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:364:23
  58: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:769:35
  59: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:807:68
  60: tokio::runtime::context::scoped::Scoped<T>::set
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/context/scoped.rs:40:9
  61: tokio::runtime::context::set_scheduler::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/context.rs:180:26
  62: std::thread::local::LocalKey<T>::try_with
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/local.rs:283:12
  63: std::thread::local::LocalKey<T>::with
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/local.rs:260:9
  64: tokio::runtime::context::set_scheduler
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/context.rs:180:9
  65: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:807:27
  66: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:716:19
  67: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:196:28
  68: tokio::runtime::context::runtime::enter_runtime
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/context/runtime.rs:65:16
  69: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/scheduler/current_thread/mod.rs:184:9
  70: tokio::runtime::runtime::Runtime::block_on_inner
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/runtime.rs:368:47
  71: tokio::runtime::runtime::Runtime::block_on
             at .cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.0/src/runtime/runtime.rs:342:13
  72: core_integration::sql::joins::test_smj_spill
             at ./tests/sql/joins.rs:323:5
  73: core_integration::sql::joins::test_smj_spill::{{closure}}
             at ./tests/sql/joins.rs:281:30
  74: core::ops::function::FnOnce::call_once
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250:5
  75: core::ops::function::FnOnce::call_once
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250:5
./datafusion-cli/target/debug/datafusion-cli -m 512m
DataFusion CLI v41.0.0
select * from (select unnest(range(0, 100000)) id) t inner join (select unnest(range(0, 100000)) id) t1 on t.id = t1.id;
...
"result"
...

Expected behavior

When sufficient memory is allocated, the join operation should complete successfully without errors. If memory is insufficient, the system should handle the situation gracefully, either by spilling to disk or by returning an appropriate error message indicating a memory limit issue, rather than encountering a panic or overflow error.

@demetribu demetribu added the bug Something isn't working label Nov 3, 2024
@demetribu
Copy link
Contributor Author

Exception raised here

@Dandandan
Copy link
Contributor

Yeah, this doesn't seem like a problem with join but rather a problem with repeat_arrs_from_indices and use of take

@Dandandan Dandandan changed the title Overflow in Join Processing for Large Ranges Overflow in repeat_arrs_from_indices Nov 8, 2024
@demetribu
Copy link
Contributor Author

demetribu commented Nov 8, 2024

#[test]
fn test_batch_indices_overflow() -> Result<()> {
    use arrow_array::{Int64Array, ArrayRef, PrimitiveArray};
    use std::sync::Arc;
    
    let int_values: Vec<i64> = (1..=100000).collect();
    let int64_array = Int64Array::from(int_values);

    let mut list_builder = ListBuilder::new(Int64Array::builder(100000));
    list_builder.values().append_slice(int64_array.values());
    list_builder.append(true); // Mark this as a single list item in the ListArray

    let list_array = Arc::new(list_builder.finish()) as ArrayRef;

    let indices = Int64Array::from(vec![0i64; 100000]);
    let result = repeat_arrs_from_indices(&[list_array], &indices);    
    assert!(
        result.is_ok(),
        "Expected successful result, but got error: {:?}",
        result
    );

    Ok(())
}

Failed with:

thread 'unnest::tests::test_batch_indices_overflow' panicked at .cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-53.2.0/src/take.rs:773:13: attempt to add with overflow
#[test]
fn test_batch_indices_overflow() -> Result<()> {
    use arrow::{array::{LargeListBuilder, ListBuilder}, datatypes::{Field, Int32Type}};
    use arrow_array::{Int64Array, ArrayRef, PrimitiveArray};
    use std::sync::Arc;
    
    let int_values: Vec<i64> = (1..=100000).collect();
    let int64_array = Int64Array::from(int_values);

    let mut list_builder = LargeListBuilder::new(Int64Array::builder(100000));
    list_builder.values().append_slice(int64_array.values());
    list_builder.append(true); // Mark this as a single list item in the ListArray

    let list_array = Arc::new(list_builder.finish()) as ArrayRef;

    let indices = Int64Array::from(vec![0i64; 100000]);
    let result = repeat_arrs_from_indices(&[list_array], &indices);    
    assert!(
        result.is_ok(),
        "Expected successful result, but got error: {:?}",
        result
    );

    Ok(())
}

It works, but slow, taking 224.24 seconds.

cc @alamb

@Dandandan
Copy link
Contributor

Sorry, can you point out the difference between the code snippets?

@Dandandan
Copy link
Contributor

Ah largelistbuilder vs listbuilder

@alamb
Copy link
Contributor

alamb commented Nov 8, 2024

Sounds like a fun optimization exercise to figure out what is going on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants