Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter crashes when the input schema contains a map #8262

Closed
Tracked by #11429
helgikrs opened this issue Nov 18, 2023 · 4 comments · Fixed by #8501
Closed
Tracked by #11429

Filter crashes when the input schema contains a map #8262

helgikrs opened this issue Nov 18, 2023 · 4 comments · Fixed by #8501
Labels
bug Something isn't working

Comments

@helgikrs
Copy link

helgikrs commented Nov 18, 2023

Describe the bug

DataFusion v33 panics & returns an error in filter when the input schema contains a map because ScalarValue doesn't support maps.

Just making a simple memory table with a map field and querying it with a simple where clause produces this error when creating a physical plan:

Context("EnforceDistribution", NotImplemented("Can't create a scalar from data_type \"Map(Field { name: \"entries\", data_type: Struct([Field { name: \"key\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"value\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)\""))

Creating an explain plan panics in the Display implementation

thread 'main' panicked at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/string.rs:2460:14:
a Display implementation returned an error unexpectedly: Error
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
   1: core::panicking::panic_fmt
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1652:5
   3: core::result::Result<T,E>::expect
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1034:23
   4: <T as alloc::string::ToString>::to_string
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/string.rs:2459:9
   5: datafusion_physical_plan::display::DisplayableExecutionPlan::to_stringified
             at /home/helgi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-physical-plan-33.0.0/src/display.rs:206:41
   6: datafusion::physical_planner::DefaultPhysicalPlanner::handle_explain::{{closure}}
             at /home/helgi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-33.0.0/src/physical_planner.rs:1897:29
   7: <datafusion::physical_planner::DefaultPhysicalPlanner as datafusion::physical_planner::PhysicalPlanner>::create_physical_plan::{{closure}}
             at /home/helgi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-33.0.0/src/physical_planner.rs:445:64

To Reproduce

Here's a simple repro

use std::sync::Arc;

use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    datasource::MemTable,
    prelude::SessionContext,
};

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();

    let key = Field::new("key", DataType::Int64, false);
    let value = Field::new("value", DataType::Int64, true);
    let map_field = Field::new("entries", DataType::Struct(vec![key, value].into()), false);
    let fields = vec![
        Field::new("a", DataType::Int64, true),
        Field::new("map", DataType::Map(map_field.into(), false), true),
    ];
    let schema = Schema::new(fields);

    let memory_table = MemTable::try_new(schema.into(), vec![vec![]]).unwrap();

    ctx.register_table("tbl", Arc::new(memory_table)).unwrap();

    // This panics in the display implementation, removing the explain causes
    // this to return the not implemented error described above instead.
    ctx.sql("explain select * from tbl where a > 0;")
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
}

I have a more complicated query within a more complex system that panics
in the equivalence_properties function. I don't have a small repro. I'll work on making a small repro for it and then I'll update the issue.

Expected behavior

This seems to be a regression as this worked in earlier releases (specifically v32).

Additional context

No response

@razeghi71
Copy link
Contributor

If this is still relevant, I would like to take a look at this?

@alamb
Copy link
Contributor

alamb commented Dec 8, 2023

If this is still relevant, I would like to take a look at this?

THanks @razeghi71 -- the first step is probably to see if the reproducer on this ticket still panics

@razeghi71
Copy link
Contributor

razeghi71 commented Dec 11, 2023

This problem has been introduced in c9330bc which changed the default empty value for lower and upper bound from ScalarValue::Null to ScalarValue::try_from(field.data_type()). which is not implemented for map, to fix it we have to either implement it for Map or fallback on Null value. I have done these two in two draft PRs I have above.

In the first one I tried to implement the empty value for map as described in the description of Map which returns a List of key values as a struct called entries. And in the second one I fall back to Null. Would be great to get some feedback to see what you folks think? And what is the good approach here?

@alamb
Copy link
Contributor

alamb commented Dec 12, 2023

This problem has been introduced in c9330bc which changed the default empty value for lower and upper bound from ScalarValue::Null to ScalarValue::try_from(field.data_type()). which is not implemented for map, to fix it we have to either implement it for Map or fallback on Null value. I have done these two in two draft PRs I have above.

I think implementing it for Map is what makes the most sense to me

However, I think this may be a non trivial change as it will require implementing a ScalarValue::Map variant which does not see to exist

https://github.com/apache/arrow-datafusion/blob/7f312c8a1d82b19f03c640e4a5d7d6437b2ee835/datafusion/common/src/scalar.rs#L102

l I think ScalarValue::Map can follow the model of ScalarValue::List (as Maps are just lists of structs, as I undertand):

enum ScalarValue {
  ...
  /// one element array
  Map(ArrayRef)
}

In the first one I tried to implement the empty value for map as described in the description of Map which returns a List of key values as a struct called entries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants