Support fixed_size_list for make_array #6759

Merged
merged 5 commits into from
Jul 5, 2023

jayzhan211
Contributor

Which issue does this PR close?

Ref #6560

Rationale for this change

Add FixedSizeList support for array methods

What changes are included in this PR?

Only MakeArray is done in this PR.

  1. Add casting from FixedSizeList to List in the analyzer (type_coercion).
  2. Support ScalarValue::FixedSizeList; not all features are implemented yet.
  3. Add parsing of List so it can be used in sqllogictest.
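
As a rough illustration of item 1, the sketch below models the idea of rewriting FixedSizeList arguments to List before an array function sees them. It is a toy model only: `ToyType`, `coerce_fixed_size_list`, and `coerce_make_array_args` are hypothetical names invented for this example and are not the arrow or DataFusion API, and the recursion into nested lists is an assumption rather than something this PR is known to do.

```rust
// Toy stand-ins for Arrow data types; NOT the real arrow/DataFusion API.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
    // Variable-length list of the inner type.
    List(Box<ToyType>),
    // Fixed-length list: inner type plus the number of elements per row.
    FixedSizeList(Box<ToyType>, usize),
}

/// Rewrite every FixedSizeList (at any nesting depth) to a List,
/// mirroring the idea of inserting a cast during type coercion.
fn coerce_fixed_size_list(t: &ToyType) -> ToyType {
    match t {
        // The fixed length is dropped: a List carries no per-row size.
        ToyType::FixedSizeList(inner, _len) => {
            ToyType::List(Box::new(coerce_fixed_size_list(inner)))
        }
        ToyType::List(inner) => ToyType::List(Box::new(coerce_fixed_size_list(inner))),
        other => other.clone(),
    }
}

/// Coerce the argument types of a hypothetical make_array call.
fn coerce_make_array_args(args: &[ToyType]) -> Vec<ToyType> {
    args.iter().map(coerce_fixed_size_list).collect()
}
```

After coercion, a FixedSizeList argument and a List argument with the same element type end up with the same type, so the downstream function only needs to handle List.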

Are these changes tested?

  1. sqllogictest
    array.slt

  2. unit test in type_coercion.rs

Are there any user-facing changes?

Task

  • MakeArray
  • Other array methods

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels Jun 24, 2023
Contributor Author

I created the FixedSizeList parquet file with my own script.
Is there any existing way for us to easily create/update this kind of test case?

Contributor

@alamb alamb left a comment

Thank you for this contribution @jayzhan211 -- I think other than the patch to crates.io this PR is ready to go.

I left some suggestions, which I think would improve the PR.

Cargo.toml Outdated
@@ -53,6 +53,14 @@ arrow-array = { version = "42.0.0", default-features = false, features = ["chron
parquet = { version = "42.0.0", features = ["arrow", "async", "object_store"] }
sqlparser = { version = "0.35", features = ["visitor"] }

[patch.crates-io]
Contributor

I think we will have to wait for this code to be released to crates.io (in about 10 days) before merging this PR.

@@ -3646,29 +3675,9 @@ impl fmt::Display for ScalarValue {
ScalarValue::TimestampNanosecond(e, _) => format_option!(f, e)?,
ScalarValue::Utf8(e) => format_option!(f, e)?,
ScalarValue::LargeUtf8(e) => format_option!(f, e)?,
ScalarValue::Binary(e) => match e {
Contributor

❤️

@@ -486,6 +497,8 @@ impl<'a> Tokenizer<'a> {
"Date32" => Token::SimpleType(DataType::Date32),
"Date64" => Token::SimpleType(DataType::Date64),

"List" => Token::List,
Contributor

👍
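
For readers unfamiliar with this part of the change, here is a minimal sketch of what parsing a `List` type string could look like. It is a simplified recursive parser, not the tokenizer-based implementation touched by the diff above. The `ToyType` enum, the `parse_type` function, and the `List(<type>)` grammar are all assumptions made for illustration.

```rust
// Toy data type; a hypothetical stand-in, not arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
    List(Box<ToyType>),
}

/// Parse strings such as "Int64" or "List(List(Utf8))" into a ToyType.
/// The grammar `List(<type>)` is assumed for this sketch.
fn parse_type(s: &str) -> Result<ToyType, String> {
    let s = s.trim();
    // Simple (non-nested) types are matched by name.
    match s {
        "Int64" => return Ok(ToyType::Int64),
        "Utf8" => return Ok(ToyType::Utf8),
        _ => {}
    }
    // A list is the keyword `List` followed by a parenthesized inner type.
    if let Some(rest) = s.strip_prefix("List") {
        let rest = rest.trim_start();
        let inner = rest
            .strip_prefix('(')
            .and_then(|r| r.strip_suffix(')'))
            .ok_or_else(|| format!("expected List(<type>), got '{}'", s))?;
        // Recurse so that nested lists are handled.
        return Ok(ToyType::List(Box::new(parse_type(inner)?)));
    }
    Err(format!("unsupported type '{}'", s))
}
```

With this sketch, `parse_type("List(Int64)")` yields a list of `Int64`, while malformed input such as `"List(Int64"` is reported as an error rather than silently accepted.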

@alamb alamb added the waiting-on-upstream PR is waiting on an upstream dependency to be updated label Jun 26, 2023
Contributor

alamb commented Jun 26, 2023

FYI @izveigor

@tustvold tustvold changed the base branch from main to backup-main June 27, 2023 11:12
@tustvold tustvold changed the base branch from backup-main to main June 27, 2023 11:12
@alamb alamb marked this pull request as draft June 28, 2023 18:15
Contributor

alamb commented Jun 28, 2023

Marking as a draft, as it looks like this PR is not waiting on review -- it is waiting on an upstream release (scheduled for early next week) and on some other comments being addressed.

jayzhan211 referenced this pull request Jul 5, 2023
* feat: supports NULL in arrays

* feat: supports NULL in array functions

* fix: array_fill error

* fix: merge

* fix: cargo fmt

---------

Co-authored-by: Andrew Lamb <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 force-pushed the fixedsizelist_autocast branch from ecc93dd to c6ba92a Compare July 5, 2023 01:04
@github-actions github-actions bot added the physical-expr Physical Expressions label Jul 5, 2023
@@ -125,7 +125,7 @@ fn array_array(args: &[ArrayRef], data_type: DataType) -> Result<ArrayRef> {
}

let list_data_type =
- DataType::List(Arc::new(Field::new("item", data_type, false)));
+ DataType::List(Arc::new(Field::new("item", data_type, true)));
Contributor Author

@jayzhan211 jayzhan211 Jul 5, 2023

Revert the change from #6662 to avoid a failure caused by the schema mismatch.
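
One plausible reading of the schema mismatch is that two list types differing only in the nullability of their "item" field are not considered equal. The sketch below illustrates that with toy types. `ToyField`, `ToyListType`, `list_of`, and `check_schema` are hypothetical names for this example, not the arrow crate's `Field` or `DataType`.

```rust
// Toy model of a list type whose element field carries a nullability flag.
// Hypothetical types for illustration; not the arrow crate's Field/DataType.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct ToyListType {
    item: ToyField,
}

/// Build a list type whose element field is named "item".
fn list_of(nullable: bool) -> ToyListType {
    ToyListType {
        item: ToyField { name: "item".to_string(), nullable },
    }
}

/// Strict schema check: two list types match only if their element
/// fields are identical, including nullability.
fn check_schema(expected: &ToyListType, actual: &ToyListType) -> Result<(), String> {
    if expected == actual {
        Ok(())
    } else {
        Err(format!(
            "schema mismatch: expected {:?}, got {:?}",
            expected, actual
        ))
    }
}
```

Under this model, comparing `list_of(true)` with `list_of(false)` returns an error even though the element names match, which is the kind of failure that using a single nullability setting everywhere avoids.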

@jayzhan211 jayzhan211 marked this pull request as ready for review July 5, 2023 02:24
@jayzhan211 jayzhan211 marked this pull request as draft July 5, 2023 07:27
@jayzhan211 jayzhan211 marked this pull request as ready for review July 5, 2023 08:41
Contributor

@alamb alamb left a comment

Thank you @jayzhan211

Contributor

alamb commented Jul 5, 2023

cc @izveigor

@alamb alamb merged commit 658206a into apache:main Jul 5, 2023
2010YOUY01 pushed a commit to 2010YOUY01/arrow-datafusion that referenced this pull request Jul 5, 2023
* support make_array for fixed_size_list

Signed-off-by: jayzhan211 <[email protected]>

* add arrow-typeof in test

Signed-off-by: jayzhan211 <[email protected]>

* fix schema mismatch

Signed-off-by: jayzhan211 <[email protected]>

* cleanup code

Signed-off-by: jayzhan211 <[email protected]>

* create array data with correct len

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
alamb pushed a commit to alamb/datafusion that referenced this pull request Jul 6, 2023
Labels
core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt) waiting-on-upstream PR is waiting on an upstream dependency to be updated