This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-1171] Throw RuntimeException when reading duplicate fields in case-insensitive mode #1173

Merged

Conversation

jackylee-ch
Contributor

What changes were proposed in this pull request?

We didn't cover the corner case of reading duplicate fields in case-insensitive mode. If more than one field matches, we just return the first match. However, vanilla Spark throws a RuntimeException in these cases.
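The duplicate-field check described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual native-sql-engine or Spark code; the class `FieldMatcher`, the method `matchField`, and the error message wording are all invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of case-insensitive field resolution.
public class FieldMatcher {
    public static String matchField(String requested,
                                    List<String> physicalFields,
                                    boolean caseSensitive) {
        // Collect every physical field that matches the requested name.
        List<String> matches = physicalFields.stream()
            .filter(f -> caseSensitive ? f.equals(requested)
                                       : f.equalsIgnoreCase(requested))
            .collect(Collectors.toList());
        if (matches.size() > 1) {
            // Instead of silently returning the first match, raise a
            // RuntimeException, mirroring vanilla Spark's behavior.
            throw new RuntimeException("Found duplicate field(s) \"" + requested
                + "\" in case-insensitive mode: " + matches);
        }
        if (matches.isEmpty()) {
            throw new RuntimeException("Field \"" + requested + "\" not found");
        }
        return matches.get(0);
    }
}
```

For example, requesting `"ID"` against a schema with fields `id` and `name` resolves to `id`, while requesting `"id"` against a schema with both `id` and `ID` fails in case-insensitive mode.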

How was this patch tested?

Unit tests.

@github-actions

#1171

@PHILO-HE
Collaborator

Thanks for your patch. All internal tests have passed. The patch looks good to me.
@zhouyuan, please have a further check.

Collaborator

@zhouyuan zhouyuan left a comment


👍

@zhouyuan zhouyuan merged commit 9b60057 into oap-project:main Nov 30, 2022
zhouyuan pushed a commit to zhouyuan/native-sql-engine that referenced this pull request Dec 14, 2022
…se-insensitive mode (oap-project#1173)

* throw exception if one more columns matched in case insensitive mode

* add schema check in arrow v2
zhouyuan added a commit that referenced this pull request Dec 14, 2022
* [NSE-1170] Set correct row number in batch scan w/ partition columns (#1172)

* [NSE-1171] Throw RuntimeException when reading duplicate fields in case-insensitive mode (#1173)

* throw exception if one more columns matched in case insensitive mode

* add schema check in arrow v2

* bump h2/pgsql version (#1176)

* bump h2/pgsql version

Signed-off-by: Yuan Zhou <[email protected]>

* ignore one failed test

Signed-off-by: Yuan Zhou <[email protected]>

Signed-off-by: Yuan Zhou <[email protected]>

* [NSE-956] allow to write parquet with compression (#1014)

This patch adds support for writing parquet with compression

df.coalesce(1).write.format("arrow").option("parquet.compression","zstd").save(path)

Signed-off-by: Yuan Zhou [email protected]

* [NSE-1161] Support read-write parquet conversion to read-write arrow (#1162)

* add ArrowConvertExtension

* do not convert parquet fileformat while writing to partitioned/bucketed/sorted output

* fix cache failed

* care about write codec

* disable convertor extension by default

* add some comments

* remove wrong compress type check (#1178)

Since compression has been supported in #1014, the extra compression check in ArrowConvertorExtension can be removed now.

* fix to use right arrow branch (#1179)


fix to use right arrow branch
Signed-off-by: Yuan Zhou <[email protected]>

* [NSE-1171] Support merge parquet schema and read missing schema (#1175)

* Support merge parquet schema and read missing schema

* fix error

* optimize null vectors

* optimize code

* optimize code

* change code

* add schema merge suite tests

* add test for struct type

* to use 1.5 branch arrow

Signed-off-by: Yuan Zhou <[email protected]>

Signed-off-by: Yuan Zhou <[email protected]>
Signed-off-by: Yuan Zhou [email protected]
Co-authored-by: Jacky Lee <[email protected]>