[CH-434] convert partition name to lower case #435

Open
wants to merge 765 commits into
base: clickhouse_backend

Conversation

shuai-xu

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR converts partition names parsed from the HDFS path to lower case, because Spark partition names are lower case.
It fixes #434.

Felixoid and others added 30 commits April 4, 2022 14:15
…ash_v2

One more try to resurrect build hash
Backport ClickHouse#35733 to 22.3: Added settings for insert of invalid IPv6, IPv4 values
Backport ClickHouse#35820 to 22.3: Avoid processing per-column TTL multiple times
- Allow define version as file
- Add inline cache
- Fix auto_release_type function
exmy and others added 24 commits March 21, 2023 10:35
…ence#354)

ShuffleSplitter improvement: support multiple subdirs
Support full join with join condition

Co-authored-by: shuai.li <[email protected]>
Support Decimal type in Gluten 
Co-authored-by: shuai.li <[email protected]>
…like Column 'deviceid' is not presented in input data (Kyligence#388)
@kyligence-git
Collaborator

Can one of the admins verify this patch?

@@ -12,5 +12,6 @@ class StringUtils
public:
static PartitionValues parsePartitionTablePath(const std::string & file);
static bool isNullPartitionValue(const std::string & value);
+static std::string toLowerCase(const std::string & value);

@taiyang-li taiyang-li Apr 17, 2023

It is better to use boost::to_lower or boost::to_lower_copy, as is done elsewhere in the ClickHouse code.
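
For readers without Boost at hand, here is a minimal standard-library sketch of what a lower-casing copy helper could look like. The name `toLowerCopy` is hypothetical and is not taken from the PR; Boost's `boost::to_lower` (in place) and `boost::to_lower_copy` (returns a copy) live in `<boost/algorithm/string.hpp>` and would replace this helper entirely. The sketch assumes ASCII partition names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the PR): returns an ASCII-lowercased copy.
// The argument is taken by value so the caller's string is left untouched.
std::string toLowerCopy(std::string value)
{
    std::transform(value.begin(), value.end(), value.begin(),
                   // Cast through unsigned char: std::tolower is undefined
                   // for negative char values.
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return value;
}
```

Reusing the library routine, as the reviewer suggests, avoids maintaining a one-off helper like this in `StringUtils`.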

@@ -13,15 +13,25 @@ PartitionValues StringUtils::parsePartitionTablePath(const std::string & file)
auto position = item.find('=');
if (position != std::string::npos)
{
-result.emplace_back(PartitionValue(item.substr(0,position), item.substr(position+1)));
+result.emplace_back(PartitionValue(toLowerCase(item.substr(0,position)), item.substr(position+1)));
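
To illustrate the effect of the changed line, the following is a self-contained sketch of the same idea: split a path on `/`, treat each `key=value` segment as a partition, and lower-case only the key. It is an assumption-laden stand-in, not the PR's code: `parsePartitionPath`, `lowerAscii`, and the `std::pair`-based `PartitionValue` alias are hypothetical, and the real `parsePartitionTablePath` may split the path differently.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Assumed shape for illustration: (partition name, partition value).
using PartitionValue = std::pair<std::string, std::string>;
using PartitionValues = std::vector<PartitionValue>;

// Hypothetical ASCII lower-casing helper.
static std::string lowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical sketch: collect key=value path segments, lower-casing the key only.
PartitionValues parsePartitionPath(const std::string & file)
{
    PartitionValues result;
    std::size_t start = 0;
    while (start <= file.size())
    {
        std::size_t end = file.find('/', start);
        if (end == std::string::npos)
            end = file.size();
        const std::string item = file.substr(start, end - start);
        const auto position = item.find('=');
        if (position != std::string::npos)
            result.emplace_back(lowerAscii(item.substr(0, position)), item.substr(position + 1));
        start = end + 1;
    }
    return result;
}
```

With a path such as `hdfs://nn/warehouse/tbl/DeviceId=A1/Dt=2023-04-17/part-0.parquet`, this sketch yields the names `deviceid` and `dt`, while the values `A1` and `2023-04-17` keep their original case.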

@taiyang-li taiyang-li Apr 17, 2023

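This is probably not the only place that needs to change.

In the code below, name also needs to be converted to lower case before it is looked up in partition_values.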
image
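
The screenshot is not available here, so the following is only a guess at the shape of such a lookup. It sketches the point being made: if the stored partition names are lower case, the name being searched for has to be normalized the same way, or a column such as `DeviceId` will never match. `findPartitionValue` and the pair-based `PartitionValues` alias are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Assumed shape for illustration: (lower-cased partition name, value).
using PartitionEntry = std::pair<std::string, std::string>;
using PartitionValues = std::vector<PartitionEntry>;

// Hypothetical lookup: lower-cases `name` before searching, so it agrees with
// the lower-cased names stored in partition_values.
// Returns nullptr when no partition with that name exists.
const std::string * findPartitionValue(const PartitionValues & partition_values, std::string name)
{
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    auto it = std::find_if(partition_values.begin(), partition_values.end(),
                           [&name](const PartitionEntry & entry) { return entry.first == name; });
    return it == partition_values.end() ? nullptr : &it->second;
}
```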

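Whether column-name matching in Parquet files ignores case is controlled by FormatSetting::ORC::case_insensitive_column_matching and FormatSetting::Parquet::case_insensitive_column_matching. It would be best to add the corresponding switch check here as well.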
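
A rough sketch of what gating the comparison on such a switch might look like. `MatchSettings` is a hypothetical stand-in for the format settings mentioned above, and `columnNameMatches` is an illustrative name; how the real settings object reaches the partition-parsing code is not shown in this PR.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for the format settings; only the flag name mirrors
// the setting mentioned in the review comment.
struct MatchSettings
{
    bool case_insensitive_column_matching = false;
};

// Compares two names exactly unless the switch is on, in which case the
// comparison ignores ASCII case.
bool columnNameMatches(const std::string & lhs, const std::string & rhs, const MatchSettings & settings)
{
    if (!settings.case_insensitive_column_matching)
        return lhs == rhs;
    return lhs.size() == rhs.size()
        && std::equal(lhs.begin(), lhs.end(), rhs.begin(),
                      [](unsigned char a, unsigned char b) { return std::tolower(a) == std::tolower(b); });
}
```

Under this design, partition names would be lower-cased (or compared case-insensitively) only when the user has opted in, which keeps partition matching consistent with how file columns are matched.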

@lwz9103 lwz9103 force-pushed the clickhouse_backend branch 2 times, most recently from dc60d55 to 8066113 Compare May 26, 2023 03:43
Development

Successfully merging this pull request may close these issues.

can not recognize the partition when path contains capital letter