[Feature][Connectors] LocalFile Support reading gz #8025

Open
wants to merge 29 commits into
base: dev

Conversation

@zhdech zhdech commented Nov 12, 2024

Purpose of this pull request

solve #8019

Does this PR introduce any user-facing change?

no

How was this patch tested?

Check list

@liunaijie liunaijie added the First-time contributor First-time contributor label Nov 12, 2024
Member

@Hisoka-X Hisoka-X left a comment

Thanks @zhdech! Could you add a test case for this feature?

@zhdech
Author

zhdech commented Nov 13, 2024

> Thanks @zhdech! Could you add a test case for this feature?

OK. How can I resolve the build errors below? What do I need to do?

@Hisoka-X
Member

> How can I resolve the build errors below? What do I need to do?

Try re-triggering the failed CI run; the CI is unstable. cc @zhangshenghang

Member

@Hisoka-X Hisoka-X left a comment

Please add a test case.

@github-actions github-actions bot added the e2e label Nov 14, 2024
@zhdech
Author

zhdech commented Nov 20, 2024

@Hisoka-X Sir, please help me check it.

Contributor

@corgy-w corgy-w left a comment

small changes

docs/en/connector-v2/source/S3File.md (outdated, resolved)
docs/en/connector-v2/source/OssJindoFile.md (outdated, resolved)
docs/en/connector-v2/source/LocalFile.md (outdated, resolved)
docs/en/connector-v2/source/HdfsFile.md (outdated, resolved)
docs/en/connector-v2/source/FtpFile.md (outdated, resolved)
docs/en/connector-v2/source/CosFile.md (outdated, resolved)
@corgy-w
Contributor

corgy-w commented Nov 21, 2024

I forgot to mention: although .xlsx files cannot be read after being compressed with gz, .xls files can. A test for this can be added later. cc @Hisoka-X @zhdech
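
For such a follow-up test, a gz fixture can be produced from an existing spreadsheet. The following is only a minimal sketch in Python, not part of this PR; the helper name `gzip_file` and any paths passed to it are hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def gzip_file(src: str, dst: Optional[str] = None) -> Path:
    """Compress `src` with gzip and return the path of the compressed file.

    When `dst` is omitted, ".gz" is appended to the source file name,
    for example "e2e.xls" becomes "e2e.xls.gz".
    """
    src_path = Path(src)
    dst_path = Path(dst) if dst else src_path.with_name(src_path.name + ".gz")
    # Stream the bytes so large fixtures are not loaded into memory at once.
    with src_path.open("rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

The source file is left in place, so the same spreadsheet can back both the plain and the gz test cases.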

@zhdech
Author

zhdech commented Nov 21, 2024

> I forgot to mention: although .xlsx files cannot be read after being compressed with gz, .xls files can. A test for this can be added later. cc @Hisoka-X @zhdech

When I test .xls locally, it reports that the format is not supported.
image

The configuration is as follows:
```
env {
  parallelism = 1
  job.mode = "BATCH"
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
  job.mode = "BATCH"
}

source {
  LocalFile {
    path = "/seatunnel/read/gz/excel/single/e2e-xls-gz.xls.gz"
    result_table_name = "fake"
    file_format_type = excel
    archive_compress_codec = "gz"
    field_delimiter = ;
    skip_header_row_number = 1
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_bytes = bytes
        c_date = date
        c_decimal = "decimal(38, 18)"
        c_timestamp = timestamp
        c_row = {
          c_map = "map<string, string>"
          c_array = "array"
          c_string = string
          c_boolean = boolean
          c_tinyint = tinyint
          c_smallint = smallint
          c_int = int
          c_bigint = bigint
          c_float = float
          c_double = double
          c_bytes = bytes
          c_date = date
          c_decimal = "decimal(38, 18)"
          c_timestamp = timestamp
        }
      }
    }
  }
}

sink {
  Assert {
    rules {
      row_rules = [
        {
          rule_type = MAX_ROW
          rule_value = 5
        },
        {
          rule_type = MIN_ROW
          rule_value = 5
        }
      ],
      field_rules = [
        {
          field_name = c_string
          field_type = string
          field_value = [
            {
              rule_type = NOT_NULL
            }
          ]
        },
        {
          field_name = c_boolean
          field_type = boolean
          field_value = [
            {
              rule_type = NOT_NULL
            }
          ]
        },
        {
          field_name = c_double
          field_type = double
          field_value = [
            {
              rule_type = NOT_NULL
            }
          ]
        }
      ]
    }
  }
}
```
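
One way to narrow down a "not supported" message is to confirm what the gz file actually contains once decompressed. The sketch below is a hypothetical debugging helper, not SeaTunnel code: the function name `sniff_gz_payload` is an assumption for illustration. It relies on two well-known file signatures: legacy .xls files are OLE2 compound documents, while .xlsx files are ZIP archives.

```python
import gzip

# File signatures (magic bytes) of the two Excel container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx (ZIP archive)


def sniff_gz_payload(path: str) -> str:
    """Decompress the start of a .gz file and report the inner container type."""
    with gzip.open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"
```

If the helper returns `"xlsx"` for a file named `.xls.gz`, the fixture was probably saved in the newer format under the older extension, which would be worth ruling out before looking at the connector itself.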

@corgy-w
Contributor

corgy-w commented Nov 21, 2024

> When I test .xls locally, it reports that the format is not supported.

Got it. I will look into it when I have time. Thanks @zhdech

Development

Successfully merging this pull request may close these issues.

[Feature][Connectors] LocalFile Support reading gz
4 participants