Athena cannot query the iceberg files sinked by Risingwave #19459

Closed
lmatz opened this issue Nov 19, 2024 · 1 comment · Fixed by #19471
Labels
type/bug Something isn't working
Milestone

Comments

@lmatz
Contributor

lmatz commented Nov 19, 2024

Describe the bug

The files can be read by RisingWave's Iceberg source, but they cannot be queried by Athena on AWS.

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

v2.0.3

Additional context

No response

@lmatz lmatz added the type/bug Something isn't working label Nov 19, 2024
@github-actions github-actions bot added this to the release-2.2 milestone Nov 19, 2024
@darkcofy

Steps to reproduce:

CREATE SOURCE iceberg_sink_source (
     seq_id bigint,
     user_id bigint,
     user_name varchar)
WITH (
     connector = 'datagen',
     fields.seq_id.kind = 'sequence',
     fields.seq_id.start = '1',
     fields.seq_id.end = '10000000',
     fields.user_id.kind = 'random',
     fields.user_id.min = '1',
     fields.user_id.max = '10000000',
     fields.user_name.kind = 'random',
     fields.user_name.length = '10',
     datagen.rows.per.second = '20000'
 ) FORMAT PLAIN ENCODE JSON;

CREATE TABLE iceberg_sink_table (
     seq_id bigint,
     user_id bigint,
     user_name varchar)
WITH (
     connector = 'datagen',
     fields.seq_id.kind = 'sequence',
     fields.seq_id.start = '1',
     fields.seq_id.end = '10000000',
     fields.user_id.kind = 'random',
     fields.user_id.min = '1',
     fields.user_id.max = '10000000',
     fields.user_name.kind = 'random',
     fields.user_name.length = '10',
     datagen.rows.per.second = '20000'
 ) FORMAT PLAIN ENCODE JSON;

CREATE SINK risingwave_iceberg_sink_test FROM iceberg_sink_table
with (
      type='upsert',
      primary_key='seq_id',
      connector = 'iceberg',
      catalog.type = 'glue',
      catalog.name = 'awsdatacatalog',
      warehouse.path = 's3://hw-data-platform/',
      s3.access.key = 'xxx',
      s3.secret.key = 'xxxx',
      s3.region = 'eu-west-1',
      database.name='risingwave_sink',
      table.name='iceberg_sink_test',
      create_table_if_not_exists=TRUE
  );

The error in Athena is:

ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split s3://hw-data-platform/data/00000-0-75d35ec1-55ad-419b-99cd-229203daa761-00003.parquet (offset=47993391, length=2189420235): Range [-4, -4 + 4) out of bounds for length 0
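One way to narrow this down is to check whether the data file itself is framed correctly. A Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian footer length. The sketch below checks only that framing on a local copy of the file; it does not identify the root cause of this issue, and the function name `check_parquet_framing` and the file path in the usage comment are hypothetical.

```python
import io
import os
import struct

PARQUET_MAGIC = b"PAR1"


def check_parquet_framing(f):
    """Inspect the Parquet magic bytes and footer length of a binary file object.

    Only the file framing is checked; the footer metadata is not decoded.
    """
    f.seek(0, os.SEEK_END)
    size = f.tell()
    report = {
        "size": size,
        "head_ok": False,
        "tail_ok": False,
        "footer_len": None,
        "footer_fits": False,
    }
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if size < 12:
        return report

    f.seek(0)
    report["head_ok"] = f.read(4) == PARQUET_MAGIC

    # Last 8 bytes: 4-byte little-endian footer length, then the magic.
    f.seek(-8, os.SEEK_END)
    tail = f.read(8)
    footer_len = struct.unpack("<I", tail[:4])[0]
    report["tail_ok"] = tail[4:] == PARQUET_MAGIC
    report["footer_len"] = footer_len
    report["footer_fits"] = footer_len + 12 <= size
    return report


# Usage on a local copy of the data file (hypothetical path):
# with open("downloaded-data-file.parquet", "rb") as fh:
#     print(check_parquet_framing(fh))
```

If the framing checks pass, the file is probably structurally sound, and it may be more useful to compare the `size` reported here with the `offset` and `length` in the Athena error, since a split range that extends past the end of the file would point at the table metadata rather than the Parquet file.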
