Insert high throughput events using iceberg #23592
Unanswered
allanbatista asked this question in Q&A
Replies: 0 comments
I am analyzing how to do massive inserts using Trino (with Iceberg): about 1 million events per minute, each event around 1 KB.
I tried doing these inserts with SQL through the Python connector, but the throughput is very slow.
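For context on the single-row approach: batching many rows into one INSERT statement means each Trino query produces one Iceberg snapshot commit instead of one per row, which is usually the first thing to try before parallelizing. A minimal sketch of the batching logic (the `events` table name is a placeholder, and the literal rendering here is deliberately simplistic, not a general-purpose SQL escaper):

```python
from typing import Iterator, Sequence


def sql_literal(value) -> str:
    # Minimal literal rendering for this sketch: strings get single quotes
    # (with embedded quotes doubled), everything else passes through as-is.
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_batched_inserts(
    table: str,
    rows: Sequence[Sequence],
    batch_size: int = 10_000,
) -> Iterator[str]:
    # One multi-row INSERT per batch -> one Iceberg snapshot commit per
    # batch, instead of one commit per row.
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        values = ", ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")"
            for row in chunk
        )
        yield f"INSERT INTO {table} VALUES {values}"
```

Each generated statement would then be sent through the trino Python client's cursor, one execute per batch.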
I tried to parallelize using multiple workers, but I get continuous errors from the Iceberg metadata.
My current pipeline in AWS is Kafka + Firehose + S3 + Athena.
Is it possible to update a partition in Trino like in Athena (ALTER TABLE ADD PARTITION), to add an already-existing file in a partition-structured path?

File path example:
account_id=account-id-1/service_name=service-name-1/year=2021/month=01/day=01/hour=00/1727426921807104378_N_acc88e31-eb5e-4eac-be1e-871703dedbda.parquet
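As a possible direction: recent Trino releases added an add_files table procedure to the Iceberg connector, which registers existing data files into a table without rewriting them. Whether it is available depends on your Trino version, so check the Iceberg connector docs for your release; the catalog, schema, table name, and bucket below are placeholders:

```sql
-- Hypothetical sketch: register existing Parquet files under a partition
-- directory into an Iceberg table (procedure availability and exact
-- parameters depend on the Trino version).
ALTER TABLE iceberg.analytics.events EXECUTE add_files(
    location => 's3://my-bucket/events/account_id=account-id-1/service_name=service-name-1/year=2021/month=01/day=01/hour=00/',
    format => 'PARQUET'
)
```

Note this is different from Athena's ALTER TABLE ADD PARTITION, which only updates the Hive-style metastore; Iceberg tracks individual files in its own metadata, so files must be registered through a procedure like this rather than by pointing the table at a directory.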