[Iceberg] Implement iceberg.max-splits-per-second config property #17974

Open
marton-bod opened this issue Jun 20, 2023 · 3 comments
@marton-bod
Contributor

In order to limit the impact on storage systems, the Iceberg connector should add a configuration option to rate-limit split generation. This can help avoid situations where a non-selective query overwhelms the storage system (especially HDFS) with block requests. This config flag is already implemented by the Hive, Hudi, and Delta Lake connectors.
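
For context, a minimal sketch of how such a property could be registered, following the Airlift-style config classes Trino connectors use. The field name, default value, and description below are assumptions for illustration, not the actual implementation:

```java
import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;

// Hypothetical sketch: wiring iceberg.max-splits-per-second into the
// connector's config class. Field name and default are assumptions.
public class IcebergConfig
{
    // Effectively unlimited by default, preserving the connector's current
    // "no throttling" behavior (an assumption for this sketch).
    private int maxSplitsPerSecond = Integer.MAX_VALUE;

    public int getMaxSplitsPerSecond()
    {
        return maxSplitsPerSecond;
    }

    @Config("iceberg.max-splits-per-second")
    @ConfigDescription("Rate limit for split generation, to reduce load on the underlying storage system")
    public IcebergConfig setMaxSplitsPerSecond(int maxSplitsPerSecond)
    {
        this.maxSplitsPerSecond = maxSplitsPerSecond;
        return this;
    }
}
```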

@marton-bod
Contributor Author

This will require refactoring the way Iceberg splits are returned in batches (probably using the AsyncQueue), so implementing iceberg.max-outstanding-splits in tandem would make sense.
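
As a rough illustration of how the two limits could interact, here is a simplified sketch using Guava's RateLimiter. The class and method names are invented for this example; the real refactor would go through the AsyncQueue machinery mentioned above:

```java
import com.google.common.util.concurrent.RateLimiter;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical sketch combining the two proposed limits:
// iceberg.max-splits-per-second (via a Guava RateLimiter) and
// iceberg.max-outstanding-splits (via a bound on the buffered batch).
public class RateLimitedSplitBuffer<T>
{
    private final RateLimiter rateLimiter;   // enforces max-splits-per-second
    private final int maxOutstandingSplits;  // caps splits buffered at once

    public RateLimitedSplitBuffer(double maxSplitsPerSecond, int maxOutstandingSplits)
    {
        this.rateLimiter = RateLimiter.create(maxSplitsPerSecond);
        this.maxOutstandingSplits = maxOutstandingSplits;
    }

    // Drain up to maxOutstandingSplits splits from the planner's iterator,
    // blocking on the rate limiter before taking each one.
    public List<T> nextBatch(Iterator<T> splits)
    {
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxOutstandingSplits && splits.hasNext()) {
            rateLimiter.acquire(); // blocks until a permit is available
            batch.add(splits.next());
        }
        return batch;
    }
}
```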

@alexjo2144
Member

This idea has come up a couple of times before, but it's always been put off because there hasn't been an obvious need for it. It seems that Iceberg users are much more likely to be using cloud storage systems like S3, GCS, or ADLS, which are much less likely to hit rate limits than HDFS.

Have you been hitting rate limits in a production environment, or were you thinking of doing this just for parity with the other connectors?

FYI @electrum

@marton-bod
Contributor Author

Hey @alexjo2144, thanks for reaching out. We do have teams using Iceberg tables on HDFS in production. We've had production problems before where a badly written query (a full table scan reading millions of files) would overwhelm the NameNode with block location requests and bring the whole cluster down. Hence this issue, to install some guardrails against that scenario.
