[SUPPORT] Metadata compaction periodically fails/hangs #12261

liiang-huang · 2024-11-15T03:23:45Z

Describe the problem you faced

Hi Hudi community, I have a Glue job that ingests data into a Hudi MOR table. However, this job periodically fails in the stage below.

Could you help investigate this issue? I have gone through this issue, but it doesn't seem to be the same problem. I deleted the requested/inflight deltacommit and also tried increasing resources, but the errors still persisted. Thanks!

Environment Description

Hudi version : 0.13.1
Spark version : 3.1
Storage (HDFS/S3/GCS..) : S3

Stacktrace

Exception in User Class: jp.ne.paypay.daas.data.exceptions.JobFatalError : Streaming batch load failed with error: Could not compact s3://pay2-datalake-prod-standard/datasets/bronze/payment-accounting-db1-20241010-aurora-prod/payment_accounting/sub_payments_accounting-1761348391


Job aborted due to stage failure: Task 169 in stage 87.0 failed 4 times, most recent failure: Lost task 169.3 in stage 87.0 (TID 21675) (10.12.56.40 executor 13): ExecutorLostFailure (executor 13 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 508519 ms
--


ad1happy2go · 2024-11-15T10:04:43Z

@liiang-huang Can you collect more stats from the metadata table? I see executors getting lost.
You can open the Executors page in the Spark UI to see the reason for the executor loss.
How many files do you see under the .metadata directory? Is colstats or RLI enabled? Please share the Hudi configs.
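
One way to pull out the configs asked for above is to filter the writer options by key prefix. This is only a minimal sketch: the helper name `metadata_related_options` and the `example_options` dict are hypothetical, and the values shown are placeholders rather than the settings of the job in this issue.

```python
def metadata_related_options(options, prefixes=("hoodie.metadata.",)):
    """Return the subset of writer options whose keys start with one of `prefixes`."""
    return {
        key: value
        for key, value in sorted(options.items())
        if key.startswith(tuple(prefixes))
    }


# Hypothetical writer options; placeholder values, not the reporter's settings.
example_options = {
    "hoodie.table.name": "example_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.metadata.enable": "true",
    "hoodie.metadata.index.column.stats.enable": "true",
}

# Print only the metadata-table options so they can be pasted into the thread.
for key, value in metadata_related_options(example_options).items():
    print(f"{key}={value}")
```

Passing a wider `prefixes` tuple (for example adding `"hoodie.compact"`) would include compaction settings in the same listing.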

liiang-huang · 2024-11-18T05:00:24Z

@ad1happy2go Yes, the reason is

Executor heartbeat timed out after 636587 ms

There are 229 objects in the .hoodie/metadata/.hoodie folder, and there is a column_stats folder in the metadata folder. Let me know what else I should look for!
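
A count like the one above can be broken down per folder of the metadata table from any listing of object keys. The sketch below assumes the keys have already been listed (for instance from an S3 listing) and are passed in as plain strings; the function name and the sample keys are hypothetical and do not come from the table in this issue.

```python
from collections import Counter


def count_by_metadata_folder(keys, metadata_prefix=".hoodie/metadata/"):
    """Count object keys per first-level folder under the metadata table path."""
    counts = Counter()
    for key in keys:
        # Keys usually carry a table path in front, so search for the prefix.
        idx = key.find(metadata_prefix)
        if idx == -1:
            continue  # not part of the metadata table
        remainder = key[idx + len(metadata_prefix):]
        if not remainder:
            continue  # the folder marker itself
        # Objects directly under the metadata path are grouped as "(root)".
        folder = remainder.split("/", 1)[0] if "/" in remainder else "(root)"
        counts[folder] += 1
    return dict(counts)


# Hypothetical keys for illustration only.
sample_keys = [
    "tbl/.hoodie/hoodie.properties",
    "tbl/.hoodie/metadata/.hoodie/0001.deltacommit",
    "tbl/.hoodie/metadata/.hoodie/0002.deltacommit",
    "tbl/.hoodie/metadata/column_stats/part-0.hfile",
    "tbl/.hoodie/metadata/files/part-0.log",
]
print(count_by_metadata_folder(sample_keys))
```

Comparing the per-folder counts across runs shows whether the timeline folder or a partition such as column_stats is the one that keeps growing.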

liiang-huang changed the title ~~[SUPPORT] Metadata compaction periodically failure/hang~~ [SUPPORT] Metadata compaction periodically fails/hangs Nov 15, 2024

ad1happy2go added the metadata (metadata table) and priority:critical (production down; pipelines stalled; need help asap) labels Nov 15, 2024

ad1happy2go added this to Hudi Issue Support Nov 15, 2024

github-project-automation bot moved this to ⏳ Awaiting Triage in Hudi Issue Support Nov 15, 2024
