Describe the problem you faced
Hi Hudi community, I have a Glue job that ingests data into a Hudi MOR table. However, the job periodically fails in the stage below.
Could you help investigate this issue? I have gone through this issue, but it doesn't seem to be the same one. I deleted the requested/inflight deltacommit and also tried increasing resources, but the errors still persisted. Thanks!
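For reference, pending instants like the deleted deltacommit can be spotted directly from the timeline files. A minimal sketch (the helper name and the locally synced timeline directory are assumptions, not from the issue):

```shell
# Hypothetical helper: list pending (requested/inflight) deltacommit and
# compaction instants in a Hudi timeline directory. Sync the timeline
# locally first, e.g. `aws s3 sync s3://<bucket>/<base_path>/.hoodie ./timeline`
# (the metadata table keeps its own timeline under .hoodie/metadata/.hoodie).
list_pending_instants() {
  local timeline_dir="$1"
  # Completed instants carry no .requested/.inflight suffix, so only
  # in-flight work matches this pattern.
  ls "$timeline_dir" | grep -E '\.(deltacommit|compaction)\.(requested|inflight)$'
}
```

Note that deleting a pending instant file by hand only removes the timeline marker; whatever stalled the compaction in the first place remains, which would match the errors persisting after the deletion.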
Environment Description
Hudi version : 0.13.1
Spark version : 3.1
Storage (HDFS/S3/GCS..) : S3
Additional context
Add any other context about the problem here.
Stacktrace
Exception in User Class: jp.ne.paypay.daas.data.exceptions.JobFatalError : Streaming batch load failed with error: Could not compact s3://pay2-datalake-prod-standard/datasets/bronze/payment-accounting-db1-20241010-aurora-prod/payment_accounting/sub_payments_accounting-1761348391
Job aborted due to stage failure: Task 169 in stage 87.0 failed 4 times, most recent failure: Lost task 169.3 in stage 87.0 (TID 21675) (10.12.56.40 executor 13): ExecutorLostFailure (executor 13 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 508519 ms
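An ExecutorLostFailure with a heartbeat timeout usually points at executor-side memory pressure or long GC pauses rather than at the timeout value itself (the 508519 ms in the trace suggests the timeout was already raised well above Spark's 120s default). For completeness, the generic Spark knobs involved are sketched below; these are illustrative starting points from Spark's standard configuration, not values taken from this job:

```properties
# spark-defaults style fragment; values are illustrative, not prescriptive
spark.executor.heartbeatInterval   60s     # default 10s; must stay well below spark.network.timeout
spark.network.timeout              600s    # default 120s; upper bound on tolerated heartbeat silence
spark.executor.memoryOverhead      2g      # extra off-heap headroom, often the fix for lost executors
```

If the executor was killed for exceeding its memory allocation, raising timeouts will not help; the executor logs (or the Glue/YARN container logs) should say which it was.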
liiang-huang changed the title from "[SUPPORT] Metadata compaction periodically failure/hang" to "[SUPPORT] Metadata compaction periodically fails/hangs" on Nov 15, 2024
@liiang-huang Can you collect more stats from the metadata table? I see executors getting lost.
You can open the Spark UI's Executors page to see the reason for the executor loss.
How many files do you see under the .metadata directory? Is col stats or RLI enabled? Please share the Hudi configs.
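For anyone gathering the requested info, these are the standard Hudi metadata-table config keys to report (key names as documented in recent Hudi releases; worth double-checking against the 0.13.1 docs):

```properties
hoodie.metadata.enable                         true    # metadata table on/off (default true since 0.11)
hoodie.metadata.index.column.stats.enable      false   # the "col stats" index referenced above
hoodie.metadata.index.bloom.filter.enable      false   # bloom filter index in the metadata table
```

The file count can be checked with something like `aws s3 ls --recursive s3://<bucket>/<base_path>/.hoodie/metadata/ | wc -l` (path per the standard Hudi layout). As far as I know, the record-level index (RLI) config only arrived in a release after 0.13.x, so on 0.13.1 the col stats question is the relevant one.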