DPR2-147: Fix streaming job not ingesting events after running idle #3604
Conversation
@@ -17,21 +17,22 @@ module "glue_reporting_hub_job" {
   job_language                  = "scala"
   create_security_configuration = local.create_sec_conf
   temp_dir                      = "s3://${module.s3_glue_job_bucket.bucket_id}/tmp/${local.project}-reporting-hub-${local.env}/"
-  checkpoint_dir                = "s3://${module.s3_glue_job_bucket.bucket_id}/checkpoint/${local.project}-reporting-hub-${local.env}/"
+  # Using s3a for the checkpoint to align with Hadoop 3 support
+  checkpoint_dir                = "s3a://${module.s3_glue_job_bucket.bucket_id}/checkpoint/${local.project}-reporting-hub-${local.env}/"
How come we're using s3a?
There are other Hadoop connectors to S3, but only S3A is actively maintained by the Hadoop project itself.
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Other_S3_Connectors
Yeah, so up until this point we've been using Amazon's connector, which supports the s3:// scheme. I'm confused why we'd specifically change the checkpoint prefix to s3a but not all the other paths that also use the connector, dpr.raw.s3.path for example.
The issue only affects checkpointing, where the warning below appears in the logs:
23/10/06 21:58:17 WARN CheckpointFileManager: Could not use FileContext API for managing Structured Streaming checkpoint files at s3://dpr-glue-jobs-development/checkpoint/dpr-reporting-hub-development. Using FileSystem API instead for managing log files. If the implementation of FileSystem.rename() is not atomic, then the correctness and fault-tolerance of your Structured Streaming is not guaranteed.
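The warning comes from Spark's CheckpointFileManager, which prefers Hadoop's FileContext API; Hadoop ships an AbstractFileSystem binding for the s3a:// scheme but not for the EMR-style s3:// scheme, hence the switch for the checkpoint path only. A minimal sketch of that scheme rewrite (the helper name is hypothetical, not from this PR):

```scala
object CheckpointPath {
  // Hypothetical helper illustrating the change: only the checkpoint URI is
  // rewritten from s3:// to s3a://; other job paths keep the s3:// scheme.
  def toS3a(uri: String): String =
    if (uri.startsWith("s3://")) "s3a://" + uri.stripPrefix("s3://")
    else uri
}

object CheckpointPathDemo extends App {
  // Prints: s3a://dpr-glue-jobs-development/checkpoint/dpr-reporting-hub-development
  println(CheckpointPath.toS3a(
    "s3://dpr-glue-jobs-development/checkpoint/dpr-reporting-hub-development"))
}
```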
The variable is called reporting_hub_idle_time_between_reads_in_millis, but it looks like seconds are used. Can you confirm?
According to the AWS doc here
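The unit concern above can be sketched as follows; the argument name is from this PR, but the parsing helper and its default are hypothetical:

```scala
object IdleTimeArg {
  val Key = "reporting_hub_idle_time_between_reads_in_millis"

  // Hypothetical parsing: since the argument name says millis, the caller
  // must pass milliseconds, e.g. "30000" for 30 seconds. Passing "30"
  // while intending 30 seconds would yield a 30 ms idle time instead,
  // which is the mismatch the review comment raises.
  def millis(args: Map[String, String], defaultMillis: Long = 0L): Long =
    args.get(Key).map(_.toLong).getOrElse(defaultMillis)
}
```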
This fixes the streaming job not ingesting events in dev after running idle for an extended period.
It uses s3a for the checkpoint location, as supported by Hadoop 3, and also adds configurable arguments to set an idle time between reads.
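Taken together, the two changes might look like the sketch below inside the streaming job. This is not runnable outside a Glue/Spark environment and is not taken from this PR's code: the `spark` session, the source and sink formats, the stream name, and the idleTimeBetweenReadsInMs option name (assumed from the AWS Glue Kinesis connection options) are all assumptions.

```scala
// Sketch only: assumes a SparkSession `spark` inside the Glue job.
val events = spark.readStream
  .format("kinesis")
  .option("streamName", "dpr-events")           // hypothetical stream name
  .option("idleTimeBetweenReadsInMs", "30000")  // configurable idle time (assumed option name)
  .load()

events.writeStream
  .format("parquet")                            // hypothetical sink
  .option("checkpointLocation",                 // s3a scheme per this PR
    "s3a://dpr-glue-jobs-development/checkpoint/dpr-reporting-hub-development/")
  .start()
```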