Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DPR2-147: Fix streaming job not ingesting events after running idle #3604

Merged
merged 4 commits into from
Oct 12, 2023

Conversation

koladeadewuyi-moj
Copy link
Contributor

@koladeadewuyi-moj koladeadewuyi-moj commented Oct 6, 2023

This fixes the streaming job not ingesting events in dev after running idle for an extended period.
This uses s3a for the checkpoint location as supported by Hadoop 3 and also provides configurable arguments to add an idle time between reads.

@github-actions github-actions bot added the environments-repository Used to exclude PRs from this repo in our Slack PR update label Oct 6, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Oct 6, 2023

TFSEC Scan Success

Show Output
*****************************

TFSEC will check the following folders:

Checkov Scan Success

Show Output
*****************************

Checkov will check the following folders:

CTFLint Scan Success

Show Output
*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

1 similar comment
@github-actions
Copy link
Contributor

github-actions bot commented Oct 6, 2023

TFSEC Scan Success

Show Output
*****************************

TFSEC will check the following folders:

Checkov Scan Success

Show Output
*****************************

Checkov will check the following folders:

CTFLint Scan Success

Show Output
*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

@koladeadewuyi-moj koladeadewuyi-moj temporarily deployed to digital-prison-reporting-development October 6, 2023 16:57 — with GitHub Actions Inactive
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 6, 2023 16:57 — with GitHub Actions Failure
@github-actions
Copy link
Contributor

github-actions bot commented Oct 6, 2023

TFSEC Scan Success

Show Output
*****************************

TFSEC will check the following folders:

Checkov Scan Success

Show Output
*****************************

Checkov will check the following folders:

CTFLint Scan Success

Show Output
*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 9, 2023 14:10 — with GitHub Actions Failure
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 9, 2023 14:11 — with GitHub Actions Failure
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

TFSEC Scan Success

Show Output
*****************************

TFSEC will check the following folders:

Checkov Scan Success

Show Output
*****************************

Checkov will check the following folders:

CTFLint Scan Success

Show Output
*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

TFSEC Scan Success

Show Output
*****************************

TFSEC will check the following folders:

Checkov Scan Success

Show Output
*****************************

Checkov will check the following folders:

CTFLint Scan Success

Show Output
*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 10, 2023 22:16 — with GitHub Actions Failure
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 10, 2023 22:16 — with GitHub Actions Failure
@github-actions
Copy link
Contributor

TFSEC Scan Success

Show Output
*****************************

TFSEC will check the following folders:

Checkov Scan Success

Show Output
*****************************

Checkov will check the following folders:

CTFLint Scan Success

Show Output
*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

@koladeadewuyi-moj koladeadewuyi-moj temporarily deployed to digital-prison-reporting-development October 12, 2023 04:38 — with GitHub Actions Inactive
@koladeadewuyi-moj koladeadewuyi-moj temporarily deployed to digital-prison-reporting-test October 12, 2023 04:38 — with GitHub Actions Inactive
@github-actions
Copy link
Contributor

TFSEC Scan Success

Show Output
*****************************

TFSEC will check the following folders:

Checkov Scan Success

Show Output
*****************************

Checkov will check the following folders:

CTFLint Scan Success

Show Output
*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

@koladeadewuyi-moj koladeadewuyi-moj changed the title DPR2-147: Create empty kinesis stream DPR2-147: Make idle_time_between_reads configurable Oct 12, 2023
@koladeadewuyi-moj koladeadewuyi-moj marked this pull request as ready for review October 12, 2023 08:22
@koladeadewuyi-moj koladeadewuyi-moj requested review from a team as code owners October 12, 2023 08:22
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 12, 2023 08:23 — with GitHub Actions Failure
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 12, 2023 08:23 — with GitHub Actions Failure
@koladeadewuyi-moj koladeadewuyi-moj changed the title DPR2-147: Make idle_time_between_reads configurable DPR2-147: Fix streaming job not ingesting events after running idle Oct 12, 2023
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 12, 2023 08:47 — with GitHub Actions Failure
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 12, 2023 08:47 — with GitHub Actions Failure
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 12, 2023 08:50 — with GitHub Actions Failure
@koladeadewuyi-moj koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 12, 2023 08:50 — with GitHub Actions Failure
@@ -17,21 +17,22 @@ module "glue_reporting_hub_job" {
job_language = "scala"
create_security_configuration = local.create_sec_conf
temp_dir = "s3://${module.s3_glue_job_bucket.bucket_id}/tmp/${local.project}-reporting-hub-${local.env}/"
checkpoint_dir = "s3://${module.s3_glue_job_bucket.bucket_id}/checkpoint/${local.project}-reporting-hub-${local.env}/"
# Using s3a for checkpoint because to align with Hadoop 3 supports
checkpoint_dir = "s3a://${module.s3_glue_job_bucket.bucket_id}/checkpoint/${local.project}-reporting-hub-${local.env}/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we're using s3a?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There other Hadoop connectors to S3. Only S3A is actively maintained by the Hadoop project itself.

https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Other_S3_Connectors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so up until this point we've been using Amazon's connector which supports s3:// scheme. I'm confused why specifically change the checkpoint prefix to s3a but not all the other paths which also use the connector? dpr.raw.s3.path for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue only affects the checkpointing where an warning below appears in the logs:

23/10/06 21:58:17 WARN CheckpointFileManager: Could not use FileContext API for managing Structured Streaming checkpoint files at s3://dpr-glue-jobs-development/checkpoint/dpr-reporting-hub-development. Using FileSystem API instead for managing log files. If the implementation of FileSystem.rename() is not atomic, then the correctness and fault-tolerance ofyour Structured Streaming is not guaranteed.

Copy link
Contributor

@dmessengermoj dmessengermoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is called reporting_hub_idle_time_between_reads_in_millis but it looks like seconds are used. Can you confirm

@koladeadewuyi-moj
Copy link
Contributor Author

The variable is called reporting_hub_idle_time_between_reads_in_millis but it looks like seconds are used. Can you confirm

According to the AWS doc here IdleTimeBetweenReadsInMs only allows millis.

@koladeadewuyi-moj koladeadewuyi-moj merged commit f9c762b into main Oct 12, 2023
39 of 45 checks passed
@koladeadewuyi-moj koladeadewuyi-moj deleted the DPR2-147 branch October 12, 2023 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
environments-repository Used to exclude PRs from this repo in our Slack PR update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants