DPR2-147: Fix streaming job not ingesting events after running idle #3604

koladeadewuyi-moj · 2023-10-06T16:47:38Z

This fixes the streaming job not ingesting events in dev after running idle for an extended period.
It uses s3a for the checkpoint location, as supported by Hadoop 3, and adds configurable arguments to set an idle time between reads.

github-actions · 2023-10-06T16:51:10Z

`TFSEC Scan` Success

*****************************

TFSEC will check the following folders:

`Checkov Scan` Success

*****************************

Checkov will check the following folders:

`CTFLint Scan` Success

*****************************

Setting default tflint config...
Running tflint --init...
Installing `terraform` plugin...
Installed `terraform` (source: github.com/terraform-linters/tflint-ruleset-terraform, version: 0.2.1)
tflint will check the following folders:

github-actions · 2023-10-06T16:52:22Z

`TFSEC Scan` Success

`Checkov Scan` Success

`CTFLint Scan` Success

github-actions · 2023-10-06T16:58:09Z

`TFSEC Scan` Success

`Checkov Scan` Success

`CTFLint Scan` Success

github-actions · 2023-10-09T14:13:02Z

`TFSEC Scan` Success

`Checkov Scan` Success

`CTFLint Scan` Success

github-actions · 2023-10-09T14:16:37Z

`TFSEC Scan` Success

`Checkov Scan` Success

`CTFLint Scan` Success

github-actions · 2023-10-10T22:17:34Z

`TFSEC Scan` Success

`Checkov Scan` Success

`CTFLint Scan` Success

github-actions · 2023-10-12T04:39:50Z

`TFSEC Scan` Success

`Checkov Scan` Success

`CTFLint Scan` Success

tom-ogle-moj · 2023-10-12T09:03:01Z

terraform/environments/digital-prison-reporting/main.tf

@@ -17,21 +17,22 @@ module "glue_reporting_hub_job" {
  job_language                  = "scala"
  create_security_configuration = local.create_sec_conf
  temp_dir                      = "s3://${module.s3_glue_job_bucket.bucket_id}/tmp/${local.project}-reporting-hub-${local.env}/"
-  checkpoint_dir                = "s3://${module.s3_glue_job_bucket.bucket_id}/checkpoint/${local.project}-reporting-hub-${local.env}/"
+  # Using s3a for checkpoint because to align with Hadoop 3 supports
+  checkpoint_dir                = "s3a://${module.s3_glue_job_bucket.bucket_id}/checkpoint/${local.project}-reporting-hub-${local.env}/"


How come we're using s3a?

koladeadewuyi-moj

There are other Hadoop connectors to S3, but only S3A is actively maintained by the Hadoop project itself.

https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Other_S3_Connectors

tom-ogle-moj

Yeah, so up until this point we've been using Amazon's connector, which supports the s3:// scheme. I'm confused why we'd specifically change the checkpoint prefix to s3a but not all the other paths that also use the connector, dpr.raw.s3.path for example.

koladeadewuyi-moj

The issue only affects checkpointing, where the warning below appears in the logs:

23/10/06 21:58:17 WARN CheckpointFileManager: Could not use FileContext API for managing Structured Streaming checkpoint files at s3://dpr-glue-jobs-development/checkpoint/dpr-reporting-hub-development. Using FileSystem API instead for managing log files. If the implementation of FileSystem.rename() is not atomic, then the correctness and fault-tolerance ofyour Structured Streaming is not guaranteed.
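As a rough illustration of the split discussed in this thread, the sketch below keeps the `s3://` scheme for the temporary directory and switches only the checkpoint directory to `s3a://`. The bucket and job names are hypothetical placeholders, not values from this repository, and the sketch assumes (as the thread suggests, but does not prove) that the warning is tied to the scheme of the checkpoint path only.

```hcl
locals {
  # Hypothetical names, for illustration only.
  example_bucket = "example-glue-jobs-bucket"
  example_job    = "example-reporting-hub"

  # Non-checkpoint paths keep the s3:// scheme used so far.
  example_temp_dir = "s3://${local.example_bucket}/tmp/${local.example_job}/"

  # Only the Structured Streaming checkpoint location uses s3a://,
  # the connector maintained by the Hadoop project.
  example_checkpoint_dir = "s3a://${local.example_bucket}/checkpoint/${local.example_job}/"
}
```

Limiting the change to the checkpoint path keeps the diff small; whether other paths such as dpr.raw.s3.path would also benefit from s3a is left open in this thread.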

dmessengermoj

The variable is called reporting_hub_idle_time_between_reads_in_millis, but it looks like seconds are used. Can you confirm?

koladeadewuyi-moj · 2023-10-12T09:57:02Z

> The variable is called reporting_hub_idle_time_between_reads_in_millis but it looks like seconds are used. Can you confirm

According to the AWS docs, IdleTimeBetweenReadsInMs only accepts milliseconds.
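One way to make the unit explicit is to validate it where the value is declared. The following is a minimal sketch only: the variable name comes from this thread, but declaring it as a Terraform `variable` block, the default of 1000, and the validation rule are all assumptions for illustration rather than what this PR does.

```hcl
# Hypothetical declaration: type, default, and validation are assumptions.
variable "reporting_hub_idle_time_between_reads_in_millis" {
  description = "Idle time between reads of the stream, in milliseconds."
  type        = number
  default     = 1000 # assumed example value: 1000 ms = 1 second

  validation {
    # Reject negative or fractional values so the unit stays whole milliseconds.
    condition = (
      var.reporting_hub_idle_time_between_reads_in_millis >= 0 &&
      floor(var.reporting_hub_idle_time_between_reads_in_millis) == var.reporting_hub_idle_time_between_reads_in_millis
    )
    error_message = "The idle time must be a non-negative whole number of milliseconds."
  }
}
```

With a declaration like this, a value intended as 2 seconds would be written as 2000, which avoids the seconds-versus-milliseconds ambiguity raised above.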

github-actions bot added the environments-repository label Oct 6, 2023

koladeadewuyi-moj force-pushed the DPR2-147 branch from fb8d6b3 to ae68de6 Compare October 6, 2023 16:49

DPR2-147: Create empty kinesis stream

335c560

koladeadewuyi-moj force-pushed the DPR2-147 branch from ae68de6 to 335c560 Compare October 6, 2023 16:55

koladeadewuyi-moj temporarily deployed to digital-prison-reporting-development October 6, 2023 16:57 — with GitHub Actions Inactive

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 6, 2023 16:57 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 9, 2023 14:10 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 9, 2023 14:11 — with GitHub Actions Failure

DPR2-147: Add idle time between reads and s3a for checkpoint

092d1b8

koladeadewuyi-moj force-pushed the DPR2-147 branch from 6d81c2f to 092d1b8 Compare October 9, 2023 14:13

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 9, 2023 14:15 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 9, 2023 14:15 — with GitHub Actions Failure

DPR2-147: Use s3a for hadoop checkpoints

ffca5e2

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 10, 2023 22:16 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 10, 2023 22:16 — with GitHub Actions Failure

DPR2-147: Make idle_time_between_reads configurable

8c97001

koladeadewuyi-moj temporarily deployed to digital-prison-reporting-development October 12, 2023 04:38 — with GitHub Actions Inactive

koladeadewuyi-moj temporarily deployed to digital-prison-reporting-test October 12, 2023 04:38 — with GitHub Actions Inactive

koladeadewuyi-moj changed the title ~~DPR2-147: Create empty kinesis stream~~ DPR2-147: Make idle_time_between_reads configurable Oct 12, 2023

koladeadewuyi-moj marked this pull request as ready for review October 12, 2023 08:22

koladeadewuyi-moj requested review from a team as code owners October 12, 2023 08:22

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 12, 2023 08:23 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 12, 2023 08:23 — with GitHub Actions Failure

koladeadewuyi-moj changed the title ~~DPR2-147: Make idle_time_between_reads configurable~~ DPR2-147: Fix streaming job not ingesting events after running idle Oct 12, 2023

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 12, 2023 08:47 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 12, 2023 08:47 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-development October 12, 2023 08:50 — with GitHub Actions Failure

koladeadewuyi-moj had a problem deploying to digital-prison-reporting-test October 12, 2023 08:50 — with GitHub Actions Failure

tom-ogle-moj reviewed Oct 12, 2023

tom-ogle-moj approved these changes Oct 12, 2023

dmessengermoj approved these changes Oct 12, 2023

koladeadewuyi-moj merged commit f9c762b into main Oct 12, 2023
39 of 45 checks passed

koladeadewuyi-moj deleted the DPR2-147 branch October 12, 2023 11:28
