Set up configuration to run job that generates extracts of Opportunity DB tables for analytics daily #3160

Closed
2 tasks
chouinar opened this issue Dec 10, 2024 · 8 comments · Fixed by #3511

@chouinar
Collaborator

chouinar commented Dec 10, 2024

Summary

Schedule the job and configure it.

Prerequisite work needs to be done to create an s3 bucket for us to write to

Acceptance criteria

  • Job should run daily (early morning, let's say 5am)
  • Job needs the ANALYTICS_DB_CSV_FILE_PATH env var set with a path to an s3 bucket that both the API and analytics can reach (API will write output here)
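
A minimal sketch of how a job might read and validate this setting, assuming the value is an `s3://bucket/prefix` style path. Only the `ANALYTICS_DB_CSV_FILE_PATH` name comes from the criteria above; the helper names and the example bucket are hypothetical, and the real job may handle the path differently.

```python
import os
from urllib.parse import urlparse


def split_s3_path(path: str) -> tuple[str, str]:
    """Split 's3://bucket/some/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/prefix path, got {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def load_extract_location(env=os.environ) -> tuple[str, str]:
    """Read ANALYTICS_DB_CSV_FILE_PATH and fail loudly if it is unset."""
    path = env.get("ANALYTICS_DB_CSV_FILE_PATH")
    if not path:
        raise RuntimeError("ANALYTICS_DB_CSV_FILE_PATH is not set")
    return split_s3_path(path)


# Hypothetical value, for illustration only.
example_env = {"ANALYTICS_DB_CSV_FILE_PATH": "s3://example-bucket/extracts"}
bucket, prefix = load_extract_location(example_env)
```

Failing at startup when the variable is missing or malformed keeps a misconfigured deploy from silently writing nowhere.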
@chouinar
Collaborator Author

Can't configure this until we have a bucket that the API can write to and the analytics code can read from.

@chouinar chouinar moved this from Blocked to In Progress in Simpler.Grants.gov Product Backlog Dec 17, 2024
@chouinar
Collaborator Author

Assigning @coilysiren. The analytics analytics-dev-app user needs access to the new api-dev-api-analytics-transfer bucket; it's missing s3:GetObject permissions (maybe more, s3 always needs like 3 specific ones to read).
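
For reference, a sketch of what a read-only policy for that bucket could look like, built as a plain Python dict. This is an assumption about the shape of the fix, not the policy that was actually deployed: it grants `s3:GetObject` on the objects and `s3:ListBucket` on the bucket itself, and the function name is hypothetical. If the bucket is encrypted with a KMS key, a `kms:Decrypt` grant on that key would likely be needed as well.

```python
import json


def s3_read_policy(bucket_name: str) -> dict:
    """Build an IAM policy document allowing read access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # GetObject is an object-level action, so it targets keys under the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


policy = s3_read_policy("api-dev-api-analytics-transfer")
print(json.dumps(policy, indent=2))
```

The split between the two statements matters: attaching `s3:GetObject` to the bare bucket ARN, or `s3:ListBucket` to the `/*` object ARN, grants nothing useful.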

@chouinar chouinar moved this from In Progress to Todo in Simpler.Grants.gov Product Backlog Dec 20, 2024
coilysiren added a commit that referenced this issue Jan 3, 2025
## Summary

Relates to #3160

### Time to review: __2 mins__

## Changes proposed

- Writes the iterative s3 bucket ids and arns to ssm params
- Reads the api's variants of those ssm params into the analytics
application
- Gives the analytics application access to the api's s3 buckets

## ⚠️ Deploy Warning ⚠️ 

This creates a dependency between the API infra and Analytics infra.
Because of this, there will be a race condition between the two
deployments. If a deploy is broken, then the fix is just to deploy
again.
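
The race in the deploy warning can be illustrated with a small simulation. Everything here is a stand-in: a plain dict plays the role of the SSM parameter store, and the parameter name and function names are hypothetical. It only shows why an analytics deploy that runs before the API deploy fails, and why running it again succeeds.

```python
class MissingParameter(Exception):
    """Raised when a deploy reads a parameter that has not been written yet."""


def param_name(env: str) -> str:
    # Hypothetical parameter name, for illustration only.
    return f"/api-{env}/analytics-transfer-bucket-arn"


def deploy_api(store: dict, env: str) -> None:
    """The API deploy writes the bucket ARN into the parameter store."""
    store[param_name(env)] = f"arn:aws:s3:::api-{env}-api-analytics-transfer"


def deploy_analytics(store: dict, env: str) -> str:
    """The analytics deploy reads the ARN that the API deploy wrote."""
    name = param_name(env)
    if name not in store:
        raise MissingParameter(name)
    return store[name]


store: dict = {}
try:
    deploy_analytics(store, "dev")  # analytics wins the race: parameter missing
    first_attempt_failed = False
except MissingParameter:
    first_attempt_failed = True

deploy_api(store, "dev")  # API deploy completes
bucket_arn = deploy_analytics(store, "dev")  # redeploying analytics now succeeds
```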
@chouinar chouinar moved this from Todo to In Progress in Simpler.Grants.gov Product Backlog Jan 6, 2025
@chouinar chouinar moved this from In Progress to Blocked in Simpler.Grants.gov Product Backlog Jan 6, 2025
@chouinar
Collaborator Author

chouinar commented Jan 6, 2025

@coilysiren - After the PR, I'm still seeing an "(AccessDenied) when calling the GetObject operation" error when calling as the analytics-dev-app user.

@coilysiren
Collaborator

👀

@coilysiren
Collaborator

The deploy failed, so that's probably the cause

@coilysiren
Collaborator

@chouinar try it again!

@chouinar chouinar moved this from Blocked to In Progress in Simpler.Grants.gov Product Backlog Jan 6, 2025
@coilysiren coilysiren removed their assignment Jan 6, 2025
@chouinar
Collaborator Author

chouinar commented Jan 6, 2025

@coilysiren - Copying from thread:

Both the API and analytics terraform need to be able to reference the API-analytics bucket for an env var. That's easy to do for the API (put it in s3_buckets.tf), but I'm not sure how the analytics code would reference it since it doesn't have that file.

coilysiren added a commit that referenced this issue Jan 8, 2025
## Summary

Relates to #3160

### Time to review: __2 mins__

## Changes

Adds env vars for both API and analytics sides of the transfer bucket

## Testing

I ran a test deploy on both API and analytics
coilysiren added a commit that referenced this issue Jan 9, 2025
## Summary

Relates to #3160

### Time to review: __2 mins__

## Changes

Adds env vars for both API and analytics sides of the transfer bucket

## Testing

I ran a test deploy on both API and analytics
@coilysiren coilysiren moved this from In Progress to In Review in Simpler.Grants.gov Product Backlog Jan 9, 2025
@chouinar chouinar moved this from In Review to In Progress in Simpler.Grants.gov Product Backlog Jan 13, 2025
chouinar added a commit that referenced this issue Jan 14, 2025
## Summary
Fixes #3160

### Time to review: __5 mins__

## Changes proposed
Configures an API job to create a CSV DB extract to run daily

Configure an Analytics job to parse the CSV DB extract to run daily

Env var renames to match what is configured in terraform

## Context for reviewers
Jobs are pretty simple, just set them up to run 60 minutes apart for
now. From testing, the first job took less than a minute to run in dev,
so no concern with it finishing in time.
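
As a sketch of the staggering described above, the snippet below derives two daily schedule expressions a fixed gap apart. It assumes AWS EventBridge-style `cron(...)` expressions and the 5am start from the issue's acceptance criteria; neither the expression format nor the timezone is confirmed by this thread, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def daily_cron(hour: int, minute: int = 0) -> str:
    """Format a once-a-day schedule as an EventBridge-style cron expression."""
    return f"cron({minute} {hour} * * ? *)"


def staggered_schedules(first_hour: int, first_minute: int, gap_minutes: int) -> tuple[str, str]:
    """Return schedules for the extract job and the job that consumes its output."""
    # The date is arbitrary; only the time of day matters for a daily schedule.
    first = datetime(2000, 1, 1, first_hour, first_minute)
    second = first + timedelta(minutes=gap_minutes)
    return daily_cron(first.hour, first.minute), daily_cron(second.hour, second.minute)


extract_schedule, load_schedule = staggered_schedules(5, 0, 60)
```

A fixed gap is simple but relies on the first job finishing in time. If the extract ever grows past the gap, chaining the second job off the first job's completion would be the sturdier option.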

## Additional information
Reran the jobs locally to verify every env var is connected properly
coilysiren added a commit that referenced this issue Jan 21, 2025

## Summary

Relates to #3160

### Time to review: __1 mins__

## Changes proposed

I put this particular IAM permission in the wrong place, I'm fairly sure

## Testing

I can't test this right now because I'm being blocked by
#3590
coilysiren added a commit that referenced this issue Jan 21, 2025
## Summary

Relates to #3160

### Time to review: __1 mins__

## Context for reviewers

#3589 was half done, this
PR finishes it

## Testing

I've deployed this to make sure the infra works. I tested the step
function and:

<img width="1318" alt="image"
src="https://github.com/user-attachments/assets/dc25faed-c2dd-4a5e-af4a-bced51ed2abd"
/>