Remediate findings from Provenance Script Testing #55

Closed
sjoshi-jpl opened this issue Aug 2, 2023 · 6 comments

@sjoshi-jpl
Contributor

While testing the new registry-sweepers, I am noticing that for all the nodes (domains), when the provenance script reaches the point where it writes files to the DB, it takes up a significant amount of FreeStorageSpace on that node's cluster.
Example: when running the IMG provenance task (1 vCPU, 8 GB RAM, which doesn't seem to be enough, as it runs for longer than an hour), FreeStorageSpace dropped from 43 GB to 2 GB. The storage space returns to normal once the task completes.

Per discussion with @jordanpadams @tloubrieu-jpl @alexdunnjpl, this is expected behavior for heavy writes. My remediation suggestions are as follows.

  1. Increase the CloudWatch evaluation period for all alarms from 1 min to 5 mins (I already did this after noticing alerts today). This should give the provenance task additional time to complete without triggering alerts, but it won't help in all cases because some nodes run much longer than others (e.g. GEO, IMG).
  2. Increase the volume size of the OpenSearch nodes for which the provenance task is significantly impacting the FreeStorageSpace (EN, GEO, IMG, RMS, SBNPSI).
  3. For nodes that are heavily used, we can increase the provenance task's vCPU / memory.
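
As a rough illustration of suggestion 1, the sketch below builds the arguments for a CloudWatch alarm on `FreeStorageSpace` that only fires after five consecutive one-minute breaches. The helper name, alarm naming scheme, domain name, account ID, and threshold are all hypothetical. It also assumes the alarms live in the `AWS/ES` namespace with a 60-second period; the same five-minute window could instead be expressed as a single 300-second period.

```python
def free_storage_alarm_kwargs(domain_name, account_id, threshold_mb, minutes=5):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Hypothetical helper: the alarm name and threshold are illustrative only.
    """
    return {
        "AlarmName": f"{domain_name}-free-storage-space",  # assumed naming scheme
        "Namespace": "AWS/ES",  # namespace used by OpenSearch Service metrics
        "MetricName": "FreeStorageSpace",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Minimum",
        "Period": 60,                  # seconds per datapoint
        "EvaluationPeriods": minutes,  # look at the last N datapoints
        "DatapointsToAlarm": minutes,  # all N must breach before alarming
        "Threshold": float(threshold_mb),
        "ComparisonOperator": "LessThanOrEqualToThreshold",
    }


# Placeholder values; none of these come from the actual deployment.
alarm = free_storage_alarm_kwargs("img-domain", "123456789012", threshold_mb=5000)

# To apply it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

With these settings a short dip below the threshold no longer alarms, but a task that keeps storage low for more than five minutes still will, which matches the caveat about long-running nodes such as GEO and IMG.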
@sjoshi-jpl sjoshi-jpl self-assigned this Aug 2, 2023
@sjoshi-jpl
Contributor Author

EN - Increase EBS volume to 60 GB
ATM - No change
IMG - Increase EBS volume to 60 GB, increase task size to 2 vCPU, 12 GB RAM
RMS - Increase task size to 1 vCPU, 8 GB RAM
GEO - Increase task size to 2 vCPU, 12 GB RAM
NAIF - No change
PPI - No change
PSA - No change
SBNPSI - No change for now (it threw an alert, but increasing the evaluation period should help here)
SBNUMD - No change
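
The task-size changes in the list above could be captured as data and converted to the `cpu` / `memory` strings an ECS task definition expects. This is only a sketch: it assumes the sweeper tasks run on Fargate (the thread does not say), and the names `PLANNED_TASK_SIZES` and `task_size_fields` are made up for illustration.

```python
# (vCPU, memory in GB) per node, taken from the list above.
# Nodes marked "No change" are omitted.
PLANNED_TASK_SIZES = {
    "IMG": (2, 12),
    "RMS": (1, 8),
    "GEO": (2, 12),
}

# Memory values (GB) Fargate accepts for 1 and 2 vCPU tasks.
# Assumption: the tasks use the Fargate launch type.
FARGATE_MEMORY_GB = {1: range(2, 9), 2: range(4, 17)}


def task_size_fields(vcpu, memory_gb):
    """Return task-definition cpu/memory strings (CPU units and MiB)."""
    if memory_gb not in FARGATE_MEMORY_GB.get(vcpu, ()):
        raise ValueError(f"{vcpu} vCPU with {memory_gb} GB is not a valid Fargate size")
    return {"cpu": str(vcpu * 1024), "memory": str(memory_gb * 1024)}


TASK_FIELDS = {
    node: task_size_fields(*size) for node, size in PLANNED_TASK_SIZES.items()
}
```

Both proposed sizes fall inside the Fargate ranges; note that 8 GB is the maximum for a 1 vCPU task, so giving RMS more memory later would also require moving it to 2 vCPU.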

@sjoshi-jpl sjoshi-jpl transferred this issue from NASA-PDS/nasa-pds.github.io Aug 2, 2023
@sjoshi-jpl
Contributor Author

sjoshi-jpl commented Aug 2, 2023

Opened DSIO #4280 for increasing EBS volume size (IMG and EN OpenSearch nodes)

@sjoshi-jpl
Contributor Author

@tloubrieu-jpl @jordanpadams after weighing all available options, it looks like our best bet here is to increase the volume size from 100 to 120 per node. Jordan has approved this, and I will work with the SA team on it.

@tloubrieu-jpl
Member

That sounds good, thanks @sjoshi-jpl

@sjoshi-jpl
Contributor Author

sjoshi-jpl commented Aug 8, 2023

DSIO-4306 created with the SA team. Once it is completed, I'll need to revise each registry-sweeper task definition to write to its own log group.
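
For reference, a per-node log group is normally set in the container definition's `logConfiguration` block. The sketch below generates one such block per node, assuming the `awslogs` log driver. The log group naming pattern, region, and stream prefix are placeholders, not the values actually used.

```python
# Node names as listed earlier in this issue.
NODES = ["EN", "ATM", "IMG", "RMS", "GEO", "NAIF", "PPI", "PSA", "SBNPSI", "SBNUMD"]


def log_configuration(node, region="us-west-2"):
    """Build an ECS container logConfiguration for one node.

    The group name pattern and default region are assumptions for illustration.
    """
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": f"/ecs/registry-sweeper/{node.lower()}",  # hypothetical
            "awslogs-region": region,
            "awslogs-stream-prefix": "ecs",
        },
    }


LOG_CONFIGS = {node: log_configuration(node) for node in NODES}
```

Each dictionary would go under `containerDefinitions[].logConfiguration` in the corresponding node's task definition.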

@sjoshi-jpl
Contributor Author

All tasks completed. We have individual log groups for each node.

@github-project-automation github-project-automation bot moved this from Release Backlog to 🏁 Done in B14.0 Sep 7, 2023
3 participants