
release v0.90.0 #1417

Closed
wants to merge 8 commits into from

Conversation

kishore03109
Contributor

New

Dependencies

  • chore(deps): minor upgrade aws #1416
  • chore(deps): bump braces from 3.0.2 to 3.0.3 #1413

Tests

feat(monitoring): add scheduler functionality #1383

Screenshot 2024-05-21 at 11 20 59 AM

On deployment, assert that you see these logs. It is OK for there to be multiple instances of this log (the count corresponds directly to the number of instances we run), since bullmq is smart enough to create only one queue and one repeatable job across multiple instances.

feat(monitoring): add dns reporter #1376

In server.ts, add: `monitoringService.driver()`

should see this in the logs:

Screenshot 2024-05-15 at 5.48.05 PM.png

* fix: off-by-one error for month number (#1294) #1309

fix(dockerfile): revert to copy . #1304
ci(sidecar): add deploy files for prod #1285

See base PR!

ref(app): shift support flows into separate folder + ecs service #1269

These tests have been done on staging, but success has been checked only via email (i.e., receiving the success email is treated as good enough).

  1. submit the staging site launch form
  2. should receive an email regarding the dns records to update
  3. submit the staging site creation form
  4. should receive a success email
  5. submit staging audit logs form
  6. should receive success email
  7. submit staging site link checker form
  8. should have logs ("*link*" in isomer-infra or isomer; not entirely sure why, but I suspect it's because our project tag in the infra repo is isomer-infra, which leads to the tag being isomer-infra)
    Screenshot 2024-04-04 at 1 40 03 PM
  9. submit staging site repair form
  10. should see success email
fix: off-by-one error for month number #1294
  • Use the form for staging (find in 1PW) to request for the site audit logs
  • Request for logs for the previous calendar month for a site that exists on staging (and you are either an Isomer Admin or collaborator of)
  • Verify that the commits in the audit logs only show those in the previous calendar month (i.e. March)
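The bug class here is JavaScript's 0-indexed `Date` months, a classic source of off-by-one errors. A minimal sketch (a hypothetical helper, not the PR's actual code) of computing the previous calendar month's boundaries:

```typescript
// Hypothetical helper: compute [start, end) boundaries of the previous
// calendar month in UTC. JS Date months are 0-indexed (getUTCMonth()
// returns 2 for March), which is an easy place to be off by one.
function previousMonthRange(now: Date): { start: Date; end: Date } {
  // Date.UTC normalises month -1 into December of the previous year
  const start = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - 1, 1))
  // Exclusive end: the first instant of the current month
  const end = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1))
  return { start, end }
}
```

For example, any date in April 2024 yields March 1 to April 1, so only March commits fall inside the window.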

* fix(server): server should die if unable to connect to db (#1265) #1273

Improve APM spans (no more <anonymous>) #1267
  • Load the CMS and perform some actions, all happy paths should work
  • Find a span for a Review compare, and verify the span name is compareDiff (NOT <anonymous>)

* chore(package): use npm (#1237) #1248

feat(dd): add traces to gitfilesysteM #1240
  • Log in on staging, make an edit and save. Action should be successful
fix(dockerfile): add dig to image #1244
  • ecs exec into staging
  • run dig www.google.com

* build(deps): bump @aws-sdk/client-secrets-manager (#1218) #1235

fix(link checker): wrong error reported #1227
perf(I/O): rm blocking fs calls #1220
  • Run the following command from the command line:
grep -rE '(accessSync|appendFileSync|chmodSync|chownSync|closeSync|copyFileSync|cpSync|existsSync|fchmodSync|fchownSync|fdatasyncSync|fstatSync|fsyncSync|ftruncateSync|futimesSync|lchmodSync|lchownSync|lutimesSync|linkSync|lstatSync|mkdirSync|mkdtempSync|opendirSync|openSync|readdirSync|readFileSync|readlinkSync|readSync|readvSync|realpathSync|renameSync|rmdirSync|rmSync|statSync|statfsSync|symlinkSync|truncateSync|unlinkSync|utimesSync|writeFileSync|writeSync|writevSync)\b' src 

the only results should be from GitFileSystemService.spec.ts, which is fine since this test file runs locally and not in prod

  • Submit a form here for a repo in staging efs, and assert that the attachments are sent properly

  • submit the site create form
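For context on the calls the grep above flags: a small sketch (hypothetical function names, not the PR's actual code) of the replacement pattern, using the promise-based `fs` API so I/O no longer blocks the event loop:

```typescript
import { promises as fs } from "fs"
import * as os from "os"
import * as path from "path"

// fs.readFileSync blocks the event loop for the entire read;
// fs.promises.readFile yields while the I/O is in flight.
async function readTextFile(filePath: string): Promise<string> {
  return fs.readFile(filePath, "utf-8")
}

// Small demo: write a temp file and read it back asynchronously.
async function demoFs(): Promise<string> {
  const p = path.join(os.tmpdir(), "async-fs-demo.txt")
  await fs.writeFile(p, "non-blocking")
  return readTextFile(p)
}
```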

* fix(otp): increment instead of update for concurrency (#1186) #1202

#1186 - @alexanderleegs

  • Use the script provided in the VAPT report on page 17 and 18
  • Adjust the URL to point to your test instance
  • Adjust the email address to be one that is valid (i.e. your own account) and attempt to log in (without keying in the correct OTP)
  • Run the script and verify that you hit the max attempts after 5 tries
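To illustrate why the fix moves from read-then-update to an atomic increment (a simplified in-memory model, not the PR's actual code): two concurrent read-modify-write cycles can both read the same attempt count and lose an update, which would let an attacker exceed the OTP attempt cap.

```typescript
// Simplified model of an OTP row. read/write simulate async DB round-trips;
// increment simulates an atomic SQL `SET attempts = attempts + 1`.
class OtpRow {
  attempts = 0
  async read(): Promise<number> {
    await new Promise((r) => setTimeout(r, 1))
    return this.attempts
  }
  async write(value: number): Promise<void> {
    await new Promise((r) => setTimeout(r, 1))
    this.attempts = value
  }
  increment(): void {
    this.attempts += 1
  }
}

// Racy pattern: both callers read 0, both write 1 -- one attempt is lost.
async function racyBump(row: OtpRow): Promise<void> {
  const current = await row.read()
  await row.write(current + 1)
}

async function demo(): Promise<{ racy: number; atomic: number }> {
  const racy = new OtpRow()
  await Promise.all([racyBump(racy), racyBump(racy)])
  const atomic = new OtpRow()
  atomic.increment()
  atomic.increment()
  return { racy: racy.attempts, atomic: atomic.attempts }
}
```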

#1196 - @alexanderleegs

  • connect to ogp vpn
  • run node ddos.js
  • assert that the remaining counter fell from 100
    Screenshot 2024-03-08 at 9 09 39 AM
  • note the reset time (this is the window time, and by extension the amount of time to wait for this test)
  • disconnect from the vpn
  • run node ddos.js
  • assert that the remaining counter fell from 100
    Screenshot 2024-03-08 at 9 09 39 AM
  • After the reset time has elapsed, repeat the steps above and verify that the counters for both simulated users reset.
    Screenshot 2024-03-08 at 9 14 36 AM

#1197 - @dcshzj

Check that the following endpoints do not throw an error from validation:

  • Create collaborator
  • Feedback
  • Create Review Request
  • Update review request
  • Create Comment
  • get preview info
  • Verify email otp
  • Verify mobile otp
    • Specifically, verify that the /mobile/verifyOtp endpoint no longer accepts an array for mobile
  • Sgid login

fix(repoChecker): unintended alarms #1176

Screenshot 2024-02-29 at 8 59 26 AM

  • once in staging, use the form to run the checker on multiple repos in efs
  • notice the lack of "SiteCheckerError" AND the lack of "failed to push some refs". This is important so as not to create false alarms.

Release/0.66.2 #1145

Release/0.66.1 #1143

Deploy Notes

feat(monitoring): add scheduler functionality #1383

The corresponding infra PR should be deployed to production first; only then should the Redis host value be populated into 1PW for production.

Additionally, post approval of this PR, add two alarms: one for Error running monitoring service and another for Monitoring service has failed. These fire when the job has failed to be initialised and when there is a new error, respectively.

New environment variables:

  • REDIS_HOST : Redis host
    • added env var to 1PW + SSM script (fetch_ssm_parameters.sh)

New dependencies:

  • bullmq : scheduler of choice

feat(monitoring): add dns reporter #1376

New environment variables:

  • KEYCDN_API_KEY : to get all the zones that we own in keycdn
  • S3_BUCKET_NAME : bucket name
    • HAVE NOT added env var to 1PW + SSM script (fetch_ssm_parameters.sh)

New dependencies:

  • bullmq : scheduler of choice

* fix: off-by-one error for month number (#1294) #1309

fix(dockerfile): revert to copy . #1304

None
fix: off-by-one error for month number #1294

None

Full Changelog: https://github.com/isomerpages/isomercms-backend/compare/v0.78.1..v0.79.0

* fix(server): server should die if unable to connect to db (#1265) #1273

Full Changelog: https://github.com/isomerpages/isomercms-backend/compare/v0.75.0..v0.76.0

* chore(package): use npm (#1237) #1248

feat(dd): add traces to gitfilesysteM #1240

This PR records a lot of new spans into traces. Basically ALL GitFileSystem operations are now instrumented.
During release, close attention need to be given to system load to ensure the new instrumentation is not adding too high a CPU cost to the system.

Full Changelog: https://github.com/isomerpages/isomercms-backend/compare/v0.72.0..v0.73.0

* build(deps): bump @aws-sdk/client-secrets-manager (#1218) #1235

fix(link checker): wrong error reported #1227

None
perf(I/O): rm blocking fs calls #1220

None

Full Changelog: https://github.com/isomerpages/isomercms-backend/compare/v0.71.0..v0.72.0

Release/0.66.2 #1145

None

Release/0.66.1 #1143

None

Full Changelog: https://github.com/isomerpages/isomercms-backend/compare/v0.89.0..v0.90.0

alexanderleegs and others added 8 commits June 13, 2024 14:57
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-JS-WS-7266574

Co-authored-by: snyk-bot <[email protected]>
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)
- [Commits](micromatch/braces@3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: braces
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Problem

This is a first PR that adds some level of sane reporting.
While scheduling is part of this feature, it is not within the scope of this PR. This PR only adds (currently dead-code) logic to grab the domains that we own in Isomer and do a DNS dig. This is meant to be verbose; in the future, alarms can be added based on the results.

This is not meant to replace monitoring. It just fine-tunes some blind spots that Uptime Robot currently has, plus gives a sane checker during incident response to show the history of DNS records for a site that we manage.

I am opting to log directly in our backend to keep things simple. Will add alarms + the scheduler in subsequent PRs.

## Solution

Grab ALL domains from KeyCDN + Amplify + redirection records, and log the DNS records for each of them.
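A rough sketch of that shape (names are assumptions; the real PR gathers domains from KeyCDN, Amplify, and redirection records), with the resolver injected so `dns.promises.resolve` can be swapped for a stub in tests:

```typescript
import { promises as dns } from "dns"

type Resolver = (domain: string, rrtype: string) => Promise<string[]>

// Live resolver backed by Node's dns.promises API.
const liveResolver: Resolver = (domain, rrtype) =>
  dns.resolve(domain, rrtype) as Promise<string[]>

// Verbose by design: record whatever the resolver returns per domain, and an
// empty list on failure, so alarms can be layered on top of the log later.
async function reportDns(
  domains: string[],
  resolve: Resolver = liveResolver
): Promise<Record<string, string[]>> {
  const report: Record<string, string[]> = {}
  for (const domain of domains) {
    try {
      report[domain] = await resolve(domain, "CNAME")
    } catch {
      report[domain] = []
    }
  }
  return report
}
```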

**Breaking Changes**


- [ ] Yes - this PR contains breaking changes
  - Details ...
- [X] No - this PR is backwards compatible with ALL of the following feature flags in this [doc](https://www.notion.so/opengov/Existing-feature-flags-518ad2cdc325420893a105e88c432be5)


## Problem

This is the second part of the monitoring feature that we want to build. This PR only cares about adding a scheduler plus the related infra needed for it to function. This will make the monitor run once every 5 minutes, so that on-calls can pick up any related alarms.

Adding the alarms is done in the downstream PR.

## Solution
Using bullmq to conveniently create a queue, a worker, and a repeatable job across multiple instances. We do some level of exponential-backoff retries since it is a nice-to-have and easy to implement. The original `/site-up` code has since been refactored to return an `err` or an `ok`, depending on whether the configuration is ideal.
Unfortunately, this surfaced quite a number of edge cases. Because of this, a looser check (whether the Isomer logo is present) is used to determine if a site is up.
Even with this loose check, `workplacelearning.gov.sg` has modified their site to not have the Isomer logo, so we have used GrowthBook to whitelist this site in code. If tomorrow we get an alarm for a site going down that is expected to persist, we can go to GrowthBook and change the config to whitelist it.

**Breaking Changes**


- [ ] Yes - this PR contains breaking changes
  - Details ...
- [X] No - this PR is backwards compatible with ALL of the following feature flags in this [doc](https://www.notion.so/opengov/Existing-feature-flags-518ad2cdc325420893a105e88c432be5)

@kishore03109 kishore03109 mentioned this pull request Jun 27, 2024
@kishore03109 kishore03109 deleted the release_v0.90.0 branch June 27, 2024 07:39