Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(app): shift support flows into separate folder + ecs service #1269

Merged
merged 39 commits into from
Apr 15, 2024

Conversation

seaerchin
Copy link
Contributor

@seaerchin seaerchin commented Apr 4, 2024

NOTE: production deploy stuff is in #1285

Problem

At present, some internal flows trigger a huge spike in CPU usage, which will erroneously trigger an alarm. additionally, this CPU spike might cause a degradation on performance of the actual app.

Closes ISOM-811

Solution

  1. use a separate ecs service to do it. we did not go with a sidecar container as was initially suggested. this is because ecs scaling is done via metrics and the most fine-grained metric we have at present (that is relevant) is ecs cpu usage by service. as the sidecar is still part of the service, this would cause the support container to trigger scaling (and still alarms) on the service as a whole, which doesn't solve the problem.
  2. shift shared services out into /common -> this is to facilitate sharing of services between support and src. the new service listens on port 8082 - this is because on local, we want to expose both services ports to the host machine (hence cannot use same port).
  3. added deployment step for staging only for support - added 2 more params to the aws_deploy workflow to support this. (as service + dockerfile is now different)
  4. add task-def for support - we cannot reuse existing one because it's a new ecs service.

Breaking Changes

  • Yes - this PR contains breaking changes
    • not user facing, but the existing forms will not work and will require an update to <stg | prod>-support-api/v2/infra/formsg

Tests

These tests have been done on staging, but the success has been checked only via email (ie, receive success email = good enough)

  1. submit the staging site launch form
  2. should receive an email regarding the dns records to update
  3. submit the staging site creation form
  4. should receive a success email
  5. submit staging audit logs form
  6. should receive success email
  7. submit staging site link checker form
  8. should have logs ("*link*" in isomer-infra or isomer (not entirely sure why, suspect it's because our project tag in infra repo is isomer-infra which leads to the tag being isomer-infra)
    Screenshot 2024-04-04 at 1 40 03 PM
  9. submit staging site repair form
  10. should see success email

@seaerchin seaerchin changed the title Ref/sidecar ref(app): shift support flows into separate folder + ecs service Apr 4, 2024
@seaerchin seaerchin marked this pull request as ready for review April 4, 2024 05:44
@seaerchin seaerchin requested a review from a team April 4, 2024 05:44
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/elasticbeanstalk/cms-backend-staging-node18/var/log/web.stdout.log",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open to creating another log group, nbd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to creating another log group

Maybe we can drop elasticbeanstalk from the name too...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya can, i think i'll change it to an ecs-prefixed log group post tear down of eb

"name": "support",
"portMappings": [
{
"containerPort": 8082,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only change between this and backend

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm why do we not hahve a stg variant of this ah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this deploys to staging! i think u mean prod? will do in a folloow up PR

common/index.ts Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no big changes, just copied out services from server.js -> here + changed from require to import

Comment on lines +12 to +18
COPY ../src ./src
COPY ../common ./common
COPY ../support ./support
COPY ../package.json .
COPY ../package-lock.json .
COPY ../scripts ./scripts
COPY ../tsconfig.json .
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change 1 - we selectively copy rather than just COPY . .; we should be able to omit COPY ../src ./src if we are rigorous about separation of concerns but for now this is ok. build times are not impacted too much cos both jobs run in parallel

RUN echo " name = Isomer Admin" >> /home/webapp/.gitconfig
RUN echo " email = [email protected]" >> /home/webapp/.gitconfig

EXPOSE "8082"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need this because we using port 8082

Copy link
Contributor

@kishore03109 kishore03109 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, i felt that we should use the side car for more intensive operations only (eg site launch + create does not seem like intensive ops, but this is a nit)

"name": "support",
"portMappings": [
{
"containerPort": 8082,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm why do we not hahve a stg variant of this ah?

res: Response<ResBody, Locals>,
next: NextFunction
) => void
> = ExpressHandler<P, ResBody, ReqBody, ReqQuery, Locals>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm context behind this change? did we type this wrongly ah?

Copy link
Contributor Author

@seaerchin seaerchin Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change was made to keep our types neater. we already have a global request.d.ts, which overrides the Request type exported originally!

this means that the original typing here is superfluous and hence, removed

RUN adduser -u 900 webapp -D -h /home/webapp -s /bin/sh
USER webapp

COPY ../src ./src
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are spiltting this up, should we split this up for the main docker file as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to clarify, we want to omit copying of support into our main Dockerfile?

we totally can honestly, will make the change. best is if we can omit copying src to the support dockerfile but it doesn't impact build times much because the src folder is small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, as a sense check, i only copied folders not prefixed with a . - these correspond to visible folders on our filesystem.

Screenshot 2024-04-09 at 12 20 59 PM

of the folders seen above, i omitted node_modules (via .dockerignore), support (not used) and docs (no runtime impact).

lmk if it should be otherwise. i've also copied patches over to Dockerfile.support to support dd-trace w/ neverthrow

Copy link
Contributor

@timotheeg timotheeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Is the PR missing a file for prod?

.aws/deploy/support-task-definition.prod.json

support/routes/v2/formsg/index.ts Show resolved Hide resolved
Comment on lines +122 to +125
{ "name": "OTP_EXPIRY", "valueFrom": "STAGING_OTP_EXPIRY" },
{ "name": "OTP_SECRET", "valueFrom": "STAGING_OTP_SECRET" },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should some of these variables removed from the support container? I assume they are not useful there? I don't have a sense of the full list of what's needed and not, but general principle is to only give the app what it needs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i copied it because i didn't want to sift through and manually find out what env vars are/are not used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit painful yes, but we should do it anyway for good hygiene. If the service doesn't need to know of certain secrets or values, we shouldn't pass them in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good! you ok if i do this in a follow-up? my immediate priority now is to get functionality up and merged - will address this hygiene issue downstream

Comment on lines 249 to 257
"name": "DD_SERVICE",
"value": "isomer"
},
{
"name": "DD_TAGS",
"value": "service:isomer"
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use a different service name for this (maybe isomer-support), so we can also see the ECS infra metrics and the endpoint performance under the service

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't mind this - should we also shift logs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Since we are creating a new service, we need to have the telemetry set up and linked well, so we all the drill down capability of DD work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to isomer-support (keep things consistent throughout between files/service)

i've changed it on 3 areas:

  1. support container itself (done via hard-coded env vars instead of via secrets injection through SSM - idt secrets injection adds much here and obfuscates the actual value itself, which is annoying)
  2. updated env vars on dd-agent container too (in support-task-def)
  3. updated dockerLabels on dd-agent

this should allow us to get the tagging separately, do check if i missed out anything!

"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/elasticbeanstalk/cms-backend-staging-node18/var/log/web.stdout.log",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to creating another log group

Maybe we can drop elasticbeanstalk from the name too...

.aws/deploy/support-task-definition.staging.json Outdated Show resolved Hide resolved
@seaerchin
Copy link
Contributor Author

seaerchin commented Apr 8, 2024

Looking good!

Is the PR missing a file for prod?

.aws/deploy/support-task-definition.prod.json

@timotheeg yes it is! as mentioned, the CNAME setup + deploy workflow hasn't been done and will be in another PR

@seaerchin
Copy link
Contributor Author

@timotheeg

for the issues listed below, i'll do in another PR

  1. removing extra env vars -> functionality now is intact, want to get this out and fix hygiene issues later (will raise issue)
  2. creating separate log group -> this is because teh creation of a separate log group requires pulumi up + the PR itself to be approved on infra prior to the PR being raised

@seaerchin seaerchin requested a review from timotheeg April 9, 2024 05:13
@seaerchin seaerchin requested a review from a team April 9, 2024 09:30
},
{
"name": "DD_TAGS",
"value": "team:isomer"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the infra linkage doesn't work (happened in armoury), we might need to set the service here too:

Suggested change
"value": "team:isomer"
"value": "team:isomer,service:isomer-support"

cicd-role: "arn:aws:iam::095733531422:role/isomer-infra-github-oidc-role-16ea937"
ecr-repository: "isomer-infra-staging-ecr"
ecs-cluster-name: "isomer-stg-ecs"
ecs-web-service-name: "stg-support-ecs-service"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming convention doens't seem to match the deploy-app name? Did you mean

Suggested change
ecs-web-service-name: "stg-support-ecs-service"
ecs-web-service-name: "isomer-support-stg-ecs-service"

@seaerchin seaerchin merged commit b821ef3 into develop Apr 15, 2024
5 checks passed
@seaerchin seaerchin deleted the ref/sidecar branch April 15, 2024 11:37
seaerchin added a commit that referenced this pull request Apr 15, 2024
* fix: off-by-one error for month number (#1294)

* ref(app): shift support flows into separate folder + ecs service (#1269)

* feat(caddyfile): add initial caddyfile

* refactor(infra): shift infra stuff to new folder

* chore(formsg): shift stuff into sub folder

* refactor(server): shift infra stuff out

* refactor(common): shift shared stuff into common folder

* build(package/tsconfig): update packages + paths

* refactor(server): swap to ts

* refactor(request): update type

* feat(infra/routes): add new router

* refactor(router): shift initialisation in

* refactor(middleware): shift shared stuff out

* chore(exports): add exports from indexs

* chore(featureflag): disable single export

* feat(docker): add containers in docker-compose for infra

* feat(deploy): update deploy stuff

* chore(cleanup): remove unused/extra code in infra/server

* fix(aws_deploy): update deploy file

* fix(task def): removed portmappings

* fix(dockerfile): copy src files over

* fix(docker): update port mappings as ecs uses same network

* fix(caddy): update to use local dockerfile

* fix(caddyfile): upate proxies

* fix(caddyfile): try to netweork via loopback

* revert(aws_deploy): revert previous changes

* revert(deploy_staging): revert

* revert(task def): revert be task def

* revert(src): revert files that were used for sidecar

* refactor(support): shift into new container

* fix(docker): update dockerfile + add health check

* fix(aws_deploy): update tag

* refactor(index): remove unsnued

* fix(app spec): do replacement for container name and port

* fix(ci): update image names

* feat(support): connect db

* chore(request): remove unused import

* refactor(dockerfile): update to explicitly copy

* chore(index): add a line space

* refactor(support task def): update so that dd-service is `isomer-support` and team is isomer

* chore(formsg tests): shift tests from `src` -> `support`

* ci(sidecar): add deploy files for prod (#1285)

* feat(deploy_prod): update workflow to include deploy_support

* feat(support-task-def): add prod

* fix(task def): update dd tags to include service

* fix(dockerfile): revert to copy . (#1304)

* chore: bump version to v0.79.0

---------

Co-authored-by: Harish <[email protected]>
Co-authored-by: Hsu Zhong Jun <[email protected]>
Co-authored-by: Timothee Groleau <[email protected]>
This was referenced Jun 27, 2024
@dcshzj dcshzj mentioned this pull request Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants