RFC for mobile app backend stack #170

Draft
wants to merge 3 commits into
base: main
80 changes: 80 additions & 0 deletions rfc-170-mobile-app-backend-stack.md
@@ -0,0 +1,80 @@
---
status: proposed
implementation: proposed
status_last_reviewed:
---

# Back end stack for GOV.UK App backends
Contributor

This decision feels like it's heavily entwined with GDS' future organisation design, which is itself somewhat unknown.

There are two paths I think we could follow - be consistent with GOV.UK, or be consistent with DI. If the app ends up close to DI in terms of organisation structure, then it will have been better to have been consistent with DI. If the app ends up close to GOV.UK, then it will have been better to have been consistent with GOV.UK.

A big reason for that is support - who will be supporting this infrastructure? Does it need 24x7 support? If there's any intent to piggyback on GOV.UK's or DI's existing arrangements, then being consistent with the infrastructure those teams currently use will be a requirement (or at the very least, highly desirable).


## Summary

This RFC proposes a back end architecture and tech stack for internet-facing services (i.e. APIs) which underpin the GOV.UK mobile app. It draws on the existing work carried out on the Digital Identity programme to define an extensible, scalable architecture which meets the needs of a high-volume mobile application, and create a solid base for future development.

The proposal is to adopt an AWS serverless stack, in line with DI, with the predominant implementation approach being to run Node.js Lambda functions to execute business logic.

## Problem

The new GOV.UK App is going to require back end services to enable many of its features. The app is a shared product with Digital Identity, where the existing ID Check app that they have in production will be superseded by a new, single app that incorporates both Digital Identity and GOV.UK features.

The DI Mobile team have an existing set of back ends which are implemented in line with the Digital Identity architecture - namely, AWS serverless using Lambda functions (primarily Node.js with some Java). The decision making process behind this is recorded in the Digital Identity architecture repository (for example, [ADR-0001](https://github.com/govuk-one-login/architecture/blob/main/adr/0001-auth-hosting.md)) _(N.B. this is currently a private repo)_.

As GOV.UK, we need to determine a tech stack for back end services which underpin the GOV.UK elements of the app.

The following are key considerations we need to take into account:

1. There is a single app, and it would be most straightforward to integrate the app with a single back end. This means we only need a single set of configuration rather than multiple settings for different features. Additionally, we can centralise concerns like logging, monitoring and alerting.
Contributor

I'm interested in why exactly we'd need a single backend. Have we considered the implications of DI running a separate API endpoint for the identity functionality, versus separate APIs (potentially hosted on GOV.UK's platform) for other features?

Clearly defined APIs for these supporting services could facilitate the decoupling of these backend services from the App team. I'm concerned that without this separation, the scope of the App team's responsibilities could expand excessively, encompassing a wide range of backend APIs. These could instead be developed and maintained independently by other relevant teams.

Contributor

Particularly if these backend services have the potential to be leveraged by not just the App.

> Particularly if these backend services have the potential to be leveraged by not just the App.

Totally agree, I'm sure both AI and App have some common requirements for such an API.

1. In all likelihood, the various mobile back end services across both DI and GOV.UK will _eventually_ end up being supported and maintained by the same group of people. Therefore, parity of skillset is an advantage.
1. There are broader conversations across GDS looking into our strategy around tech stacks, and we should fold into those.
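
To make the first consideration concrete, the following is a minimal sketch of how client configuration might differ between a single back end and several. The URLs, feature names and the `resolveUrl` helper are all hypothetical and are not taken from any existing DI or GOV.UK configuration.

```javascript
// Hypothetical client configuration; URLs and feature names are illustrative only.

// Single back end: one base URL, with every feature as a path beneath it.
const singleBackend = {
  baseUrl: "https://app-api.example.com",
};

// Multiple back ends: each feature needs its own setting.
const multipleBackends = {
  identity: "https://identity.example.com",
  search: "https://search.example.com",
  notifications: "https://notifications.example.com",
};

// Resolve the URL for a feature under either configuration shape.
function resolveUrl(config, feature, path) {
  if (config.baseUrl) {
    return `${config.baseUrl}/${feature}${path}`;
  }
  const base = config[feature];
  if (!base) {
    throw new Error(`No back end configured for feature "${feature}"`);
  }
  return `${base}${path}`;
}
```

With a single back end, adding a feature changes only the path; with several, each new feature adds another setting that must be kept correct per environment.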

## Options

We have considered two principal options for tech stacks:

1. Adopt a serverless stack on AWS
2. Adopt a Ruby-based containerised stack on AWS

### Option 1 - Adopt a serverless stack on AWS

![Architecture diagram showing a simplified approach to running serverless applications on AWS](rfc-170/serverless.png)

Under this option, we would adopt a serverless stack which broadly mirrors the approach taken on DI. Connectivity would be provided by API Gateways (see note below about the proposed approach to this). Compute would be based on Lambda functions, running the Node.js runtime.

The Lambda functions would be able to call out to other AWS services as needed. While the subject of this RFC is predominantly the connectivity and compute (i.e. API Gateway and Lambda), the proposal would also be to adopt other cloud-native services which commonly form part of a serverless approach, namely using DynamoDB for storage.
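
As a rough illustration of the shape such a function might take, here is a minimal sketch of a Node.js handler behind API Gateway. It assumes the Lambda proxy integration (an event carrying `pathParameters`, and a response with `statusCode`, `headers` and `body`). The `store`, `getItem` and `jsonResponse` names and the sample item are hypothetical, and an in-memory `Map` stands in for DynamoDB so that the sketch runs without the AWS SDK.

```javascript
// Stand-in for a DynamoDB table (assumption: a real function would use the AWS SDK).
const store = new Map([["42", { id: "42", title: "Example item" }]]);

// Build a response in the shape expected by the API Gateway Lambda proxy integration.
function jsonResponse(statusCode, payload) {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Business logic kept separate from the entry point so it can be tested directly.
function getItem(event, items) {
  const id = event.pathParameters && event.pathParameters.id;
  if (!id) {
    return jsonResponse(400, { error: "Missing id" });
  }
  const item = items.get(id);
  if (!item) {
    return jsonResponse(404, { error: "Not found" });
  }
  return jsonResponse(200, item);
}

// Lambda entry point: receives the API Gateway event and returns the response.
exports.handler = async (event) => getItem(event, store);
```

Keeping the logic in a plain function that takes the store as an argument means it can be exercised locally without any AWS resources.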

This option has the following advantages:
* Essentially unlimited scalability without the need for active management
* Cost-efficient, pay-for-what-you-use pricing (we can get some cost reports from DI if this would be helpful)
* A step towards no-ops: AWS handles infrastructure maintenance, so we would not carry that cost or burden, and there are no [tricky manual steps](https://guides.rubyonrails.org/upgrading_ruby_on_rails.html) to upgrade to the latest versions of underpinning software (only deprecated runtimes need to be considered)
* Lambda functions are small and self-contained, thus easy to reason about. They are written in JavaScript, which is familiar to many developers and therefore the learning curve is not too steep
* Aligns with the DI tech stack so that the entire mobile back end estate could be maintained by a single group of people
* Benefit from existing DI infrastructure for NFRs like monitoring / logging / alerting
Contributor

These are advantages if the app sits close to DI in the org structure. They're disadvantages if it sits close to GOV.UK (e.g. if GOV.UK needed to support this infrastructure using its existing structures, it would be problematic to have to learn new monitoring / logging tools)


The principal drawback of this option is that it is not currently a stack that is widely used and understood across GOV.UK, so would require some upskilling - particularly in terms of infrastructure management and use of AWS SAM (although we could look at an alternative to this like CDK).
Contributor

I think there are quite a few more details we'll need to work out if we go down this route:

  • Which AWS organisation would our accounts be part of? Would we follow DI's approach with control tower / many AWS accounts per service?
  • What would the deployment pipeline look like? Would we have integration / staging / production environments like GOV.UK? Or would we have a different pattern like DI presumably have?
  • Which team would be responsible for infrastructure support? This could be the app team itself, or DI's infrastructure team, or GOV.UK's platform engineering team. The latter would probably not be able to support such a divergent estate


### Option 2 - Adopt a Ruby-based containerised stack on AWS
Contributor

Suggested change
### Option 2 - Adopt a Ruby-based containerised stack on AWS
### Option 2 - Adopt a Ruby-based containerised stack on AWS

(nit: non-breaking space)

Contributor

I think option 2 should be more of a "consistent with GOV.UK" option. If we were going down that route, I would suggest using our existing kubernetes clusters, and building services using Ruby on Rails with Postgres for persistence. I wouldn't expect you to have new AWS accounts or really any new infrastructure in this model at all - we'd just reuse what we have.

Contributor

Just to second this - we've already built a platform to run containerised applications in GOV.UK, which I'd thought we could just reuse. If there are concerns as to why we couldn't, it would be good to understand what those are.


![Architecture diagram showing a possible approach to running containerised Rails applications on AWS](rfc-170/containers.png)

In this option, we would build a Rails application (potentially more than one, if we deemed it correct from an architectural perspective). This Rails application would sit in a Docker container, and we would host it on Amazon Elastic Container Service (ECS) - or Elastic Kubernetes Service (EKS) if deemed a better fit.

As noted in the diagram, there are multiple ways to route internet traffic through to ECS or EKS, of which API Gateway is only one. It is not strictly needed because the Rails apps themselves can handle routing of requests through their controller classes, but it is included here as it would help to standardise the approach along with DI - and would potentially allow us to still use a single endpoint for all app back end concerns.
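
The idea of a single endpoint fronting different kinds of back end can be sketched as a simple prefix-based route table. This is a simplified model for discussion only: it does not reproduce API Gateway's own route matching, and the path prefixes, target names and the `resolveTarget` helper are hypothetical.

```javascript
// Hypothetical route table: path prefixes and target names are illustrative only.
const routes = [
  { prefix: "/identity/", target: "di-lambda" },
  { prefix: "/content/", target: "govuk-rails-container" },
];

// Return the target for the longest matching prefix, or null if none match.
function resolveTarget(path, table) {
  let best = null;
  for (const route of table) {
    const matches = path.startsWith(route.prefix);
    if (matches && (!best || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.target : null;
}
```

Under this model the app only ever needs to know one host name, while the team behind each prefix remains free to choose how its service is implemented.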

The principal advantage of this approach is that it draws on the existing skills and knowledge within GOV.UK around building and running Rails apps. Containers are a recognised, modern way of running web framework-based services on cloud providers.

There are however numerous downsides to this approach:
* Inconsistent tech stack across the app means we could end up with one team having to maintain services built in very different paradigms. Additionally we would have to create separate monitoring and logging stacks across the two approaches
* Whilst containers (and container-hosting services) are designed to scale up and down flexibly, there is manual configuration, intervention and adjustment needed to do this effectively

## Proposal

The proposal is to adopt option 1, serverless applications on AWS, using Lambda functions written in Node.js. There should be some flex around some of the specifics - for example, we may decide that proxying to static files on S3 is a better fit for some workloads, or that DynamoDB is not the right option for data storage for some requirements. But the guiding principle should be that for compute, we use Lambda functions.

## Appendix - Potential sub-options

The following are potential sub-approaches we could take, which may address some of the considerations above. They are open to discussion.

### Option 1A
Use serverless but write Lambda functions in Ruby. This approach enables us to make use of the existing Ruby expertise within GOV.UK, but still allows us to get the benefits of the serverless paradigm and align broadly with the DI tech stack. The reason this isn’t the preferred option is that it introduces an additional language into the ‘target stack’. Both Ruby and Node.js are easy to work with on Lambda and so the difference is judged to be minimal.

### Option 2A
Use containers but embed them within AWS Fargate. Fargate is a semi-managed ‘serverless’ compute engine for containers which takes care of some concerns such as scaling and server management. DI do use Fargate for some front ends, but the feedback from the team is that it is quite difficult to work with, still requires quite a lot of manual intervention, and can be expensive. Therefore we have discounted this option here.
Binary file added rfc-170/containers.png
Binary file added rfc-170/serverless.png