-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ETag-caching and AWS SDK v2 support #287
Conversation
49f9a7f
to
9b755d7
Compare
trait S3Client { | ||
def get(bucket: String, path: String): Future[FaciaResult] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the com.gu.facia.client.S3Client
trait still exists, it's just moved to the fapi-client-core
artifact - couldn't delete it without badly breaking binary compatability for existing users!
9b755d7
to
54a1fc5
Compare
This is a hack, and it probably won't be able to fully conceal that fact. The facia-scala-client, in normal use, is asked for collections that don't yet have collection JSON, and may never have it (apparently, if a collection has never been edited, its collection JSON won't exist). In the past, this would be handled by generating a `FaciaNotFound` response for `com.gu.facia.client.S3Client.get()` which `ApiClient` would cheerfully _not_ throw an exception for: https://github.com/guardian/facia-scala-client/blob/f9830fd286d409662be850ff441792ec03c1c2e2/facia-json/src/main/scala/com/gu/facia/client/ApiClient.scala#L25-L27 ...so encountering the missing resource would not be logged as an exception, **which is the desired behaviour**, see guardian/facia-scala-client#32 However, as my attempt (guardian/facia-scala-client#287) to introduce ETag-caching & AWS SDK v2 support rejects the `com.gu.facia.client.S3Client` trait, in favour of adapting `com.gu.facia.client.ApiClient`, I need to provide that same exception-supressing behaviour. The task is complicated because currently `ETagCache` holds a scaffeine cache with **value** type `ETaggedData[V]` - there's no successful response for the cache loader to return for keys that can denote 'resource missing' - any successful value must have an ETag for one thing, as `ETaggedData` requires it, but when a resource is missing, we have no ETag. This commit tried to deal with the problem by replacing the `S3Exception` with a `CancellationException`, which caffeine special-cases as exception to _not_ log: ben-manes/caffeine@1e52b10 ...however this gets even yuckier than we might have thought, because it means that `facia-scala-client`, as the consumer of `etag-caching`, has to know that it must now try to catch `CancellationException`, rather than `S3Exception`, when it is trying to do it's _own_ suppression of exception-logging.
As discussed at standup, I'm going to do a release of |
This is a small upgrade, catching up with the recent dependency updates of https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6, before the more extensive update in guardian/facia-scala-client#287 is introduced. This update has already been tested in Ophan's PromotionPoller with guardian/ophan#5540, successfully deployed to Prodution. I've switched from `fapi-client-play27` to `fapi-client-play28` - the `editors-picks-uploader` is an AWS Lambda, it doesn't actually run the Play framework, so specifying a higher Play version doesn't really affect it! The `fapi-client` uses `play-json` to do its JSON-parsing (that's all that the `-play2x` suffix indicates), so we might as well use the latest version that's available (allowing `facia-scala-client` to drop support for `play27` at some future point).
This is a small upgrade, catching up with the recent dependency updates of https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6, before the more extensive update in guardian/facia-scala-client#287 is introduced. This update has already been tested in Ophan's PromotionPoller with guardian/ophan#5540, successfully deployed to Prodution.
This is a small upgrade, catching up with the recent dependency updates of https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6, before the more extensive update in guardian/facia-scala-client#287 is introduced. This update has already been tested in Ophan's PromotionPoller with guardian/ophan#5540, successfully deployed to Prodution.
This is a small upgrade, catching up with the recent dependency updates of https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6, before the more extensive update in guardian/facia-scala-client#287 is introduced. This update has already been tested in Ophan's PromotionPoller with guardian/ophan#5540, successfully deployed to Prodution. Note that as https://github.com/guardian/frontend is _already_ using Play 2.8, it should probably be using `fapi-client-play28`, rather than `fapi-client-play27`, so I've also updated that.
This is a hack, and it probably won't be able to fully conceal that fact. The facia-scala-client, in normal use, is asked for collections that don't yet have collection JSON, and may never have it (apparently, if a collection has never been edited, its collection JSON won't exist). In the past, this would be handled by generating a `FaciaNotFound` response for `com.gu.facia.client.S3Client.get()` which `ApiClient` would cheerfully _not_ throw an exception for: https://github.com/guardian/facia-scala-client/blob/f9830fd286d409662be850ff441792ec03c1c2e2/facia-json/src/main/scala/com/gu/facia/client/ApiClient.scala#L25-L27 ...so encountering the missing resource would not be logged as an exception, **which is the desired behaviour**, see guardian/facia-scala-client#32 However, as my attempt (guardian/facia-scala-client#287) to introduce ETag-caching & AWS SDK v2 support rejects the `com.gu.facia.client.S3Client` trait, in favour of adapting `com.gu.facia.client.ApiClient`, I need to provide that same exception-supressing behaviour. The task is complicated because currently `ETagCache` holds a scaffeine cache with **value** type `ETaggedData[V]` - there's no successful response for the cache loader to return for keys that can denote 'resource missing' - any successful value must have an ETag for one thing, as `ETaggedData` requires it, but when a resource is missing, we have no ETag. This commit tried to deal with the problem by replacing the `S3Exception` with a `CancellationException`, which caffeine special-cases as exception to _not_ log: ben-manes/caffeine@1e52b10 ...however this gets even yuckier than we might have thought, because it means that `facia-scala-client`, as the consumer of `etag-caching`, has to know that it must now try to catch `CancellationException`, rather than `S3Exception`, when it is trying to do it's _own_ suppression of exception-logging.
54a1fc5
to
93211ce
Compare
98baf91
to
b1fbfdb
Compare
adc4af5
to
00f2690
Compare
private def retrieve[A: Format](key: String): Future[Option[A]] = s3Client.get(bucket, key) map { | ||
case FaciaSuccess(bytes) => | ||
Some(Json.fromJson[A](Json.parse(new String(bytes, Encoding))) getOrElse { | ||
throw new JsonDeserialisationError(s"Could not deserialize JSON in $bucket, $key") | ||
}) | ||
case FaciaNotAuthorized(message) => throw new BackendError(message) | ||
case FaciaNotFound(_) => None | ||
case FaciaUnknownError(message) => throw new BackendError(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code needed to be refactored a bit to extract the new parseBytes()
method (so it could be used for the parsing phase of the new caching code) - consequently lines 22-24 above go into that new method.
62fdb9c
to
ba8e60c
Compare
@rtyley has published a preview version of this PR with release workflow run #59, based on commit ba8e60c: 13.1.1-PREVIEW.add-etag-caching-support.2025-01-07T1516.ba8e60c2 Want to make another preview release?Click 'Run workflow' in the GitHub UI, specifying the add-etag-caching-support branch, or use the GitHub CLI command: gh workflow run release.yml --ref add-etag-caching-support Want to make a full release after this PR is merged?Click 'Run workflow' in the GitHub UI, leaving the branch as the default, or use the GitHub CLI command: gh workflow run release.yml |
releaseVersion := fromAggregatedAssessedCompatibilityWithLatestRelease().value, | ||
// releaseVersion := fromAggregatedAssessedCompatibilityWithLatestRelease().value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to comment this out due to this issue, as we're introducing the new fapi-client-core
artifact:
...we'll re-enable it immediately after the next release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-enabled with 42fa0bd.
ba8e60c
to
2175db0
Compare
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_ (thanks to [ETag-caching](https://github.com/guardian/etag-caching)) * Independence from AWS SDK version (v1 vs v2) _(this PR can replace #286 The [ETag-caching](https://github.com/guardian/etag-caching) library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338. ### Usage ```scala import com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching import com.gu.facia.client.{ApiClient, Environment} import software.amazon.awssdk.services.s3.S3AsyncClient val s3AsyncClient = S3AsyncClient.builder().region(...).credentialsProvider(...).build() val apiClient = ApiClient.withCaching( "facia-tool-store", Environment.Prod, S3ObjectFetching.byteArraysWith(s3AsyncClient) ) ``` _PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741_ ### Independence from AWS SDK version (v1 vs v2) Ideally, the whole of `facia-scala-client` would be independent of AWS SDK version - we'd _like_ consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version. For `facia-scala-client` this is an attainable goal, as the only AWS API action it performs is fetching from S3, and [guardian/etag-caching](https://github.com/guardian/etag-caching) provides the [`S3ByteArrayFetching`](https://github.com/guardian/etag-caching/blob/v6.0.0/aws-s3/base/src/main/scala/com/gu/etagcaching/aws/s3/package.scala#L6-L22) abstraction that encapsulates this action without tying to a specific AWS SDK version. Due to legacy code compatibility, we can't completely remove AWS SDK v1 from `fapi-client` for now, but we _have_ removed it from `fapi-client-core`, which is the new home of `com.gu.facia.client.ApiClient`, which is now a trait, with 2 constructor methods that provide different implementations: * **`ApiClient()`** - legacy, using the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * **`ApiClient.withCaching()`** - provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance of `com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching` (ie with `"com.gu.etag-caching" %% "aws-s3-sdk-v2"` and `S3ObjectFetching.byteArraysWith(s3AsyncClient)`, introduced with guardian/etag-caching#65) ### Solved problems * **Noisy logging associated with absent collection JSON** - the `etag-caching` library has been updated with guardian/etag-caching#56 to avoid excessive logging that would occur in the Facia client, due to it typically trying to access collections that aren't yet persisted: #32.
2175db0
to
4db52b9
Compare
@rtyley has published a preview version of this PR with release workflow run #62, based on commit 4db52b9: 14.0.1-PREVIEW.add-etag-caching-support.2025-01-08T1105.4db52b99 Want to make another preview release?Click 'Run workflow' in the GitHub UI, specifying the add-etag-caching-support branch, or use the GitHub CLI command: gh workflow run release.yml --ref add-etag-caching-support Want to make a full release after this PR is merged?Click 'Run workflow' in the GitHub UI, leaving the branch as the default, or use the GitHub CLI command: gh workflow run release.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great 😁
}) | ||
private def parseBytes[B: Format](bytes: Array[Byte]): B = | ||
Json.fromJson[B](Json.parse(new String(bytes, Encoding))) getOrElse { | ||
throw JsonDeserialisationError(s"Could not deserialize JSON") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to log out the actual error here - maybe one for a future pr? @rtyley
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. We also need to use an updated version of CAPI, as the CAPI version used by Facia was recently updated with guardian/facia-scala-client#341. Although this change upgrades to the latest version of the Facia client, it doesn't take advantage of the new ETag-caching support there - that can come in a later PR, as it does involve switching to AWS SDK v2 (not everywhere in Frontend, just for where we use the Facia client, but it's still a significant change as the AWS STS credential creation has got more elaborate with AWS SDK v2).
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. Although this change upgrades to the latest version of the Facia client, it doesn't take advantage of the new ETag-caching support there - that can come in a later PR, as it does involve switching to AWS SDK v2 (not everywhere in Frontend, just for where we use the Facia client, but it's still a significant change as AWS STS credential creation has got more elaborate with AWS SDK v2).
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for where we use the Facia client!
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for where we use the Facia client!
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for where we use the Facia client!
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for where we use the Facia client!
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for where we use the Facia client!
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary to the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for where we use the Facia client!
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary in the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. We're also upgrading the S3 client (used by Frontend's instance of Facia's `ApiClient`) to AWS SDK v2, as `etag-caching` already provides a convenient implementation of `S3ByteArrayFetching` (see guardian/etag-caching#65) for AWS SDK v2 - if we continued using AWS SDK v1 here in Frontend, we'd have to create an implementation for it. ## See also * guardian/maintaining-scala-projects#9 * guardian/facia-scala-client#287
With guardian/facia-scala-client#287, the Facia client has been updated so that it can take advantage of ETag-caching, using the https://github.com/guardian/etag-caching library already used in Frontend since August 2023 with #26338. As some changes were necessary in the `etag-caching` library to facilitate this, we need to update the version of `etag-caching` used by Frontend to v7.0.0. We're also upgrading the S3 client (used by Frontend's instance of Facia's `ApiClient`) to AWS SDK v2, as `etag-caching` already provides a convenient implementation of `S3ByteArrayFetching` (see guardian/etag-caching#65) for AWS SDK v2 - if we continued using AWS SDK v1 here in Frontend, we'd have to create an implementation for it. ## See also * guardian/maintaining-scala-projects#9 * guardian/facia-scala-client#287
https://github.com/guardian/etag-caching is a Scala library that can help reduce CPU & Network consumption for any service that polls-&-parses S3 or HTTP resources. It also provides abstractions that can allow API-independence from AWS SDK version (ie S3 use not directly tied to AWS SDK v1 or v2). It's already used by these projects: * guardian/frontend#26338 * guardian/facia-scala-client#287 Doing a quick search for S3-polling code at the Guardian there are plenty of other examples of where it could be used: * https://github.com/guardian/typerighter/blob/b6f8a3daa959ee1a9478e7260ba8badf8797eb9c/apps/checker/app/services/MatcherProvisionerService.scala#L72-L83 * https://github.com/guardian/mobile-n10n/blob/75cfbfff1ec52a332f1d04a63680f126fcbd6cf1/notification/app/notification/NotificationApplicationLoader.scala#L75-L77 * https://github.com/guardian/pan-domain-authentication/blob/1db3864e00aae3258f8506c95a713e9e52487e61/pan-domain-auth-core/src/main/scala/com/gu/pandomainauth/PanDomainAuthSettingsRefresher.scala#L35-L36 * https://github.com/guardian/permissions/blob/b1a91466838024be7d0230a9d5a846ef7b83361a/client/src/main/scala/com/gu/permissions/PermissionsProvider.scala#L140-L150
This change adds these improvements:
The ETag-caching library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338.
Usage
PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741
Independence from AWS SDK version (v1 vs v2)
Ideally, the whole of
facia-scala-client
would be independent of AWS SDK version - we'd like consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version.For
facia-scala-client
this is an attainable goal, as the only AWS API action it performs is fetching from S3, and guardian/etag-caching provides theS3ByteArrayFetching
abstraction that encapsulates this action without tying to a specific AWS SDK version.Due to legacy code compatibility, we can't completely remove AWS SDK v1 from
fapi-client
for now, but we have removed it fromfapi-client-core
, which is the new home ofcom.gu.facia.client.ApiClient
, which is now a trait, with 2 constructor methods that provide different implementations:ApiClient()
- legacy, using the existingcom.gu.facia.client.S3Client
abstraction on S3 behaviourApiClient.withCaching()
- provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance ofcom.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching
(ie with"com.gu.etag-caching" %% "aws-s3-sdk-v2"
andS3ObjectFetching.byteArraysWith(s3AsyncClient)
, introduced with Improve API discoverability/clarity for devs withS3ByteArrayFetching
etag-caching#65)Solved problems
etag-caching
library has been updated with Avoid Caffeine exceptions on missing values etag-caching#56 to avoid excessive logging that would occur in the Facia client, due to it typically trying to access collections that aren't yet persisted: Don't flake out on 404s #32.