
fix: during live requests, the server returns a cursor for the client to use for cache-busting #1826

Merged
Merged 13 commits into main from live-cursor on Oct 10, 2024

Conversation

@KyleAMathews (Contributor) commented Oct 9, 2024

This PR fixes inconsistencies in caching by HTTP proxies while clients are long-polling. It also adds public to our cache-control header, as some HTTP proxies require that in order to cache.

HTTP proxies don't all treat the max-age in cache-control exactly the same way: some start counting the age of the cached response from the beginning of the request, while others count from the end of the request.

This inconsistency makes it difficult to reliably control caching and request collapsing behavior for long-polling requests.

My previous PR in this area, #1656, made request collapsing work nicely with proxies exhibiting the first behavior: they'd collapse all requests arriving between the start of a long-poll and the end of the max-age, and by the time the client went to request again after the long-poll had ended, the previous cached response had already expired, so a new request would get sent to the origin.

However, this approach caused issues with proxies exhibiting the second behavior: request collapsing would work, but when the client re-polled, the cache hadn't yet expired, so the client would go into an infinite loop, requesting the same cached response over and over.

So this PR adds a server-generated cursor that clients include as part of live requests. This skips past any caches from the previous live request (which, on proxies with the first behavior, would have expired already).

The cursor is generated by finding the next alignment boundary. For example, if the timeout is 20 seconds (which it is now, though this could change), we calculate the alignment boundary by taking the current Unix timestamp, subtracting the Electric Epoch of October 9th, 2024, dividing by 20, rounding up, and then multiplying by 20 again.

In practice this partitions caches for live requests for a given offset into 20 second windows.
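As an illustration of the computation described above, here is a minimal TypeScript sketch (the actual server-side implementation is in Elixir; the names here are illustrative, not the PR's API):

```typescript
// Illustrative sketch of the server's cursor computation; the real
// implementation is in Elixir. Date.UTC months are 0-indexed, so 9 = October.
const ELECTRIC_EPOCH_MS = Date.UTC(2024, 9, 9);

function nextLiveCursor(nowMs: number = Date.now()): number {
  const diffSeconds = (nowMs - ELECTRIC_EPOCH_MS) / 1000;
  // Round up to the next 20-second alignment boundary, so every request
  // within the same 20-second window gets the same cursor value.
  return Math.ceil(diffSeconds / 20) * 20;
}
```

Because the cursor is stable within a window, proxies see identical request URLs and can collapse or cache them; once the window rolls over, the new cursor skips past the previous window's cache.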

netlify bot commented Oct 9, 2024

Deploy Preview for electric-next ready!

🔨 Latest commit: b63ee9a
🔍 Latest deploy log: https://app.netlify.com/sites/electric-next/deploys/6707c8e7287e6e00087d44c1
😎 Deploy Preview: https://deploy-preview-1826--electric-next.netlify.app

@balegas (Contributor) left a comment

Okay, this will work. Now we can increase the max-age if we want, right? More collapsing, more data at the edge.
It also helps with Next.js caching behaviour, which I was sidestepping by adding a random value to the URL... sounds familiar.

I've approved, but might be useful if team has a look at the Elixir before merging.

@KyleAMathews (Contributor, Author)

Now we can increase the max-age if we want, right?

Yup — we could make it configurable. In practice we'll probably also need to let clients set the long-polling seconds, as some proxies have pretty short timeouts. But the default could be much higher.

@msfstef (Contributor) left a comment

What an annoying issue from lack of standardisation! I like having a custom cache cursor for the live requests, though; it's relatively cheap to implement and maintain, and it gives us better control.

I've left some comments and questions for clarification.

packages/typescript-client/src/constants.ts (outdated; resolved)
packages/typescript-client/src/client.ts (outdated; resolved)
@@ -143,6 +144,7 @@ export class ShapeStream<T extends Row<unknown> = Row>
>()

#lastOffset: Offset
#nextLiveCursor: string // Seconds since our Electric Epoch 😎
Contributor

nit: slight inconsistency in naming between "live next cursor" and "next live cursor" - maybe we can even name it "live cache buster" or "live cache cursor" to make its purpose explicit in its name?

Contributor Author

true — I'll fix

@@ -153,6 +155,7 @@ export class ShapeStream<T extends Row<unknown> = Row>
validateOptions(options)
this.options = { subscribe: true, ...options }
this.#lastOffset = this.options.offset ?? `-1`
this.#nextLiveCursor = ``
Contributor

What does this mean for the request collapsing behaviour on the first live request, before a shared cursor is retrieved? As I think about it, I don't believe it's anything serious, but it's worth clarifying.

Contributor Author

Yeah, the initial request is different, so collapsing/caching will be different. I couldn't think of any way around this, but it's also not that big of a deal, as it's basically just one more request getting to Electric.
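For illustration only, the client-side handling might look like the following sketch; the query parameter name and the helper are assumptions, not the PR's actual API:

```typescript
// Hypothetical sketch of how a client could echo the server-provided
// cursor back on live requests. The "cursor" parameter name and this
// helper function are illustrative, not the PR's actual API.
function buildLiveUrl(base: string, offset: string, liveCursor: string): string {
  const url = new URL(base);
  url.searchParams.set("live", "true");
  url.searchParams.set("offset", offset);
  if (liveCursor !== "") {
    // The very first live request has no cursor yet, so its
    // caching/collapsing behaviour differs slightly, as noted above.
    url.searchParams.set("cursor", liveCursor);
  }
  return url.toString();
}
```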

website/electric-api.yaml (resolved)
now = DateTime.utc_now()

diff_in_seconds = DateTime.diff(now, oct9th2024, :second)
next_interval = ceil(diff_in_seconds / 20) * 20
Contributor

This 20 is arbitrary here, even though it's supposed to be related to the long-poll timeout. Perhaps make this function take an "interval size" parameter and use conn.assigns.config[:long_poll_timeout] when it gets called, to ensure the two stay consistent?

Contributor Author

Oh sweet! I didn't know there was a config already for this — I'll switch to it.
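The reviewer's suggestion of taking the interval size as a parameter could look like this TypeScript sketch (the actual change is in Elixir, where the value would come from the long_poll_timeout config; the names here are illustrative):

```typescript
// Sketch of the parameterized version suggested in review: the window
// size is passed in rather than hard-coded, so the same configured
// long-poll timeout can drive both the poll duration and the cursor.
const EPOCH_MS = Date.UTC(2024, 9, 9); // October 9th, 2024 UTC

function nextIntervalBoundary(intervalSeconds: number, nowMs: number = Date.now()): number {
  const diffSeconds = (nowMs - EPOCH_MS) / 1000;
  return Math.ceil(diffSeconds / intervalSeconds) * intervalSeconds;
}
```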

@@ -23,6 +23,18 @@ defmodule Electric.Plug.ServeShapePlug do
@up_to_date [Jason.encode!(%{headers: %{control: "up-to-date"}})]
@must_refetch Jason.encode!([%{headers: %{control: "must-refetch"}}])

defmodule TimeUtils do
def seconds_since_oct9th_2024_next_interval do
oct9th2024 = DateTime.from_naive!(~N[2024-10-09 00:00:00], "Etc/UTC")
Contributor

you can store this value as a module attribute, like

@oct9th2024 DateTime.from_naive!(~N[2024-10-09 00:00:00], "Etc/UTC")
def seconds_since_oct9th_2024_next_interval do
...

so it gets calculated once and reused rather than parsing the date on every call

Contributor Author

Ok, good idea, as this will be called a ton.

@KyleAMathews KyleAMathews merged commit 41845cb into main Oct 10, 2024
22 of 23 checks passed
@KyleAMathews KyleAMathews deleted the live-cursor branch October 10, 2024 15:37
@thruflo thruflo mentioned this pull request Oct 10, 2024
4 participants