API Interoperability first draft #72
base: main
Thanks for getting this written down! A few questions (which are maybe also questions about the linked CoEpi API?):
Thanks for the comments @hdevalence! I think 1 and 2 might be questions for the CoEpi API; I'm not sure what the exact plan was for intervals.
Thanks!
Re (4), the difference is just that in one case the report has an extra 64-byte signature attached, so the client will have to parse (the 64 bytes) and process (verify the signature) the response differently. So either the server could signal in-band which kind of data it is, or we could have two separate endpoints.
Actually, I have another recommendation for defining the batches that keeps the interval length tunable and does not look as awkward as it does now. Each batch can be defined by a combination of dayNumber (seconds since epoch / (24 × 3600)) and an intervalNumber that indexes the intervals within that day. For example, if intervalLength is 6 hours, intervalNumber will be in the range [0, 3].

This allows us to keep the interval_length flexible (changing it only at the start of a new day), and the client does not have to worry about time zones. How does that sound?
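To make the proposal concrete, here is a minimal sketch of that batch addressing. The helper name and return shape are illustrative assumptions, not part of the draft API:

```python
# Sketch of the proposed (dayNumber, intervalNumber) batch addressing.
# Helper name and return shape are illustrative assumptions.

SECONDS_PER_DAY = 24 * 3600

def batch_id(unix_seconds, interval_length_seconds):
    """Return (dayNumber, intervalNumber) for a Unix timestamp.

    dayNumber counts whole days since the epoch; intervalNumber indexes
    intervals within that day, so time zones never enter the calculation.
    """
    day_number = unix_seconds // SECONDS_PER_DAY
    seconds_into_day = unix_seconds % SECONDS_PER_DAY
    interval_number = seconds_into_day // interval_length_seconds
    return day_number, interval_number
```

With a 6-hour interval, `batch_id(3 * 86400 + 7 * 3600, 6 * 3600)` yields `(3, 1)`, and the interval number stays in `[0, 3]` as described.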
interoperability.md:
| Name | Description |
| -------- | ---------------------- |
| intervalNumber | **Required:** Positive integer that corresponds to a specific fixed time interval, calculated as `unix_time / time_interval_ms` |
| date | **Optional:** Date in the RFC 3339 format in UTC, without the time component. If not provided, today's date is used by default |
| intervalLengthMs | **Optional:** The interval length in milliseconds used by the client to calculate intervalNumber. The server will respond with a 401 if the interval length does not match the expected length |
If intervalNumber is provided, this is not optional
Thank you @bhushanRamnani, I have updated the doc :)
interoperability.md:
##### Query Parameters

| Name | Description |
| -------- | ---------------------- |
| intervalNumber | **Required:** Positive integer that corresponds to a specific fixed time interval, calculated as `unix_time / time_interval_ms` |
Also optional. If not provided, the default is the latest interval.
Would it make sense to just lock the interval down and inform the client of the server-set interval? If we let the client choose what to download, they can pick from about 95 intervals per day. It's still cacheable, but do we need such flexibility? If the server can tell the client its "last/current" time and what interval it's using, do we even need to consider timezones? I would prefer a simple approach that does not need to take timezones etc. into consideration. If one implementation is bugged or local time is off, we might fall behind in getting updates.
What's the reasoning behind these intervals exactly? Is it to make caching more flexible, or...? Why can't we just have a set interval and let the user select one of the "blocks" of reports? Maybe I'm missing something here, but wouldn't that work and simplify the implementation?
@kiwo Essentially, we are locking the interval down and not letting the client choose a random interval length. However, in the future the server might have to change the interval length, hence we're asking the client to verify the intervalLength it used for calculating the intervalNumber. If there are too few reports in an interval, we lose the obfuscation of randomization; if there are too many, the download time for a single API call increases (we could mitigate this by adding pagination in future versions). Right now, we're also using intervals for pagination. In my previous post I recommended using dayNumber, which does not depend on the timezone.

@calmandniceperson The interval length is essentially fixed, with an option for the server to change it in the future while maintaining backwards compatibility. The client sends intervalLength only to verify that it used the right one. The client does not choose the intervalLength; it's not allowed to. The server defines it and may change it starting on a new day.
(Moving the Slack discussion here) I still don't fully understand why we can't just send the interval number and length. Unless I'm misunderstanding its meaning, the date is derivable from the interval number and length. Why can't the API calculate the date and check whether the interval length is valid using this date? It wouldn't be different from doing this calculation in the client. Also, the point in time when the API changes the interval length could be defined using Unix time as well, so we don't have to worry about timezones anywhere. @bhushanRamnani You also wrote:
If the server changes the interval, can't it create new buckets for the old data using the new interval (without discarding the old ones, of course, for backwards compatibility)? Otherwise it sounds like the clients need to remember the valid interval lengths for specific time periods, and this is complexity we'd prefer to avoid. Note also that the server can send the current interval length to the client (e.g. as part of a configuration API call made when the app comes to the foreground), minimizing the time periods for which we'd have to support backwards compatibility (from weeks/months to at most 1-2 days). We could even remove the need for backwards compatibility entirely by having the server return the currently valid interval length if the client asks for an outdated/wrong one.
@i-schuetz
This would require resharding all previous report data, replicating it to a new datastore, and switching datastores in a way that does not impact availability. It would also mean that all the previous reports cached in CDNs would need to be cached again.
Yes, that's why I recommended adding an API endpoint to retrieve the server's interval length for date ranges. I understand that sending the date along with the intervalNumber is counter-intuitive without the backwards-compatibility context; that's why I recommended the dayNumber + intervalNumber approach above for representing the interval, which is much more intuitive.
@bhushanRamnani
How often do we expect to have to change the interval? If it's seldom enough, it may be worth it. The client would always fetch the currently valid interval length before starting to download reports. If the interval length is changed while it's fetching reports (i.e. it gets an invalid-interval response), it just starts again. There would be no need for backwards compatibility, and the logic would be much simpler.

Edit: Another option that crossed my mind would be that the client doesn't send any interval data except the timestamp where it wants to start fetching. The server sends back the data + interval length, which the client would use to calculate the next timestamp. Would that make sense? If we assume that the cache is re-created when the interval length is changed, these responses should also be cacheable?
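The restart flow proposed above could look roughly like the following simulation. All names, the `invalid_interval` status, and the fake server (which switches from 6h to 12h after two calls) are assumptions for illustration, not the agreed API:

```python
# Simulation of the proposed "fetch interval length first, restart on
# invalid-interval" client flow. Everything here is hypothetical.

class FakeServer:
    """Stands in for the backend; switches interval length after 2 calls."""
    def __init__(self):
        self.length_h = 6
        self.calls = 0

    def current_interval_length(self):
        return self.length_h

    def get_reports(self, interval, length_h):
        self.calls += 1
        if self.calls == 2:
            self.length_h = 12  # server reconfigures mid-download
        if length_h != self.length_h:
            return ("invalid_interval", None)
        return ("ok", f"reports[{interval}@{length_h}h]")

def fetch_reports(server, intervals):
    """Download all intervals; restart from scratch if the length changes."""
    while True:
        length = server.current_interval_length()
        results = []
        restarted = False
        for interval in intervals:
            status, data = server.get_reports(interval, length)
            if status == "invalid_interval":
                restarted = True
                break
            results.append(data)
        if not restarted:
            return length, results
```

Because the client simply restarts with the freshly fetched length, no backwards-compatibility mapping of old interval lengths is needed on either side.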
I think it just comes down to whether re-sharding the CDN and datastore every time the intervalLength changes is worth it for the sake of slightly simplifying the query format. My opinion is that it's not. It's hard to predict right now how often we'll need to do that; it depends on the rate of reports generated at a given location.

A timestamp in the query would be too granular for caching to be effective. We could instead provide an API endpoint that returns the intervalLength for a timestamp range, but there would be no way for the server to know whether this information was actually used to accurately derive the intervalNumber. Again, I'd like to avoid re-generating the cache and the datastore if possible.
@bhushanRamnani I'm not thinking about the query format, but about the additional logic required in the clients to support multiple interval lengths. It's finicky and more error-prone than always using a single (variable) length. But I can't evaluate the effort/cost involved in re-generating the cache and have no idea how frequently it will be needed, so I'll rely on your judgement. If we need to add this logic to the client, the API service mapping time periods to interval lengths would be mandatory, as there doesn't seem to be another way to determine them. A client could be offline for a long while, and the interval lengths could have changed multiple times during that time.
Yes, understood. There are two ways to solve this. The client sends the GET request with the length it knows for a given dayNumber and interval; if the server does not agree that this is correct for the day, it responds with a 401 error code and a config map in the body.

The other option is to provide this information via a simple GET endpoint.
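A sketch of how a client might handle that 401 response. The body shape (a JSON object with an `intervalLengthMs` key) is a guess at the config map, not an agreed format:

```python
import json

# Hypothetical client-side handling of the 401-with-config-map response.
# The JSON key "intervalLengthMs" is an assumed shape for the config map.

def adopt_server_interval(status_code, body, unix_time_ms):
    """On a 401, adopt the server's interval length and recompute intervalNumber.

    Returns (new_interval_length_ms, new_interval_number), or None if the
    response was not an interval-length mismatch.
    """
    if status_code != 401:
        return None
    config = json.loads(body)
    new_length_ms = config["intervalLengthMs"]
    return new_length_ms, unix_time_ms // new_length_ms
```

The client would then retry its GET with the corrected length and interval number, so a stale hard-coded length self-heals after one round trip.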
@bhushanRamnani @i-schuetz and I had a call to settle this and agreed that we shouldn't need to include the date in v4 backend API calls, just the interval length and interval number. We're hard-coding the interval length in the clients for now, at 6h, and deferring the work to make that configurable via a server endpoint.
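Under the agreed scheme, the client-side computation reduces to a single division. A sketch; the 6h constant matches the decision above, while the helper name is an assumption:

```python
# Agreed v4 scheme: interval length hard-coded to 6 hours in clients for now;
# clients send intervalNumber plus the length for server-side verification.

INTERVAL_LENGTH_SECONDS = 6 * 3600  # hard-coded for now, per the call above

def interval_number(unix_seconds):
    """Global interval index since the Unix epoch; no date or timezone involved."""
    return unix_seconds // INTERVAL_LENGTH_SECONDS
```

Because the index counts intervals since the epoch, no date parameter is needed and time zones never enter the calculation.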
As the various apps get started on backend implementations, we need a common API defined to ensure interoperability with minimal code redesign. This initial draft reflects discussions that have been going on in Slack within the backend and server2server channels.