enumerate {source,submission,reply} UUIDs, then fetch subsets #1549
Comments
The transcripts and numbers above come from a development server running with this diff:

```diff
diff --git a/securedrop/journalist_app/api.py b/securedrop/journalist_app/api.py
index f1b96531f..64d8adf00 100644
--- a/securedrop/journalist_app/api.py
+++ b/securedrop/journalist_app/api.py
@@ -153,11 +153,15 @@ def make_blueprint(config: SDConfig) -> Blueprint:
             return abort(403, "Token authentication failed.")
 
     @api.route("/sources", methods=["GET"])
-    @token_required
     def get_all_sources() -> Tuple[flask.Response, int]:
         sources = Source.query.filter_by(pending=False, deleted_at=None).all()
         return jsonify({"sources": [source.to_json() for source in sources]}), 200
 
+    @api.route("/source_ids", methods=["GET"])
+    def get_all_source_ids() -> Tuple[flask.Response, int]:
+        sources = Source.query.filter_by(pending=False, deleted_at=None).all()  # could SELECT only Source.uuid
+        return jsonify({"sources": [source.uuid for source in sources]}), 200
+
     @api.route("/sources/<source_uuid>", methods=["GET", "DELETE"])
     @token_required
     def single_source(source_uuid: str) -> Tuple[flask.Response, int]:
```

With this "back-of-the-envelope" sketch done, I won't do any further work on this until the team has had a chance to consider it along with the other architectural questions raised by freedomofpress/securedrop#5104. That said, as we refine our architectural decision-making process, I'll be happy to write up a proposal for formal consideration at any time.
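As the inline comment notes, the handler doesn't need to materialize full `Source` rows. A minimal sketch of that leaner variant, assuming SQLAlchemy's `with_entities()` and the same `make_blueprint()` context as the diff above (illustrative only, not part of the patch):

```python
@api.route("/source_ids", methods=["GET"])
def get_all_source_ids() -> Tuple[flask.Response, int]:
    # Ask the database for just the uuid column rather than full Source rows.
    rows = (
        Source.query.filter_by(pending=False, deleted_at=None)
        .with_entities(Source.uuid)
        .all()
    )  # each row is a one-element tuple: (uuid,)
    return jsonify({"sources": [uuid for (uuid,) in rows]}), 200
```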
We also have access to `last_updated` for sources, so we could compare that with the local value to determine which sources have changed server-side and need to be resynced. Initial syncs on large datasets would still suck, but subsequent syncs would be pretty minimal.
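A hypothetical client-side sketch of that comparison (the dict shapes are assumptions; the `last_updated` field mirrors what the server exposes per source):

```python
from typing import Dict, Set

def find_stale_sources(local: Dict[str, str], remote: Dict[str, str]) -> Set[str]:
    """Return UUIDs that are new to us or have changed server-side.

    Both dicts map source UUID -> ISO-8601 'last_updated' timestamp.
    """
    stale = set()
    for uuid, remote_ts in remote.items():
        local_ts = local.get(uuid)
        # ISO-8601 timestamps (in a single timezone) sort lexicographically.
        if local_ts is None or local_ts < remote_ts:
            stale.add(uuid)
    return stale
```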
The pattern I recommend for this would be to have:

For example, the responses could be like this:
With an individual API call verifying whether a source conversation has changed since a given timestamp, it might be worth looking more at batching API requests; so, for example, a syncing operation would then consist of:

With changed sources flagged as such in the client db, you could then flag them in the list view and make subsequent batch calls to update them. Or, if you were really lazy/bandwidth-conscious, you could defer updating them until they were actually displayed in the conversation view (the downside being a display lag when a user selects an updated source).

This is straight-up request bundling in terms of the patterns mentioned above, but it has the advantages of a) being easy to reason about, because you can still think in terms of API calls for individual source conversations, and b) being a straightforward operation with minimal data transfer for unchanged or deleted sources. This is pretty similar to what @cfm is suggesting originally, tbh, but it also provides tunability (the batch size can be configurable) and it reduces the number of follow-up calls to the bare minimum.
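A rough sketch of that batched resync loop (everything here is hypothetical: `fetch_sources()` and `update_local_source()` are placeholder helpers, and `BATCH_SIZE` stands in for the configurable batch size):

```python
from typing import Iterator, List

BATCH_SIZE = 20  # tunable, per the comment above

def batches(uuids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield fixed-size chunks of the changed-source UUID list."""
    for i in range(0, len(uuids), size):
        yield uuids[i : i + size]

def resync_changed(changed_uuids: List[str]) -> None:
    for batch in batches(changed_uuids):
        # One bundled request per batch, so total transfer stays proportional
        # to the number of *changed* sources, not the whole source table.
        for source in fetch_sources(batch):   # hypothetical API helper
            update_local_source(source)       # hypothetical datastore write
```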
(...but all of this would, as @rocodes points out elsewhere, be a pretty big change to the current sync behaviour, which basically seems to mirror the sources, replies, submissions, and journalists tables en masse. It is not a quick win.)
Description
`securedrop-client`'s `MetadataSyncJob` currently fetches in order from the Journalist API's `/{sources,submissions,replies}` endpoints, then downloads individual submissions and replies. To take just the `/sources` endpoint as an example, on a development server with 350 sources, this is a significant (~1 MiB) response to fetch in every sync cycle.

Instead, `securedrop-client` could fetch from a more-compact (~15 KiB) `/source_uuids` endpoint. `securedrop-client` could update its local datastore with the current set of `source_uuid`s, then issue follow-up requests to `/sources/<source_uuid>` for (arbitrarily-sized) subsets of `source_uuids`. This would offer a compromise between the pagination pattern, which is challenging to reconcile with our UUID-based keying and the possibility of deletion operations initiated from both the server and the client sides; and the request-bundle pattern, which is not scalable for large numbers of sources (and submissions and replies). (Thanks to @gonzalo-bulnes for reminding me to check the literature here; other references welcome!)
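To make the enumerate-then-fetch idea concrete, a hypothetical sketch of the client-side reconciliation step (none of these helpers exist in `securedrop-client`; they are placeholders):

```python
from typing import Set

def reconcile(local_uuids: Set[str], server_uuids: Set[str]) -> None:
    """Diff the enumerated server UUIDs against the local datastore."""
    deleted = local_uuids - server_uuids  # gone server-side: purge locally
    missing = server_uuids - local_uuids  # new to us: fetch in follow-ups
    for uuid in deleted:
        delete_local_source(uuid)   # hypothetical datastore helper
    for uuid in missing:
        # Could equally be grouped into arbitrarily-sized subsets, as above.
        fetch_source(uuid)          # hypothetical GET /sources/<source_uuid>
```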
In future versions of this API, the `/source_uuids` endpoint could also indicate a per-record version, so that `securedrop-client` (or other consumers) would know what subset of `source_uuids` needed to be refreshed. Or freedomofpress/securedrop#5104 may point towards other, richer synchronization strategies between the Journalist API and its consumers.

How will this impact SecureDrop users?
A lighter-weight metadata-fetching strategy would be faster, especially across Tor; more responsive to new data, especially on heavily-loaded instances; and more resilient to network hiccups.
How would this affect the SecureDrop Workstation threat model?
No Workstation-level threat-model implications.
See also

- #1547 (`default_request_timeout`), among others