
Add key cache #5184

Merged · 9 commits · Apr 9, 2020

Conversation

@rmol (Contributor) commented Apr 1, 2020

Status

Ready for review

Description of Changes

Moves the CryptoUtil caching to Redis. This reduces total memory usage compared to keeping the caches in the WSGI processes, and eliminates the need to repopulate the caches when WSGI processes restart.

Adds caching of public keys, not just fingerprints.

Renames CryptoUtil.getkey and CryptoUtil.export_pubkey to better match what they do.

Adds some detail to the output of create-dev-data.py to better indicate progress.

Uses 1024-bit keys if the environment variable SECUREDROP_ENV is test or dev, which speeds up the local dev server startup with large numbers of sources.
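
The environment check described above might look something like this (a sketch only; `default_key_size` and the exact environment handling are illustrative, not the actual SecureDrop code):

```python
import os

# Hypothetical helper mirroring the behavior described above: use small,
# fast-to-generate 1024-bit keys only in test/dev environments, and full
# 4096-bit keys everywhere else.
def default_key_size() -> int:
    env = os.environ.get("SECUREDROP_ENV", "prod")
    return 1024 if env in ("test", "dev") else 4096
```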

Fixes #5183.
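
The caching pattern described above can be sketched roughly as follows. This is an illustrative wrapper, not the actual implementation; `RedisKeyCache` and its method names are hypothetical, but the hash names match the ones used in the test plan:

```python
from typing import Callable, Optional

class RedisKeyCache:
    """Sketch of a shared cache backed by a Redis hash.

    All WSGI processes talk to the same Redis instance, so the cache
    survives process restarts and is not duplicated per process.
    """

    def __init__(self, redis_client, hash_name: str) -> None:
        self.redis = redis_client
        self.hash_name = hash_name  # e.g. "sd/crypto-util/fingerprints"

    def get(self, source: str, compute: Callable[[str], Optional[str]]) -> Optional[str]:
        cached = self.redis.hget(self.hash_name, source)
        if cached is not None:
            return cached
        value = compute(source)  # slow path: ask GPG
        if value is not None:
            self.redis.hset(self.hash_name, source, value)
        return value

    def evict(self, source: str) -> None:
        # Called when a source's key is deleted.
        self.redis.hdel(self.hash_name, source)
```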

Testing

Run the dev server with the environment variable NUM_SOURCES set to a large number, like 500 or 1000. Use the test script from the #5100 test plan. Running it against this branch should show faster response times than on master, once the caches are populated.

Checking Redis memory usage

Stop Apache with service apache2 stop.

In redis-cli:

  • List all keys with keys *
  • Delete the Redis CryptoUtil cache keys if they exist:
    • del sd/crypto-util/fingerprints
    • del sd/crypto-util/keys
  • Execute info and look for used_memory_human. It should be under
    a megabyte, e.g. used_memory_human:517.83K

Start Apache with service apache2 start.

Wait for the caches to populate. It took a few minutes on my system. You can watch for the Apache and GPG processes to drop out of top, then check the number of keys in the Redis caches in redis-cli with hlen sd/crypto-util/fingerprints and hlen sd/crypto-util/keys.

Execute info in redis-cli and look for used_memory_human. With dev/test 1024-bit keys, it should be under two megabytes, e.g. used_memory_human:1.66M. With production 4096-bit keys I see used_memory_human:2.64M.

Deployment

This requires additional working memory, but as suggested in the test plan, caching 1000 1024-bit keys and fingerprints in Redis takes under 2 megabytes. Production 4096-bit keys will result in a proportional increase.
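
As a rough check of these figures, dividing the observed totals by the number of sources gives the approximate per-source cost (numbers taken from the test plan above):

```python
# Per-source cache cost, using the used_memory_human figures observed above.
sources = 1000
dev_total_kb = 1.66 * 1024    # 1024-bit dev/test keys
prod_total_kb = 2.64 * 1024   # 4096-bit production keys

dev_per_source_kb = dev_total_kb / sources    # ~1.7 KB per source
prod_per_source_kb = prod_total_kb / sources  # ~2.7 KB per source

# Note the growth is sub-linear in key size: quadrupling the key length
# adds roughly 1 KB per source, since Redis and key-export overhead is fixed.
print(round(dev_per_source_kb, 1), round(prod_per_source_kb, 1))
```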

Checklist

If you made changes to the server application code:

  • Linting (make lint) and tests (make test) pass in the development container

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

rmol added 2 commits April 1, 2020 16:53

  • It actually returns a fingerprint, not a key.
  • Also rename the fingerprint cache and increase its maximum size.
@lgtm-com (bot) commented Apr 1, 2020

This pull request introduces 1 alert when merging b0b411c into 8576e0b - view on LGTM.com

new alerts:

  • 1 for Wrong number of arguments in a call

As sources are being added, include the count in the output, to give a
better idea of progress.
@eloquence eloquence added this to the 1.3.0 milestone Apr 1, 2020
@kushaldas (Contributor) left a comment

Did a visual review of the change, looks good to me.

@redshiftzero (Contributor) left a comment

These changes look good! I haven't run through the test plan yet, but I've dropped a couple of minor comments inline. I'll re-review first thing on my next work day so we can merge for 1.3.0.

One other question, regarding the 100 MB for 1000 sources figure: scaling linearly from that isn't necessarily sound, since only a subset of that 100 MB is due to the caches, but it has got me wondering about memory usage. Can you share an estimate of the memory usage when the caches are fully populated?

Inline review threads (outdated, resolved): securedrop/crypto_util.py, securedrop/create-dev-data.py
rmol added 2 commits April 3, 2020 15:01

  • Improve the order of operations when cleaning deleted keys out of the CryptoUtil caches.
  • Fix index of deleted journalist in create-dev-data.py.
By default, mod_wsgi waits for the first request to load the
application. By adding the "process-group" and "application-group"
options to the "WSGIScriptAlias" directive, we can tell it to load on
process start instead, so the caches get populated then instead of
forcing the first request to wait. See:

https://modwsgi.readthedocs.io/en/latest/configuration-directives/WSGIScriptAlias.html
@rmol rmol requested review from conorsch and emkll as code owners April 4, 2020 00:07
@sssoleileraaa (Contributor) commented

Since I am still seeing issues with timeouts, I think it's important to test this PR against a staging server with 200, 300, and 1000 sources before it's merged, if there's time (which would close freedomofpress/securedrop-client#1007 if it succeeds). Also, remember to delete the local database between runs so you can see whether or not the source list is populated.

@rmol (Contributor, Author) commented Apr 8, 2020

I've updated the PR description to add detail about the switch to Redis. @redshiftzero this is ready for another look.

@creviera In testing with 1000 sources, I see the client source list populated, with messages being downloaded, in under two minutes. The initial sync completes in around 20 seconds.

@rmol (Contributor, Author) commented Apr 8, 2020

The timing in my last comment was observed with the client running in sd-app, using the proxy and Tor, talking to my staging VM populated with 1000 sources, each with two submissions and replies, using 1024-bit keys.

@kushaldas (Contributor) left a comment

I am approving this. My only suggestion would be an extra script, run after every reboot, to repopulate the Redis cache from the existing fingerprints and keys. That would help with journalists' first sync. The same script should also run after an update via the package.

@rmol (Contributor, Author) commented Apr 9, 2020

@kushaldas The cache is populated when Apache starts the mod_wsgi processes (from the WSGIScriptAlias documentation: "If both process-group and application-group options are set, the WSGI script file will be pre-loaded when the process it is to run in is started, rather than being lazily loaded on the first request.")
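
For reference, a minimal example of the directives being discussed (paths, process names, and counts here are illustrative, not SecureDrop's actual Apache configuration):

```apache
# Illustrative mod_wsgi configuration. With both process-group and
# application-group set on WSGIScriptAlias, mod_wsgi pre-loads the WSGI
# script when the daemon process starts, so the Redis caches are populated
# at startup rather than on the first request.
WSGIDaemonProcess journalist processes=2 threads=4
WSGIScriptAlias / /var/www/journalist.wsgi \
    process-group=journalist application-group=%{GLOBAL}
```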

@redshiftzero (Contributor) left a comment

All my comments were addressed. I just tested the upgrade in staging and otherwise ran through the test plan, and all LGTM. Thanks!

@redshiftzero redshiftzero merged commit a0e2674 into develop Apr 9, 2020
@redshiftzero redshiftzero deleted the add-key-cache branch April 9, 2020 23:06
@kushaldas kushaldas mentioned this pull request Apr 30, 2020
Successfully merging this pull request may close these issues.

Journalist API endpoint get_all_sources is still too slow