
Add key cache #5184

Merged · 9 commits · Apr 9, 2020

Conversation

@rmol (Contributor) commented Apr 1, 2020

Status

Ready for review

Description of Changes

Moves the CryptoUtil caching to Redis. This reduces total memory usage compared to keeping the caches in the WSGI processes, and eliminates the need to repopulate the caches when WSGI processes restart.

Adds caching of public keys, not just fingerprints.

Renames CryptoUtil.getkey and CryptoUtil.export_pubkey to better match what they do.

Adds some detail to the output of create-dev-data.py to better indicate progress.

Uses 1024-bit keys if the environment variable SECUREDROP_ENV is test or dev, which speeds up the local dev server startup with large numbers of sources.
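
The environment check described above might look something like this (a sketch only; `default_key_size` and the exact environment handling are illustrative, not the actual SecureDrop code):

```python
import os

# Hypothetical helper mirroring the behavior described above: use small,
# fast-to-generate 1024-bit keys only in test/dev environments, and full
# 4096-bit keys everywhere else.
def default_key_size() -> int:
    env = os.environ.get("SECUREDROP_ENV", "prod")
    return 1024 if env in ("test", "dev") else 4096
```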

Fixes #5183.
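
The caching pattern described above can be sketched roughly as follows. This is an illustrative wrapper, not the actual implementation; `RedisKeyCache` and its method names are hypothetical, but the hash names match the ones used in the test plan:

```python
from typing import Callable, Optional

class RedisKeyCache:
    """Sketch of a shared cache backed by a Redis hash.

    All WSGI processes talk to the same Redis instance, so the cache
    survives process restarts and is not duplicated per process.
    """

    def __init__(self, redis_client, hash_name: str) -> None:
        self.redis = redis_client
        self.hash_name = hash_name  # e.g. "sd/crypto-util/fingerprints"

    def get(self, source: str, compute: Callable[[str], Optional[str]]) -> Optional[str]:
        cached = self.redis.hget(self.hash_name, source)
        if cached is not None:
            return cached
        value = compute(source)  # slow path: ask GPG
        if value is not None:
            self.redis.hset(self.hash_name, source, value)
        return value

    def evict(self, source: str) -> None:
        # Called when a source's key is deleted.
        self.redis.hdel(self.hash_name, source)
```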

Testing

Run the dev server with the environment variable NUM_SOURCES set to a large number, like 500 or 1000. Use the test script from the #5100 test plan. Running it against this branch should show faster response times than on master, once the caches are populated.

Checking Redis memory usage

Stop Apache with service apache2 stop.

In redis-cli:

  • List all keys with keys *
  • Delete the Redis CryptoUtil cache keys if they exist:
    • del sd/crypto-util/fingerprints
    • del sd/crypto-util/keys
  • Execute info and look for used_memory_human. It should be under
    a megabyte, e.g. used_memory_human:517.83K

Start Apache with service apache2 start.

Wait for the caches to populate. It took a few minutes on my system. You can watch for the Apache and GPG processes to drop out of top, then check the number of keys in the Redis caches in redis-cli with hlen sd/crypto-util/fingerprints and hlen sd/crypto-util/keys.

Execute info in redis-cli and look for used_memory_human. With dev/test 1024-bit keys, it should be under two megabytes, e.g. used_memory_human:1.66M. With production 4096-bit keys I see used_memory_human:2.64M.

Deployment

This requires additional working memory, but as suggested in the test plan, caching 1000 1024-bit keys and fingerprints in Redis takes under 2 megabytes. Production 4096-bit keys will result in a proportional increase.
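
As a rough check of these figures, dividing the observed totals by the number of sources gives the approximate per-source cost (numbers taken from the test plan above):

```python
# Per-source cache cost, using the used_memory_human figures observed above.
sources = 1000
dev_total_kb = 1.66 * 1024    # 1024-bit dev/test keys
prod_total_kb = 2.64 * 1024   # 4096-bit production keys

dev_per_source_kb = dev_total_kb / sources    # ~1.7 KB per source
prod_per_source_kb = prod_total_kb / sources  # ~2.7 KB per source

# Note the growth is sub-linear in key size: quadrupling the key length
# adds roughly 1 KB per source, since Redis and key-export overhead is fixed.
print(round(dev_per_source_kb, 1), round(prod_per_source_kb, 1))
```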

Checklist

If you made changes to the server application code:

  • Linting (make lint) and tests (make test) pass in the development container

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

rmol added 2 commits April 1, 2020 16:53

  • It actually returns a fingerprint, not a key.
  • Also rename the fingerprint cache and increase its maximum size.
@lgtm-com (bot) commented Apr 1, 2020

This pull request introduces 1 alert when merging b0b411c into 8576e0b - view on LGTM.com

new alerts:

  • 1 for Wrong number of arguments in a call

As sources are being added, include the count in the output, to give a
better idea of progress.
@eloquence eloquence added this to the 1.3.0 milestone Apr 1, 2020
@kushaldas (Contributor) left a comment

Did a visual review of the change, looks good to me.

@redshiftzero (Contributor) left a comment

These changes look good! I haven't run through the test plan yet, but I've dropped a couple of minor comments inline. I'll re-review first thing on my next work day so we can merge for 1.3.0.

One other question, regarding the 100 MB for 1000 sources figure: scaling linearly from that isn't necessarily sound, since only a subset of that 100 MB is due to the caches, but it has got me wondering about memory usage. Can you share an estimate of the memory usage when the caches are fully populated?

Inline review threads (outdated, resolved): securedrop/crypto_util.py, securedrop/create-dev-data.py
rmol added 2 commits April 3, 2020 15:01

  • Improve the order of operations when cleaning deleted keys out of the CryptoUtil caches.
  • Fix index of deleted journalist in create-dev-data.py.
By default, mod_wsgi waits for the first request to load the
application. By adding the "process-group" and "application-group"
options to the "WSGIScriptAlias" directive, we can tell it to load on
process start instead, so the caches get populated then instead of
forcing the first request to wait. See:

https://modwsgi.readthedocs.io/en/latest/configuration-directives/WSGIScriptAlias.html
@rmol rmol requested review from conorsch and emkll as code owners April 4, 2020 00:07
@sssoleileraaa (Contributor) commented

Since I am still seeing issues with timeouts, I think it's important to test this PR against a staging server with 200, 300, and 1000 sources before it's merged, if there's time (which would close freedomofpress/securedrop-client#1007 if it succeeds). Also, remember to delete the local database between runs so you can see whether or not the source list is populated.

@rmol (Contributor, Author) commented Apr 8, 2020

I've updated the PR description to add detail about the switch to Redis. @redshiftzero this is ready for another look.

@creviera In testing with 1000 sources, I see the client source list populated, with messages being downloaded, in under two minutes. The initial sync completes in around 20 seconds.

@rmol (Contributor, Author) commented Apr 8, 2020

The timing in my last comment was observed with the client running in sd-app, using the proxy and Tor, talking to my staging VM populated with 1000 sources, each with two submissions and replies, using 1024-bit keys.

@kushaldas (Contributor) left a comment

I am approving this. My only suggestion would be an extra script, run after every reboot, to repopulate the Redis cache from the existing fingerprints and keys. That would help with journalists' first sync. The same script should also run after an update via the package.

@rmol (Contributor, Author) commented Apr 9, 2020

@kushaldas The cache is populated when Apache starts the mod_wsgi processes (from the WSGIScriptAlias documentation: "If both process-group and application-group options are set, the WSGI script file will be pre-loaded when the process it is to run in is started, rather than being lazily loaded on the first request.")
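
For reference, a minimal example of the directives being discussed (paths, process names, and counts here are illustrative, not SecureDrop's actual Apache configuration):

```apache
# Illustrative mod_wsgi configuration. With both process-group and
# application-group set on WSGIScriptAlias, mod_wsgi pre-loads the WSGI
# script when the daemon process starts, so the Redis caches are populated
# at startup rather than on the first request.
WSGIDaemonProcess journalist processes=2 threads=4
WSGIScriptAlias / /var/www/journalist.wsgi \
    process-group=journalist application-group=%{GLOBAL}
```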

@redshiftzero (Contributor) left a comment

All my comments were addressed. I just tested the upgrade in staging and otherwise ran through the test plan, and all LGTM. Thanks!

@redshiftzero redshiftzero merged commit a0e2674 into develop Apr 9, 2020
@redshiftzero redshiftzero deleted the add-key-cache branch April 9, 2020 23:06
@kushaldas kushaldas mentioned this pull request Apr 30, 2020
Successfully merging this pull request may close these issues.

Journalist API endpoint get_all_sources is still too slow