Upgrade Flask dependency to 2.0.2 and refactor applications to improve static type checks #6217

Merged
merged 15 commits into from
Feb 1, 2022

zenmonkeykstop
Contributor

@zenmonkeykstop zenmonkeykstop commented Jan 10, 2022

Status

Ready for review (but test plan TK)

Description of Changes

Fixes #6154.

This PR updates the application Flask version to 2.0.2 and updates associated dependencies. It also refactors out the app.storage and app.instance_config dynamic attributes and corrects several typing errors, in order to allow for successful static type checking via mypy.

  • Dependency updates:
    • click from 6.7 to 8.0.3
    • flask-babel from 0.11.2 to 2.0.0
    • flask-wtf from 0.14.2 to 1.0.0
    • Flask from 1.0.2 to 2.0.2
    • itsdangerous from 0.24 to 2.0.1
    • jinja2 from 2.11.3 to 3.0.2
    • markupsafe from 1.1.1 to 2.0.1
    • redis from 3.3.6 to 3.5.3
    • rq from 1.1.0 to 1.10.0
    • werkzeug from 0.16.0 to 2.0.2

Application refactor:

  • Flask app.storage attribute has been removed in favour of a global Storage object, accessed via a .get_default() class method that creates the global object if it doesn't already exist, and then returns it.
  • Flask app.instance_config attribute has been removed in favour of a global InstanceConfig object, accessed similarly to Storage.
  • Numerous typing fixes have been made.
  • datetime variables have been updated to be timezone-aware as required by Flask 2.0.2
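
The accessor pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the constructor arguments, the placeholder paths, the `path()` helper, and the use of a class attribute to hold the shared object are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


class Storage:
    """Hypothetical stand-in for the application's Storage class."""

    # Shared default instance, created on first use (assumed layout).
    _default: Optional["Storage"] = None

    def __init__(self, store_dir: str, temp_dir: str) -> None:
        self.store_dir = Path(store_dir)
        self.temp_dir = Path(temp_dir)

    @classmethod
    def get_default(cls) -> "Storage":
        # Create the shared object if it doesn't exist yet, then return it.
        if cls._default is None:
            # Placeholder paths; real values would come from the app config.
            cls._default = cls("/tmp/example-store", "/tmp/example-tmp")
        return cls._default

    def path(self, filesystem_id: str, filename: str = "") -> Path:
        # Illustrative helper: location of a file within a source's directory.
        return self.store_dir / filesystem_id / filename
```

Callers then write `Storage.get_default().path(...)` instead of `current_app.storage.path(...)`, which gives mypy a concrete class to check against rather than a dynamically added attribute.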

Test updates:

Tests have been updated to account for the refactor above. An app_storage fixture was added, returning a function-scoped Storage object. For tests which access storage directly, that fixture was added to their args and references like current_app.storage.whatever() were changed to app_storage.whatever(). For tests that access storage indirectly (for example, via a test journalist or source application), the Storage.get_default() method must be mocked to return the app_storage fixture. This patch has been added in the source_app and journalist_app fixtures, so it will be applied automatically in tests that use them.
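
A rough sketch of the mocking approach, using only `unittest.mock` so it runs without pytest. The `Storage` stand-in, `store_dir_seen_by_app()`, and `run_with_test_storage()` are hypothetical names for illustration; in the PR the patch lives inside the source_app and journalist_app fixtures.

```python
from unittest import mock


class Storage:
    """Hypothetical stand-in for the application's Storage class."""

    def __init__(self, store_dir: str) -> None:
        self.store_dir = store_dir

    @classmethod
    def get_default(cls) -> "Storage":
        return cls("/var/lib/example/store")


def store_dir_seen_by_app() -> str:
    # Stands in for application code that reaches storage indirectly.
    return Storage.get_default().store_dir


def run_with_test_storage(app_storage: Storage) -> str:
    # While the patch is active, get_default() returns the test object,
    # so indirect callers see the function-scoped storage.
    with mock.patch.object(Storage, "get_default", return_value=app_storage):
        return store_dir_seen_by_app()
```

Once the `with` block exits, the original `get_default()` is restored, so one test's storage does not leak into the next.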

General notes:

  • There are a few TODOs not addressed by this PR for various reasons:
    • The babel_instance Flask app dynamic attribute has not been refactored out - it's added by flask-babel and an upstream change is probably required. (Fixed in "Stop using app.babel_instance", #6220.)
    • type checking is disabled for the CSRFError error handlers due to an issue similar to: error handler type check fails pallets/flask#4295
    • type checking is disabled in the generic error handler - error handling should be updated to remove the use of Flask's internal app.error_handler_spec structure
    • during initial testing, it was found that test requirements would override app requirements in the test Docker environment, meaning that updates to app requirements weren't always being applied. As a quick fix, the order in which requirements were installed was switched, so that app requirements would always be applied last. A better fix would be to clean up those requirements files to either keep them in sync or to remove application dependencies from the test requirements files altogether.

Testing

Storage-specific tests

Secure temporary files

  • check out this branch and run make dev-tor (some latency will help when observing the tempfile behaviour)
  • in another terminal, connect to the dev env with docker exec -it securedrop-dev-0 bash
  • in the dev env, monitor /tmp with watch -n 1 ls -l /tmp
  • connect to the Source Interface (SI) (you'll find the address listed in the Docker output just before the applications start) and upload a file larger than 512k.
    • confirm that 2 files with a .aes extension are written and then deleted as the upload completes (the first is the gzipped initial submission, second is decompressed version)
  • (optional) for bonus points, retain both files by adding a delete=False argument to the super().__init__() call in the SecureTemporaryFile class and submitting another file larger than 512k
    • verify that both files' contents "look" random.
    • verify that the size of the first file matches the gzipped size of the submission and the size of the second matches that of the original file.

Submissions and replies

  • are files being saved to the right location?
    • check out this branch and run make dev
    • in another terminal, connect to the dev env with docker exec -it securedrop-dev-0 bash
    • in the SI, start at least 2 source sessions, upload distinct files to both
    • verify that they were stored to the correct /var/lib/securedrop/store/<UUID>/ paths for the respective sources
  • are they being attributed to the correct source in the JI?
    • in the Journalist Interface (JI), verify that the submissions are listed under the correct source page
  • are replies only visible to the intended recipient in the SI?
    • in the JI, reply to both sources
    • in SI, confirm for each source that they can see their own reply only
  • are they encrypted?
    • in the JI, download the submissions and confirm that they can be decrypted with the submission key
  • are they being deleted correctly (individually, en masse, or as part of a source wipe)?
    • delete one source's submissions and replies via the "files and messages" only option, confirm they are no longer present
    • delete an individual submission from another source, confirm that the file was removed from the store
    • delete the source in its entirety, confirm that the source store directory was deleted
  • are source store directories being recreated correctly?
    • in the SI, create another source and submit some docs/messages.
    • delete all files and messages from a source while preserving the source itself
    • in the dev env shell, manually delete the source's store directory
    • attempt to submit a new doc as that source and confirm that the directory is recreated and the source submission is present

Instance config tests

  • check out this branch and run make dev, then log in to the JI
  • on the Admin > Instance Config page, update the instance name.
    • verify that the name is saved successfully and that the JI titles change to reflect it.
    • load the SI and verify that the new name is used there as well.
  • on the Admin > Instance Config page, disable file uploads.
    • load the SI and verify that sources only have the option to submit messages, not files.

End-to-end tests

  • Perform as much of the Acceptance Tests section of the 2.1.0 test plan as you have time and patience for.
    • verify that acceptance tests pass.

Upgrade scenario

  • Follow the upgrade scenario docs to verify an upgrade from 2.1.0 to locally-built packages.
    • verify that the upgrade scenario completes successfully
    • verify that basic functionality is available after the upgrade.

Deployment

These application changes will be deployed with the next release containing them. There are no schema changes or data migrations, but 2 of the alembic migration scripts were modified to reference the new global Storage object - as such, attention should be paid during testing to QA scenarios involving upgrades to verify that they work as expected.

Checklist

If you made changes to the server application code:

  • Linting (make lint) and tests (make test) pass in the development container

If you made changes to the system configuration:

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR TK

Choose one of the following:

  • I have opened a PR in the docs repo for these changes, or will do so later
  • I would appreciate help with the documentation
  • These changes do not require documentation

If you added or updated a production code dependency:

Production code dependencies are defined in:

  • admin/requirements.in
  • admin/requirements-ansible.in
  • securedrop/requirements/python3/securedrop-app-code-requirements.in

If you changed another requirements.in file that applies only to development
or testing environments, then no diff review is required, and you can skip
(remove) this section.

Choose one of the following:

  • I have performed a diff review and pasted the contents to the packaging wiki
  • I would like someone else to do the diff review - given the scope, looking for volunteers. So far we have:

@zenmonkeykstop zenmonkeykstop added this to the 2.2.0 milestone Jan 10, 2022
Member

@legoktm legoktm left a comment

As I mentioned earlier, I think this could be split a bit to make it easier to review:

  • datetime changes: utcnow() -> now(utc)
  • Introduction of and switch to Storage.get_default()
  • Introduction of and switch to InstanceConfig.get_default()
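
For reference, the difference behind the first item can be sketched as below. The `is_expired()` helper is a hypothetical example, not a function from this PR; the point is only that `datetime.now(timezone.utc)` returns an aware value while `utcnow()` returns a naive one, and the two cannot be compared.

```python
from datetime import datetime, timedelta, timezone


def is_expired(expires_at: datetime) -> bool:
    """Return True if an aware expiry time is in the past (illustrative helper)."""
    return datetime.now(timezone.utc) >= expires_at


# Aware: carries tzinfo, so it compares safely with other aware values.
aware_now = datetime.now(timezone.utc)

# Naive: the same wall-clock value without tzinfo, which is the kind of
# value utcnow() returns.
naive = aware_now.replace(tzinfo=None)

try:
    naive < aware_now
    mixed_comparison_allowed = True
except TypeError:
    # Python refuses to order naive and aware datetimes against each other.
    mixed_comparison_allowed = False
```

This is why the switch tends to ripple through tests as well: any stored or mocked timestamp that remains naive will fail on comparison with an aware one.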

@@ -110,14 +110,10 @@ def _handle_http_exception(
 def expire_blacklisted_tokens() -> None:
     cleanup_expired_revoked_tokens()

-@app.before_request
-def load_instance_config() -> None:
-    app.instance_config = InstanceConfig.get_current()
Member

If I'm understanding this correctly, this is a behavior shift now in that InstanceConfig is just initialized once at startup rather than on each request. Is that intentional? I haven't figured out how that works with the "valid_until" field yet...

Contributor Author

Actually, the instance config is now retrieved every time InstanceConfig.get_default() is called, so it's hitting the DB as much as before (or potentially more often, if a single request makes multiple .get_default() calls). This reduces the benefit of a global object, so there's probably some opportunity for optimization there. The trick would be to detect configuration changes when they're made in the journalist interface, as we want those to be reflected immediately in the source app.

Contributor Author

Updated the InstanceConfig get_default() method to be lazier - it will now only hit the db if explicitly requested or if the global is None. With some tweaks to the @app.before_request-decorated functions, changes should show up immediately in both apps without extra DB overhead.

Also, re valid_until - it might be a bit counter-intuitive. It records the historical info about previous configs, and when they were invalidated. (TBH I'm not sure we need to store said previous configs, but it does provide an audit trail for instance_config changes.)
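
The lazier lookup described in this reply might look roughly like the sketch below. It is an approximation under stated assumptions: `_load_current()` and the `db_lookups` counter are hypothetical stand-ins for the real database query, and the cached object is held on the class purely for illustration.

```python
from typing import Optional


class InstanceConfig:
    """Hypothetical stand-in; the real class is a database model."""

    _default: Optional["InstanceConfig"] = None
    db_lookups = 0  # counts simulated database reads

    def __init__(self, organization_name: str) -> None:
        self.organization_name = organization_name

    @classmethod
    def _load_current(cls) -> "InstanceConfig":
        # Placeholder for the query that fetches the currently valid row.
        cls.db_lookups += 1
        return cls("SecureDrop")

    @classmethod
    def get_default(cls, refresh: bool = False) -> "InstanceConfig":
        # Hit the database only when asked to, or when nothing is cached yet.
        if refresh or cls._default is None:
            cls._default = cls._load_current()
        return cls._default


def before_request() -> None:
    # Would be registered with @app.before_request in each application.
    InstanceConfig.get_default(refresh=True)
```

With this shape, each request costs one lookup in the hook, and any further `get_default()` calls within the same request reuse the cached object.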

filename: typing.Optional[str],
stream: 'IO[bytes]') -> str:

sanitized_filename = secure_filename("unknown.file")
Member

I think this would be easier to read if it was in an else block below.

Contributor Author

sure, updated.

@@ -137,15 +138,15 @@ def download_single_file(filesystem_id: str, fn: str) -> werkzeug.Response:
     reply = Reply.query.filter(Reply.filename == fn).one()
     mark_seen([reply], journalist)
 elif fn.endswith("-doc.gz.gpg") or fn.endswith("doc.zip.gpg"):
-    file = Submission.query.filter(Submission.filename == fn).one()
-    mark_seen([file], journalist)
+    the_file = Submission.query.filter(Submission.filename == fn).one()
Member

If we're renaming this maybe a better name would be submitted_file?

This whole block could be more DRY by not duplicating the mark_seen call for each if stanza...but that can be done separately.

Contributor Author

👍 updated (the name, not the mark_seen() calls)

@@ -82,7 +84,7 @@ def create() -> werkzeug.Response:
create_source_user(
db_session=db.session,
source_passphrase=codename,
Member

This is also documented as taking a DicewarePassphrase, so I don't understand why mypy is happy with it here but wants you to wrap it below. In any case, shouldn't the codename variable be turned into a DicewarePassphrase as soon as it's pulled out of the session object?

@zenmonkeykstop
Contributor Author

> As I mentioned earlier, I think this could be split a bit to make it easier to review:
>
> * datetime changes: utcnow() -> now(utc)
> * Introduction of and switch to Storage.get_default()
> * Introduction of and switch to InstanceConfig.get_default()

This makes sense - I've reordered and merged commits to simplify things and group related changes. Let me know if it's helpful!

@codecov-commenter

codecov-commenter commented Jan 11, 2022

Codecov Report

Merging #6217 (aba1749) into develop (f768a1a) will increase coverage by 0.01%.
The diff coverage is 88.67%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #6217      +/-   ##
===========================================
+ Coverage    85.13%   85.15%   +0.01%     
===========================================
  Files           59       59              
  Lines         4090     4102      +12     
  Branches       487      490       +3     
===========================================
+ Hits          3482     3493      +11     
  Misses         491      491              
- Partials       117      118       +1     
Impacted Files Coverage Δ
securedrop/journalist_app/admin.py 89.40% <ø> (ø)
...rsions/3da3fcab826a_delete_orphaned_submissions.py 31.91% <20.00%> (ø)
.../b58139cfdc8c_add_checksum_columns_revoke_table.py 36.36% <33.33%> (ø)
securedrop/source_app/forms.py 94.44% <50.00%> (-0.30%) ⬇️
securedrop/store.py 90.09% <75.00%> (-0.11%) ⬇️
securedrop/source_app/main.py 94.47% <85.71%> (+0.03%) ⬆️
securedrop/i18n.py 93.25% <100.00%> (ø)
securedrop/journalist_app/__init__.py 90.09% <100.00%> (-0.18%) ⬇️
securedrop/journalist_app/api.py 94.16% <100.00%> (ø)
securedrop/journalist_app/col.py 81.01% <100.00%> (+0.24%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f768a1a...aba1749. Read the comment docs.

Comment on lines -111 to 114
-flask-babel==0.11.2 \
-    --hash=sha256:462a3c599b0ccf426ca1757cc612f1db383844efd346d14170da04c8c76dd521 \
-    --hash=sha256:c0d75710bd4b0fe866f9f2347de6e19208712f9cec006436b4c1c15d4cb0c939
+flask-babel==2.0.0 \
+    --hash=sha256:e6820a052a8d344e178cdd36dd4bb8aea09b4bda3d5f9fa9f008df2c7f2f5468 \
+    --hash=sha256:f9faf45cdb2e1a32ea2ec14403587d4295108f35017a7821a2b1acb8cfd9257d
     # via -r requirements/python3/securedrop-app-code-requirements.in

Comment on lines -145 to +149
-if app.instance_config.organization_name:
-    g.organization_name = app.instance_config.organization_name
+if InstanceConfig.get_default().organization_name:
+    g.organization_name = \
+        InstanceConfig.get_default().organization_name  # pylint: disable=assigning-non-slot
 else:
-    g.organization_name = gettext('SecureDrop')
+    g.organization_name = gettext('SecureDrop')  # pylint: disable=assigning-non-slot
Member

Nit in passing: I wondered if this could be simplified, since InstanceConfig.organization_name has default="SecureDrop", but it looks like that default's not applied in the upgrade migration.

-redis==3.3.6 \
-    --hash=sha256:45682ecf226c7611efe731974c4fa3390170ba045b9cdb26f0051114a5c2a68b \
-    --hash=sha256:f2609a85e5f37f489ba3b5652e1175dc3711c4d7a7818c4f657615810afd23df
+redis==3.5.3 \

@zenmonkeykstop zenmonkeykstop marked this pull request as ready for review January 13, 2022 23:20
@zenmonkeykstop zenmonkeykstop requested a review from a team as a code owner January 13, 2022 23:20
@conorsch
Contributor

Diff reviews complete for click and itsdangerous. Regarding the latter, we've got an upcoming deprecation tracked in #6224. For the purposes of this PR, see the comments in pallets/itsdangerous#257 about token length; I have not tried to reproduce what's reported there, but given the scope of these changes, makes sense to evaluate as part of review here.

Contributor

@conorsch conorsch left a comment

Looks like securedrop/requirements/python3/test-requirements.txt still needs to be updated; it contains itsdangerous==0.24 (the old version) and also Flask v1 still.

@zenmonkeykstop
Contributor Author

> Looks like securedrop/requirements/python3/test-requirements.txt still needs to be updated; it contains itsdangerous==0.24 (the old version) and also Flask v1 still.

Yup, that's an outstanding task in the TODOs - current workaround also described there.

@conorsch conorsch dismissed their stale review January 14, 2022 00:06

already tracked in todos

@legoktm
Member

legoktm commented Jan 14, 2022

jinja notes:

  • evalcontextfilter is deprecated, there's a new sample nl2br snippet: https://jinja.palletsprojects.com/en/3.0.x/api/?highlight=nl2br#custom-filters
  • The with_ and autoescape extensions are deprecated (now always built-in AIUI), so maybe they can be removed from babel.cfg?
  • It seems Markup should be imported from markupsafe rather than jinja now
  • Do we have a regression test that verifies jinja's autoescaping is turned on? Like assert Template("{{foo}}").render(foo="<script>") == "&lt;script&gt;" or something.

I reviewed all the doc and src/ changes, ended up skipping the tests/ changes due to a lack of time.

# via
# flask
# flask-wtf
jinja2==3.0.2 \

@@ -877,6 +879,13 @@ def copy(self) -> "InstanceConfig":

return new

@classmethod
def get_default(cls, refresh: Optional[bool] = False) -> "InstanceConfig":
Contributor

@nabla-c0d3 nabla-c0d3 Jan 15, 2022

I would put refresh: bool = False ; not sure there's a need for refresh to ever be None?

Contributor Author

✔️ - updated.

@@ -32,30 +32,29 @@ def create_file_in_source_dir(config, filesystem_id, filename):
return source_directory, file_path
Contributor

To further simplify the tests in this file, I would recommend:

  • Removing all mentions of the journalist_app, as it is out of scope for what is being tested here (only the Storage class is being tested; not the full app).
  • Not using the storage fixture as I would argue that creating a Storage class should be done in the test code as that's what's being tested: it's not "setup" code. The benefit will be that the tests won't be coupled with the config fixture (which is used in a lot of tests and seems unnecessary here) and the tests will be easier to read/follow.

You can see an example of this approach above in TestPassphrasesGenerator.

Contributor Author

The journalist_app fixture is unused in the majority of the tests alright - there are a few that do require an app context - eg. the shredder tests, but they could probably be moved to a separate file.

I'm less convinced about removing the app_storage fixture in all cases. Some of the tests that use it and the config fixture are essentially verifying that files are being created in the store in the expected location based on config parameters, and for those it would seem either tautological to just remove config, or duplicative to effectively recreate both fixtures in the test. But it can definitely be removed from tests not using config and replaced with a test-specific Storage object.

Contributor Author

Actually on further reflection there are a lot more cases where the latter makes sense...

 # Given a source user
 with source_app.app_context():
     source_user = create_source_user(
         db_session=db.session,
         source_passphrase=PassphraseGenerator.get_default().generate_passphrase(),
-        source_app_storage=source_app.storage,
+        source_app_storage=app_storage,
Contributor

Not super important, but you might now be able to remove the source_app fixture from this test and the one below 🙌

Contributor Author

Looks like yes to the one below, but no to this one without independently setting up the db and creating an app and passing its context within the test.

@zenmonkeykstop zenmonkeykstop force-pushed the stg-upgrade-flask-2.0 branch 2 times, most recently from 6f89f27 to aba1749 Compare January 17, 2022 18:17
if filename is not None:
    sanitized_filename = secure_filename(filename)
else:
    sanitized_filename = secure_filename("unknown.file")
Contributor

@nabla-c0d3 nabla-c0d3 Jan 17, 2022

(Not very important) You could simplify this by switching the argument's declaration to:

filename: str = "unknown.file",

Then no if/else would be needed.

Contributor Author

I kind of like it as is; I think @legoktm's previous point about readability is a good one.
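
One behavioural difference between the two forms, sketched with hypothetical helper names: a default argument only applies when the argument is omitted, so an explicit None still gets through. Whether callers actually pass None here is an assumption, based only on the Optional[str] annotation shown in the earlier hunk.

```python
from typing import Optional


def name_with_default(filename: str = "unknown.file") -> str:
    # The default is used only when the caller omits the argument.
    return filename


def name_with_check(filename: Optional[str]) -> str:
    # The explicit check also covers callers that pass None.
    if filename is not None:
        return filename
    else:
        return "unknown.file"


omitted = name_with_default()
explicit_none = name_with_default(None)  # type: ignore[arg-type]
checked_none = name_with_check(None)
```

Under that assumption, the explicit check is the form that keeps None from reaching the code that sanitizes the filename.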

Comment on lines +281 to +283
rq==1.10.0 \
--hash=sha256:92950a3e60863de48dd1800882939bbaf089a37497ebf9f2ecf7c9fd0a4c4a95 \
--hash=sha256:be09ec43fae9a75a4d26ea3cd520e5fa3ea2ea8cf481be33e6ec9416f0369cac

@eloquence eloquence removed this from the 2.2.0 milestone Jan 19, 2022
@legoktm
Member

legoktm commented Jan 21, 2022

I got through the first half of the test plan today, should finish the rest on Monday.

Originally, test requirements were installed in the Docker environment after
application requirements. Some dependencies are duplicated in both, with older
versions present in the test requirements. The order of installation was switched
to ensure application requirements were always present and not overridden.
 - click from 6.7 to 8.0.3
 - flask-babel from 0.11.2 to 2.0.0
 - flask-wtf from 0.14.2 to 1.0.0
 - Flask from 1.0.2 to 2.0.2
 - itsdangerous from 0.24 to 2.0.1
 - jinja2 from 2.11.3 to 3.0.2
 - markupsafe from 1.1.1 to 2.0.1
 - redis from 3.3.6 to 3.5.3
 - rq from 1.1.0 to 1.10.0
 - werkzeug from 0.16.0 to 2.0.2
Updated tests to use timezone-aware datetimes.
 - added global Storage object
 - updated source app to use global storage
 - updated journalist app to use global storage
 - updated loaddata.py to use global storage
 - updated shredder to use global storage
 - updated alembic migration scripts to use global storage object
 - disabled broken typechecking for csrf error handlers
 - disabled typechecking for blueprint 404 error handling workaround
 - updated save_session() with expected type
 - updated _secure_file_stream args to match those expected by Werkzeug 2.0
 - fixed to_json() return type
 - updated file submissions to use expected types and handle case with missing filename
 - ignore babel_instance attribute dynamically added by flask-babel
 - explicitly cast source passphrase for login
 - use expected SessionMixin type instead of Dict[Any, Any] for sessions
 - use BytesIO instead of BufferedBaseIO
 - changed invalid variable name 'file' to 'the_file'
…nctions.

In models.py, the Submission and Reply constructors reference the Storage object to calculate file size information. Originally they used Storage.get_default() directly, but passing the Storage object as a parameter instead more cleanly separates the model from the application and simplifies the tests.

Similar considerations apply to the async_add_checksum_for_file() function in store.py.
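
A minimal sketch of the parameter-passing approach described in this commit message. The class shapes below are simplified stand-ins (the real Submission is a database model and its constructor takes other arguments); only the idea of handing the Storage object to the constructor is taken from the text.

```python
import os
from pathlib import Path


class Storage:
    """Hypothetical stand-in for the application's Storage class."""

    def __init__(self, store_dir: str) -> None:
        self.store_dir = Path(store_dir)

    def path(self, filesystem_id: str, filename: str) -> Path:
        return self.store_dir / filesystem_id / filename


class Submission:
    """Simplified stand-in for the model; not the real constructor signature."""

    def __init__(self, filesystem_id: str, filename: str, storage: Storage) -> None:
        self.filename = filename
        # The size comes from the storage object passed in, so the model
        # does not need to look up a global.
        self.size = os.stat(storage.path(filesystem_id, filename)).st_size
```

A test can then build a Storage rooted in a temporary directory and pass it straight in, with no patching of `Storage.get_default()`.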
Some test changes were needed to account for the use of a global Storage object. An app_storage fixture was added, returning a function-scoped Storage object. For tests which access storage directly, that fixture was added to their args and references to current_app.storage were changed to app_storage. For tests that access storage indirectly (for example, via a test journalist or source application), the Storage.get_default() method must be mocked to return the app_storage fixture. This patch has been added in the source_app and journalist_app fixtures, so it will be applied automatically in tests that use them.
If the instance config is updated in the Journalist Interface, the change should
be reflected in both web applications immediately. To allow for this, an
optional refresh argument forcing a database lookup was added, and the source and
journalist apps were updated to use the refresh option before each request.
Subsequent uses of the InstanceConfig within a request should not need to hit the
database.
@conorsch conorsch self-requested a review February 1, 2022 20:45
Contributor

@conorsch conorsch left a comment

Changes look great. Try as I might, I'm unable to break it, which is grand. =) Tested via the prod VM upgrade scenario, and ran through the full test plan. Diff is quite readable, and post-reorg, the code is a bit more intuitive IMO. Great work on this one, @zenmonkeykstop—and thanks also to @legoktm for detailed review.

@conorsch conorsch merged commit a20c843 into develop Feb 1, 2022
cfm added a commit to freedomofpress/securedrop-client that referenced this pull request Nov 2, 2022
gonzalo-bulnes pushed a commit to freedomofpress/securedrop-client that referenced this pull request Dec 28, 2022
gonzalo-bulnes pushed a commit to freedomofpress/securedrop-client that referenced this pull request Dec 28, 2022
gonzalo-bulnes pushed a commit to freedomofpress/securedrop-client that referenced this pull request Jan 4, 2023
Successfully merging this pull request may close these issues.

Update Flask version to 2.0.*, along with associated requirements
8 participants