fix: squash Alembic migrations; eliminate non-self-contained data-migration tests #1517

Merged
merged 10 commits into from
Jul 5, 2022

Conversation

cfm

@cfm cfm commented Jun 10, 2022

Description

BREAKING CHANGE. Fixes #1500 by:

  1. squashing Alembic migrations as of the current head, 9ba8d7524871;
  2. removing data-migration tests for deleted Alembic migrations;
  3. removing version-insensitive utilities for testing data migrations; and
  4. documenting, in the readme and the pull-request template, requirements for data migrations and their tests, to avoid a recurrence of #1500 ("data migrations will fail if subsequent versions add columns to back-referenced tables").

Test Plan

Against a staging or production SecureDrop server, follow these steps in sd-app rather than sd-dev.

  • In-place upgrade fails:
(.venv) user@sd-dev:~/securedrop-client$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
(.venv) user@sd-dev:~/securedrop-client$ ls -al ~/.securedrop_client/svs.sqlite
lrwxrwxrwx 1 user user 39 Jun  9 16:43 /home/user/.securedrop_client/svs.sqlite -> /home/user/securedrop-client/svs.sqlite
(.venv) user@sd-dev:~/securedrop-client$ rm svs.sqlite
(.venv) user@sd-dev:~/securedrop-client$ alembic upgrade head
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 2f363b3d680e, init
INFO  [alembic.runtime.migration] Running upgrade 2f363b3d680e -> fecf1191b6f0, remove decryption_vs_content contraint
INFO  [alembic.runtime.migration] Running upgrade fecf1191b6f0 -> bafdcae12f97, Add files.original_filename
INFO  [alembic.runtime.migration] Running upgrade bafdcae12f97 -> 36a79ffcfbfb, add first_name, last_name, fullname, initials
INFO  [alembic.runtime.migration] Running upgrade 36a79ffcfbfb -> 86b01b6290da, add reply draft
INFO  [alembic.runtime.migration] Running upgrade 86b01b6290da -> fb657f2ee8a7, drop File.original_filename
INFO  [alembic.runtime.migration] Running upgrade fb657f2ee8a7 -> 7f682532afa2, add download error
INFO  [alembic.runtime.migration] Running upgrade 7f682532afa2 -> a4bf1f58ce69, fix journalist association in replies table
INFO  [alembic.runtime.migration] Running upgrade a4bf1f58ce69 -> bd57477f19a2, add seen tables
INFO  [alembic.runtime.migration] Running upgrade bd57477f19a2 -> eff1387cfd0b, add deletedconversation table
INFO  [alembic.runtime.migration] Running upgrade eff1387cfd0b -> 9ba8d7524871, add deletedsource table
(.venv) user@sd-dev:~/securedrop-client$ git checkout 1500-squash-alembic
Switched to branch '1500-squash-alembic'
Your branch is up to date with 'origin/1500-squash-alembic'.
(.venv) user@sd-dev:~/securedrop-client$ alembic upgrade head
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
ERROR [alembic.util.messaging] Can't locate revision identified by '9ba8d7524871'
  FAILED: Can't locate revision identified by '9ba8d7524871'
  • A fresh installation succeeds:
(.venv) user@sd-dev:~/securedrop-client$ git checkout 1500-squash-alembic
Switched to branch '1500-squash-alembic'
Your branch is up to date with 'origin/1500-squash-alembic'.
(.venv) user@sd-dev:~/securedrop-client$ ls -al ~/.securedrop_client/svs.sqlite
lrwxrwxrwx 1 user user 39 Jun  9 16:43 /home/user/.securedrop_client/svs.sqlite -> /home/user/securedrop-client/svs.sqlite
(.venv) user@sd-dev:~/securedrop-client$ rm svs.sqlite 
(.venv) user@sd-dev:~/securedrop-client$ alembic upgrade head
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> d7c8af95bc8e, empty message
(.venv) user@sd-dev:~/securedrop-client$ python -m securedrop_client
  • A SQL diff shows only insignificant differences in whitespace/ordering and the inlining of CONSTRAINTs:

    1. On each of main and 1500-squash-alembic, run alembic upgrade head and save the databases to (e.g.) {old,new}.sqlite.
    2. Dump them to SQL:
      (.venv) user@sd-dev:~/securedrop-client$ sqlite3 old.sqlite .dump > old.sql && sqlite3 new.sqlite .dump > new.sql
    3. Format the SQL files consistently.
    4. Diff the formatted SQL files with the tool of your choice, e.g.:
      (.venv) user@sd-dev:~/securedrop-client$ diffoscope --html-dir diff --no-default-limits old.sql new.sql
  • The Client runs normally.
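
The dump-format-diff steps above can also be sketched in Python. This is a minimal illustration, not part of the pull request: it assumes Python's built-in sqlite3 module, uses Connection.iterdump() as a stand-in for "sqlite3 svs.sqlite .dump", and the helper names normalized_dump and significant_diff are hypothetical. Sorting the statements deliberately hides ordering differences between tables and rows; a change of column order inside a single CREATE TABLE statement would still show up.

```python
from __future__ import annotations

import difflib
import sqlite3


def normalized_dump(conn: sqlite3.Connection) -> list[str]:
    """Dump a database to SQL, collapse whitespace, and sort the statements."""
    return sorted(" ".join(statement.split()) for statement in conn.iterdump())


def significant_diff(old: sqlite3.Connection, new: sqlite3.Connection) -> list[str]:
    """Return only the statements added to or removed from the new database."""
    diff = difflib.unified_diff(
        normalized_dump(old),
        normalized_dump(new),
        fromfile="old.sql",
        tofile="new.sql",
        lineterm="",
        n=0,  # no context lines: keep additions and removals only
    )
    # Skip the two file headers and the "@@" hunk markers.
    return [line for line in list(diff)[2:] if not line.startswith("@@")]
```

Calling significant_diff(sqlite3.connect("old.sqlite"), sqlite3.connect("new.sqlite")) on the two saved databases should then list whatever remains after whitespace and ordering are ignored.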

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:

  • I have tested these changes in the appropriate Qubes environment
  • I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
  • These changes should not need testing in Qubes

If these changes add or remove files other than client code, the AppArmor profile may need to be updated. Please check as applicable:

  • I have updated the AppArmor profile
  • No update to the AppArmor profile is required for these changes
  • I don't know and would appreciate guidance

If these changes modify the database schema, you should include a database migration. Please check as applicable:

  • I have written a migration and upgraded a test database based on main and confirmed that the migration applies cleanly
  • I have written a migration but have not upgraded a test database based on main and would like the reviewer to do so
  • I need help writing a database migration
  • No database schema changes are needed
  • This is a breaking change, and release coordination is required

@cfm cfm added release blocker database SQLite, SQLAlchemy, and data model labels Jun 10, 2022
@cfm cfm force-pushed the 1500-squash-alembic branch 2 times, most recently from d05b558 to d390264 Compare June 14, 2022 01:32

cfm commented Jun 14, 2022

Not captured by alembic revision --autogenerate or the test_alembic.py suite:

# Set the initial in-progress send statuses: PENDING, FAILED
conn = op.get_bind()
conn.execute(
    """
    INSERT INTO replysendstatuses
    ('name')
    VALUES
    ('PENDING'),
    ('FAILED');
    """
)

To do:

  • Add this INSERT statement at the end of the new Alembic base migration d7c8af95bc8e
  • Add a test to prove it's there (and demonstrate what such a self-contained test looks like)
  • Test in the Client
  • Follow-up: Why do we hard-code these values this way?
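
As a rough idea of what such a self-contained test could look like, here is a sketch that uses only the standard-library sqlite3 module and raw SQL, with no imports from securedrop_client.db. The upgrade() function is a hypothetical stand-in for the base migration (a real test would run Alembic itself), and the column definitions are assumptions.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in for the base migration's upgrade() step."""
    # Assumed table definition: an integer key plus the status name.
    conn.execute(
        "CREATE TABLE replysendstatuses (id INTEGER PRIMARY KEY, name VARCHAR(36) NOT NULL)"
    )
    # Seed the in-progress send statuses, as in the INSERT quoted above.
    conn.execute("INSERT INTO replysendstatuses (name) VALUES ('PENDING'), ('FAILED')")


def seeded_statuses(conn: sqlite3.Connection) -> list:
    """Read the seeded rows back with raw SQL only (no ORM models)."""
    rows = conn.execute("SELECT name FROM replysendstatuses ORDER BY id")
    return [name for (name,) in rows]


def test_reply_send_statuses_are_seeded() -> None:
    conn = sqlite3.connect(":memory:")
    upgrade(conn)
    assert seeded_statuses(conn) == ["PENDING", "FAILED"]
```

The point of the shape is that nothing in the test depends on the models at the current Git head, so later schema changes cannot break it.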

I've confirmed with @creviera that this pull request (with these changes) does not need to be ready for tomorrow's release-readiness check, so this will fall through to testing in a subsequent securedrop-client nightly or release candidate.

@cfm cfm force-pushed the 1500-squash-alembic branch from 9c02a85 to 2a6ebb1 Compare June 14, 2022 02:25
@cfm cfm force-pushed the 1500-squash-alembic branch 2 times, most recently from e3c81a4 to 557ab3a Compare June 28, 2022 01:10
@cfm cfm marked this pull request as ready for review June 28, 2022 01:56
@cfm cfm requested a review from a team as a code owner June 28, 2022 01:56

cfm commented Jun 28, 2022

Note that, as this is a breaking change requiring a clean installation (or at least deleting svs.sqlite), strict semantic versioning would force the release that includes it to be v1.0.0. That determination is left to the discretion of the release manager!
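
For anyone wanting to detect an affected database before upgrading, a minimal sketch follows. It relies on Alembic's default bookkeeping (a table named alembic_version with a single version_num column); the KNOWN_REVISIONS set and both function names are hypothetical and would need to grow with each new migration.

```python
import sqlite3

# Revisions that exist after the squash; only the new base, as of this pull request.
KNOWN_REVISIONS = {"d7c8af95bc8e"}


def current_revision(conn: sqlite3.Connection):
    """Return the stored Alembic revision, or None for a fresh database."""
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'alembic_version'"
    ).fetchone()
    if not has_table:
        return None
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    return row[0] if row else None


def needs_clean_install(conn: sqlite3.Connection) -> bool:
    """True if the database is stamped with a revision that no longer exists."""
    revision = current_revision(conn)
    return revision is not None and revision not in KNOWN_REVISIONS
```

A database stamped with the old head 9ba8d7524871 would be flagged, which corresponds to the "Can't locate revision" failure in the test plan.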

@gonzalo-bulnes

Reviewing... : )

@gonzalo-bulnes gonzalo-bulnes self-requested a review June 30, 2022 02:06

@gonzalo-bulnes gonzalo-bulnes left a comment

Test Plan

Against a staging or production SecureDrop server, follow these steps in sd-app rather than sd-dev.

⚠️ I've tested this on an sd-dev instance, against the recommendation, because I didn't have an sd-app at hand.

  • In-place upgrade fails: 🍏
git checkout main
ls -al ~/.securedrop_client/svs.sqlite
rm svs.sqlite
alembic upgrade head
git checkout 1500-squash-alembic
alembic upgrade head
# INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
# INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
# ERROR [alembic.util.messaging] Can't locate revision identified by '9ba8d7524871'
#  FAILED: Can't locate revision identified by '9ba8d7524871'
  • A fresh installation succeeds: 🍏
git checkout 1500-squash-alembic
ls -al ~/.securedrop_client/svs.sqlite
rm svs.sqlite 
alembic upgrade head
# INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
# INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
# INFO  [alembic.runtime.migration] Running upgrade  -> d7c8af95bc8e, empty message

⚠️ There are changes here in how I tested that the app ran normally, again because I used sd-dev instead of sd-app:

# Finally:
cd ~/src/securedrop-client
git checkout 1500-squash-alembic
rm svs.sqlite 
alembic upgrade head
cp svs.sqlite ~/.securedrop-client/svs.sqlite
sha256sum ~/.securedrop-client/svs.sqlite # for future reference
# ebabf002228da8a1e936158bb6ffc3a494c4ef6d693dd1a394f263fb1941a3bb  /home/user/.securedrop-client/svs.sqlite
LOGLEVEL=debug ./run.sh --sdc-home ~/.securedrop-client/

# Downloaded files, deleted some, deleted source account...
sha256sum ~/.securedrop-client/svs.sqlite # Changes are expected, to make sure the database in use was the one previously re-created by alembic.
# c71f3b916bd72c2a840659ab64bee498a794b82d5cd48bbbf111b83dad8608fd  /home/user/.securedrop-client/svs.sqlite
# OK
  • The Client runs normally.

Additional tests

I've directly compared the resulting databases for both migration paths, as follows:

# Get a text representation of the original database
cd ~/src/securedrop-client
git checkout main
rm svs.sqlite
alembic upgrade head
sha256sum svs.sqlite
# 39a47c16e9ad9376e9d0339df3bc61462fd2a83bcc42d3e89e2701ebe573dc3f  svs.sqlite
sqlite3 svs.sqlite .dump > old.sql

# Get a text representation of the new database
git checkout 1500-squash-alembic
rm svs.sqlite
alembic upgrade head
sha256sum svs.sqlite
# 57dd72783fa1cc6194905b05e2c5858a24e45d9c1d589fc87206466f8c324729  svs.sqlite
sqlite3 svs.sqlite .dump > new.sql

# Compare them:
diff -u old.sql new.sql
# See result in the gist below.

👉 Result of the comparison between old.sql and new.sql: I pasted it in a gist because it's too long to read comfortably inline. (gist)

There are some order and indentation changes that are not meaningful. After eliminating those, the only significant differences are:

--- old.sql	2022-06-30 12:50:00.742905752 +1000
+++ new.sql	2022-06-30 12:49:34.854907411 +1000

+ INSERT INTO replysendstatuses VALUES(1,'PENDING');
+ INSERT INTO replysendstatuses VALUES(2,'FAILED');
+ INSERT INTO downloaderrors VALUES(1,'CHECKSUM_ERROR');
+ INSERT INTO downloaderrors VALUES(2,'DECRYPTION_ERROR');
  • Both migration paths result in equivalent databases 🍊

I'm kinda stopping for a moment before ticking that last box. However:

  • @cfm My understanding is that you've got these changes well in mind, I just don't quite understand them. (It might be I need to read the context again with a fresh mind!)
  • In any case, they're additions that I don't see causing trouble, even if it turned out they were not present in the database after following the original migration path.

So things are looking pretty good to me. On the testing side (as in files in tests/), I'll take a fresh look next week, but the rationale makes complete sense and the corresponding docs / PR template additions are 👍. 🍏 🍏


cfm commented Jun 30, 2022

Thanks for reviewing, @gonzalo-bulnes! Diffing the .dumps is a great extra test. It's caught one substantive omission, which I'll fix (and explain). I'll also add a minimal-diff check to the test plan.

@cfm cfm force-pushed the 1500-squash-alembic branch from 557ab3a to d1fe011 Compare July 1, 2022 01:25

cfm commented Jul 1, 2022

Diffing the .dumps is a great extra test. It's caught one substantive omission, which I'll fix (and explain).

The context: These INSERTs populate tables corresponding to the Enums defined in securedrop_client.db. (I'm not convinced that this extra level of indirection gets us anything over using the Enums directly, but that's for another day.)

The omission: The downloaderrors enum table. That's tested for and fixed in the refactored fa45d89 and expanded 03f063a, respectively.
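
To illustrate the kind of check that catches such an omission, here is a sketch that compares an Enum's members with the rows of its lookup table. The member names come from the diff above; the class name DownloadErrorCodes, the helper missing_enum_rows, and the table definition are assumptions made for the example.

```python
import enum
import sqlite3


class DownloadErrorCodes(enum.Enum):
    # Member names taken from the diff above; the class name is an assumption.
    CHECKSUM_ERROR = "CHECKSUM_ERROR"
    DECRYPTION_ERROR = "DECRYPTION_ERROR"


def missing_enum_rows(conn: sqlite3.Connection, table: str, enum_cls) -> set:
    """Return the enum member names that have no row in the lookup table."""
    # A table name cannot be bound as a parameter, so it must come from trusted code.
    stored = {name for (name,) in conn.execute(f'SELECT name FROM "{table}"')}
    return {member.name for member in enum_cls} - stored
```

An empty result means every member has a row; a non-empty one names exactly the values a migration forgot to insert.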

I'll also add a minimal-diff check to the test plan.

This is done, with the expected diff itself minimized by 3211967. Let me know what you think, @gonzalo-bulnes!

@cfm cfm force-pushed the 1500-squash-alembic branch 2 times, most recently from a14db6c to 3211967 Compare July 1, 2022 02:08

gonzalo-bulnes commented Jul 1, 2022

style: reorder table and column operations to minimize SQL diff

❤️

The addition to the test plan looks great @cfm! (I didn't think of reformatting the SQL files automatically and removed the common lines by hand 🙈)

I'll review again and see if the enums situation is still unclear to me 🙂

P.S.: I think I'm with you on the enum values in the database. Maybe that was an initially cautious move for an enum whose values turned out not to change that often?

@gonzalo-bulnes gonzalo-bulnes self-requested a review July 5, 2022 00:26
gonzalo-bulnes previously approved these changes Jul 5, 2022

@gonzalo-bulnes gonzalo-bulnes left a comment

Enum values

Nice, I'm happy to report that the statements that insert enum values are still there, but clearly were there in the old database too. So I have no more questions on that side. 👍

  • The diff between the current and the new migration path is clean. 🍏

Tests

We are not testing the outcome of that initial migration. Are we okay with that? (I am okay with that, since migrations are immutable, I don't see a lot of value in having automated test suites for them anyway.)

Nitpicking

Currently, the only migration is called d7c8af95bc8e_.py; maybe we could call it something like d7c8af95bc8e_initial_database_structure.py so that we don't have to modify it when adding new migrations with nice names? (And to avoid that trailing underscore? 😬)


Making this an approving review (feel free to ignore the naming suggestion @cfm). If we're happy with the automated testing level, I think we're ready to ship this 🙂


cfm commented Jul 5, 2022

Thanks, @gonzalo-bulnes!

We are not testing the outcome of that initial migration. Are we okay with that? (I am okay with that, since migrations are immutable, I don't see a lot of value in having automated test suites for them anyway.)

What testing would you look for here beyond that done by

def test_alembic_head_matches_db_models(tmpdir):
    """
    This test is to make sure that our database models in `db.py` are always in sync with the schema
    generated by `alembic upgrade head`.
    """
? At the schema level, I suppose that's only testing the nth migration, and here n happens to be the initial 1. But we do test that each individual migration succeeds, too.
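
For readers unfamiliar with this style of test, the idea can be sketched with the standard-library sqlite3 module alone: build one database from the models, build another by running the migrations, and compare the resulting columns. This is not the project's implementation; schema_of is a hypothetical helper, and both databases are assumed to exist already.

```python
import sqlite3


def schema_of(conn: sqlite3.Connection) -> dict:
    """Map each table name to its sorted (column, type, notnull, pk) tuples."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name != 'alembic_version' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        schema[table] = sorted(
            (name, ctype, notnull, pk) for _, name, ctype, notnull, _, pk in columns
        )
    return schema
```

A test would then assert that schema_of(models_db) == schema_of(migrated_db). Alembic's own alembic_version table is excluded because only the migrated database has it.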

Currently, the only migration is called d7c8af95bc8e_.py; maybe we could call it something like d7c8af95bc8e_initial_database_structure.py so that we don't have to modify it when adding new migrations with nice names? (And to avoid that trailing underscore? 😬)

I tend to favor minimal Alembic version names (as opposed to descriptions). But in the revised c75bf83 I've done a quick git mv alembic/versions/d7c8af95bc8e_.py alembic/versions/d7c8af95bc8e_initial.py. :-)

cfm added 3 commits July 5, 2022 12:57
BREAKING CHANGE:  The "securedrop-client" entry-point will expect a
nonexistent or empty database and migrate it to new revision
d7c8af95bc8e.  It will error out if a database already exists at
another, prior revision.
Data-migration tests MUST not use models from securedrop_client.db,
which will always contain the definitions from Git's head, not those
corresponding to the Alembic head under test.
cfm added 7 commits July 5, 2022 12:57
Data-migration tests MUST be self-contained in order to test the schema
as of the Alembic version under test, not the current Git head.
The test scaffolding in tests.test_alembic shouldn't be coupled to the
implementation under test in securedrop_client.db.
The SQL statements generated by "sqlite3 svs.sqlite .dump" replicate the
database's tables and columns in the order in which they were originally
added.  For the current database schema, that's a hybrid of the order in
which they're defined in securedrop_client.db and subsequent additions
in migrations; in the new database schema, that's strictly their
ordering in securedrop_client.db.  The latter artificially inflates the
diff of comparing "sqlite3 svs.sqlite .dump" for the old and new
schemas.

For ease of review, here we attempt to replicate the current SQL
statements as closely as possible, for as small a diff as possible.
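
The requirement in the commit messages above, that data-migration tests be self-contained, can be illustrated with a small sketch. Everything here is hypothetical (the users table, the firstname column, the try_insert helper); it only shows why a fixture derived from the current models fails against the schema of an older revision, while a fixture written in raw SQL for that revision does not.

```python
import sqlite3

# Hypothetical schema as of the Alembic revision under test.
SCHEMA_AT_REVISION = "CREATE TABLE users (id INTEGER PRIMARY KEY, username VARCHAR(255))"

# What an INSERT derived from the *current* models might look like after a
# later revision added a column (hypothetical: firstname).
INSERT_FROM_CURRENT_MODELS = (
    "INSERT INTO users (username, firstname) VALUES ('journalist', 'Example')"
)

# A self-contained fixture names only the columns that exist at this revision.
INSERT_SELF_CONTAINED = "INSERT INTO users (username) VALUES ('journalist')"


def try_insert(statement: str) -> bool:
    """Run the statement against a fresh database at the old schema; report success."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA_AT_REVISION)
    try:
        conn.execute(statement)
    except sqlite3.OperationalError:
        # SQLite rejects the statement because the column does not exist yet.
        return False
    return True
```

Here try_insert(INSERT_FROM_CURRENT_MODELS) fails while try_insert(INSERT_SELF_CONTAINED) succeeds, which is the failure mode described in #1500 in miniature.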
@cfm cfm force-pushed the 1500-squash-alembic branch from 08d73da to 658a04c Compare July 5, 2022 19:58

@gonzalo-bulnes gonzalo-bulnes left a comment

You're absolutely right on the tests @cfm! I somehow got fixated on the individual migration test files. (Which are the ones that were mostly problematic. 💡) Testing that the migrations were all applied is the test I think always makes sense to have! 🍏 Thanks for the pointers : )

Thanks for indulging me on the d7c8af95bc8e_initial.py. Truth is that the following might not happen often or at all, but I've found that having a rough idea of which migration does what can be a great help with troubleshooting. Especially if the automated naming pattern accommodates a descriptive slug, I'd rather have one, however short it is ❤️

Thank you for your thorough work @cfm! I'm shipping this 🚀

Labels
database SQLite, SQLAlchemy, and data model release blocker
Development

Successfully merging this pull request may close these issues.

data migrations will fail if subsequent versions add columns to back-referenced tables
2 participants