Schema migrations #1419

redshiftzero · 2016-10-11T21:23:07Z

Some new features (e.g. #1170) are going to require schema migrations. We should integrate e.g. sqlalchemy-migrate to make it easier for us to make changes to the databases on the production instances.

redshiftzero · 2016-10-21T20:07:43Z

Getting this done is blocking some new and frequently requested features (e.g. #1422 - the tagging of sources and submissions and #1359 - the assignment of particular users to address a given source), so I've been testing how to go about doing this in the feature branch db-migrations. In that branch, the first steps of integrating database migration support into SecureDrop using SQLAlchemy-migrate are implemented. Here I'll describe in more detail how this might work and I welcome feedback or suggestions.

Requirements

This will need to work for both new installs and databases already created in the production instances
This should not require manual intervention by administrators for future migrations, so future versions of SecureDrop should check whether a migration is needed and perform it if necessary
If a migration fails for some reason (more below) and the database cannot be upgraded, then we will need to keep the SD instance running on the existing database version using the existing app code.

How it currently works

A new directory SECUREDROP_DATA_ROOT/db_repository tracks database versioning data.
In new installs, this versioning directory is set up at database initialization time - the code for this has been added to init_db() in db.py.
In existing installs, the versioning directory should be set up at first run of the app after update. I was thinking that this could be done in a one-off Ansible task that is run by admins for the 0.4 release (something we are going to do anyway). A new Ansible task copies the SECUREDROP_DATA_ROOT/db_repository location into the existing config.py so db.py can find it.
When a database migration needs to be performed, migrate() in db.py can be executed to perform it. First the database is backed up, and then the migration is executed. The migrations that SQLAlchemy-migrate generates can be tested before each release to make sure they work and that no problems are encountered on the SQL side.

Potential Issues

We should enumerate as many of the issues that could arise as we can, and handle them where possible:

Insufficient disk space to backup the database
- In this situation, the instance should continue to run on the old version of the code, let the administrators know they need to resolve the disk space issue, and then a backup and upgrade can be attempted again in the future when the issue is resolved.
Any other failure occurs during the migration
- Pre-release testing of the generated migrations should be done to minimize this possibility
- However, if it does happen, SD should copy the backup back, and continue to run on the old version of the code using the existing database.
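
The failure handling described above can be sketched as follows. This is a minimal sketch, not the code in the db-migrations branch: it assumes a single SQLite database file, and `migrate_with_backup`, `run_migration`, `InsufficientDiskSpace`, and the `.bak` suffix are hypothetical names chosen for illustration.

```python
import os
import shutil


class InsufficientDiskSpace(Exception):
    """Raised when there is no room to back up the database."""


def migrate_with_backup(db_path, run_migration):
    """Back up db_path, run the migration, and restore the backup on failure.

    run_migration is a hypothetical callable that takes the database path
    and raises an exception if the migration fails.
    Returns True if the migration succeeded, False if the backup was restored.
    """
    backup_path = db_path + '.bak'  # hypothetical naming, for illustration only
    db_dir = os.path.dirname(os.path.abspath(db_path))

    # Refuse to start if the backup would not fit. The database is left
    # untouched, so the instance can keep running on the old schema.
    if shutil.disk_usage(db_dir).free < os.path.getsize(db_path):
        raise InsufficientDiskSpace(
            'not enough free space to back up {}'.format(db_path))

    shutil.copy2(db_path, backup_path)
    try:
        run_migration(db_path)
    except Exception:
        # Any failure: copy the backup back over the database.
        shutil.copy2(backup_path, db_path)
        return False
    return True
```

The caller would use the return value (or the exception) to decide whether to start the new app code or stay on the old version and notify the administrators.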

When to migrate

One thing I'm trying to figure out is when to best attempt to perform a migration. It could be done during postinst. Another option is to have the SD app code check the db models upon launch of the Flask app, see if they have changed, and if so do the migration during app initialization. I think the former is probably the better way to handle this but thoughts welcome on this.
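
For the second option, a launch-time check could look roughly like the sketch below. It rests on assumptions that are not part of the branch: the schema version is stored in SQLite's `PRAGMA user_version`, and `EXPECTED_SCHEMA_VERSION` is a hypothetical constant shipped with the app code. The db-migrations branch tracks versions through the SQLAlchemy-migrate repository instead.

```python
import sqlite3

# Hypothetical: the schema version this release of the app code expects.
EXPECTED_SCHEMA_VERSION = 2


def current_schema_version(db_path):
    """Read the integer stored in SQLite's user_version pragma (0 by default)."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute('PRAGMA user_version').fetchone()[0]
    finally:
        conn.close()


def migration_needed(db_path, expected=EXPECTED_SCHEMA_VERSION):
    """Return True if the database is older than the app code expects."""
    return current_schema_version(db_path) < expected
```

The same check works from postinst or from app initialization; the open question is only which of the two calls it.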

redshiftzero · 2017-03-03T01:45:06Z

Note: I just checked the diff between 0.3.11 and develop, and there are already merged changes to the database models. I'm fine pushing e.g. #1422 to post-0.4, but in order to remove #1419 from 0.4 we will need to back out these changes in develop

redshiftzero · 2017-03-09T19:27:43Z

After discussion, we are going to push this until the next release.

redshiftzero · 2017-03-09T19:28:15Z

(Again, this requires the backing out of changes from develop)

heartsucker · 2017-10-07T17:01:21Z

I would like to add a second proposition instead of using sqlalchemy-migrate. One thing about it that feels awkward is that you have to duplicate the models to both the models.py in the main application and the migration scripts. The docs even say to do this:

To avoid the above problem, you should use SQLAlchemy schema reflection as shown above or copy-paste your table definition into each change script rather than importing parts of your application.

Also, since it's possible we might only ever support Postgres (#2225), using the SQLAlchemy automagic isn't even necessary.

For example, the Rust ORM (diesel) takes the lazy approach. There's just a sorted dir with {up,down}.sql that are run against the DB. I like this better because I want full control over the generated SQL, and I don't want to have to print each generated migration and read over it to ensure it does what I want.

migrations/
└── 0001-init.sql
...

And the DB is versioned by the numeric prefix. I actually wrote this once for an app because we kept having problems with sqlalchemy-migrate and alembic.

import os
from flask_sqlalchemy import SQLAlchemy
from os import path
from sqlalchemy import text
from sqlalchemy.exc import InternalError, OperationalError, ProgrammingError

db = SQLAlchemy()


def migrate(migrations_dir):
    version = _initialize_db()
    migrations = [(v, s)
                  for (v, s) in _list_migrations(migrations_dir)
                  if v > version]
    for (version, sql) in migrations:
        try:
            db.session.execute(sql)
            sql = text('UPDATE db_version SET version = :version') \
                .bindparams(version=version)
            db.session.execute(sql)
            db.session.commit()
        except Exception:
            db.session.rollback()
            raise


def _initialize_db() -> int:
    version = None
    try:
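        sql = text('SELECT version FROM db_version')
        # first() returns None if the table exists but holds no row yet;
        # the insert below then seeds it with version 0.
        row = db.session.execute(sql).first()
        version = row[0] if row is not None else None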
    except (InternalError, OperationalError, ProgrammingError):
        db.session.rollback()
        sql = text('CREATE TABLE db_version (version INT)')
        db.session.execute(sql)
    if version is None:
        sql = text('INSERT INTO db_version (version) VALUES (0)')
        db.session.execute(sql)
        version = 0
    db.session.commit()
    return version


def _list_migrations(migrations_dir) -> list:
    migrations = []
    for migration in os.listdir(migrations_dir):
        full_path = path.join(migrations_dir, migration)
        try:
            version = int(migration.split('-')[0])
        except ValueError:
            continue
        with open(full_path, 'r') as f:
            sql = f.read()
        migrations.append((version, sql))
    return list(sorted(migrations, key=lambda x: x[0]))

This is one of those times we have to ask about whether or not it makes sense to add a dependency or if we want to just roll our own code. This is only 60 lines of Python + SQL versus having to fiddle with a DSL to get it to generate the code we actually want.

The downside to this method is that it doesn't allow arbitrary Python code in the migrations, but realistically, if you need that you're probably doing something overly complicated with your migration. I can't think of a time at work when we have actually needed code for a migration.

heartsucker · 2017-10-23T05:39:01Z

Pinging @redshiftzero for thoughts on this.

heartsucker · 2018-01-14T14:30:03Z

I'm working on this now, and will be using Alembic. I'm also setting it up so that the first migration will optionally dump everything into Postgres if the SQLite database exists. This will cover new and existing instances.

heartsucker · 2018-01-14T15:03:36Z

Blocked by #2866.

redshiftzero · 2018-03-30T17:45:21Z

Here's a proposed breakdown for this ticket:

Integrate autogenerated Alembic migrations support
Add database backup in postinst
Add alembic migration in postinst
Add realistic data upload script for creating a prod-like database for testing/QA of database migrations

heartsucker · 2018-03-31T11:37:56Z

Where do we want the database backups saved? What's the naming scheme? I propose gzipped backups as /var/lib/securedrop/backups/YYYY-MM-DD-sqlite.sql.tgz.
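
To make the naming proposal concrete, here is a minimal sketch that writes a SQL dump of a SQLite database into a gzipped tarball with that name. `backup_path`, `write_backup`, and the member name `sqlite.sql` are hypothetical; only the file name pattern comes from the proposal above.

```python
import datetime
import io
import os
import sqlite3
import tarfile


def backup_path(backup_dir, day):
    """Return backup_dir/YYYY-MM-DD-sqlite.sql.tgz for the given date."""
    return os.path.join(backup_dir, day.strftime('%Y-%m-%d') + '-sqlite.sql.tgz')


def write_backup(db_path, backup_dir, day=None):
    """Dump the database as SQL text into a gzipped tar archive."""
    day = day or datetime.date.today()

    # iterdump() yields the SQL statements needed to recreate the database.
    conn = sqlite3.connect(db_path)
    try:
        dump = '\n'.join(conn.iterdump()).encode('utf-8')
    finally:
        conn.close()

    os.makedirs(backup_dir, exist_ok=True)
    target = backup_path(backup_dir, day)

    # Store the dump as a single member of the archive.
    info = tarfile.TarInfo(name='sqlite.sql')
    info.size = len(dump)
    with tarfile.open(target, 'w:gz') as archive:
        archive.addfile(info, io.BytesIO(dump))
    return target
```

One consequence of a date-only name in this sketch is that a second backup taken on the same day overwrites the first, which may argue for adding a time or version component.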

heartsucker · 2018-03-31T11:43:45Z

Also, this may go into the current PR or become a second PR, since it will be even larger than the original, but the gist is:

For each revision A -> B there's a .sql file (because we can't use anything from models.py) that loads some data into the DB and then a test that checks that certain expected behavior works (users can still log in, FKs aren't broken, whatever).

We have a test that does all the migrations up to A, loads the SQL, and runs some sanity checks. Another test checks the downgrade from B -> A and might even run sqldump on A and then on A', where A' is the result of A -> B -> A. If A == A', this would confirm that alembic upgrade +1 followed by alembic downgrade -1 is a no-op, i.e. the round trip leaves the database unchanged.

This would mean that every migration would need to have a set of dummy data generated for it, as well as some manual tests written.
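
The A == A' comparison can be sketched without Alembic at all. In the minimal example below, raw SQL strings stand in for the `alembic upgrade +1` and `alembic downgrade -1` steps, and `dump` and `round_trip_is_clean` are hypothetical helper names; the table names in the usage lines are made up for illustration.

```python
import sqlite3


def dump(conn):
    """Return the full SQL dump (schema and data) as a list of statements."""
    return list(conn.iterdump())


def round_trip_is_clean(conn, upgrade_sql, downgrade_sql):
    """Apply A -> B -> A and report whether the final dump equals the first."""
    before = dump(conn)                # state A
    conn.executescript(upgrade_sql)    # A -> B
    conn.executescript(downgrade_sql)  # B -> A'
    return dump(conn) == before        # A == A' ?


# Usage with made-up tables: load data at revision A, then round-trip.
conn = sqlite3.connect(':memory:')
conn.executescript(
    "CREATE TABLE sources (id INTEGER PRIMARY KEY, name TEXT);"
    "INSERT INTO sources (name) VALUES ('example');"
)
clean = round_trip_is_clean(
    conn,
    upgrade_sql="CREATE TABLE tags (id INTEGER PRIMARY KEY, label TEXT);",
    downgrade_sql="DROP TABLE tags;",
)
```

A real harness would drive Alembic for the two steps and load the per-revision .sql fixture before taking the first dump, but the comparison itself stays this simple.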

heartsucker · 2018-03-31T18:02:34Z

Further note. Should we do a DB downgrade when we downgrade the SD app version? This could be in the maintainer scripts.

eloquence · 2018-04-05T01:22:42Z

NB: We've moved this off the 0.7 (May 8) milestone, given that we don't think we'll have quite enough time to properly QA this for the release, but we should be able to ship it with 0.8, and will continue to work on it through this sprint and following ones.

redshiftzero mentioned this issue Oct 21, 2016

Added source-assignment functionality #1359

Closed

redshiftzero added this to the 0.4 milestone Oct 21, 2016

heartsucker mentioned this issue Dec 21, 2016

Add password validation during user creation #980

Closed

psivesely added ops/deployment app labels Mar 2, 2017

psivesely mentioned this issue Mar 2, 2017

Handle Submissions with no Sources in currently running instances. #1189

Closed

psivesely assigned psivesely, garrettr and redshiftzero Mar 3, 2017

redshiftzero removed this from the 0.4 milestone Mar 9, 2017

This was referenced Apr 26, 2017

Remove journalist assignment #1671

Merged

Backed out changes for the release of SecureDrop 0.4 #1658

Closed

redshiftzero unassigned garrettr, psivesely and redshiftzero May 10, 2017

redshiftzero added this to the 0.4.3 milestone Jun 6, 2017

redshiftzero mentioned this issue Aug 2, 2017

Re-implement customizable notices on source interface #1967

Open

redshiftzero mentioned this issue Sep 28, 2017

db.Source.journalist_designation is not a unique field #2043

Closed

redshiftzero mentioned this issue Nov 1, 2017

Journalist 2FA setup should provide backup codes #2287

Open

heartsucker self-assigned this Jan 14, 2018

heartsucker mentioned this issue Jan 14, 2018

Move custom SQLAlchemy session management to Flask-SQLAlchemy #2866

Closed

heartsucker mentioned this issue Jan 24, 2018

Meta issue: No new tests until we have cleaned up the existing test infrastructure #2877

Closed

heartsucker mentioned this issue Feb 20, 2018

Use passlib for password hashing #2918

Closed

redshiftzero mentioned this issue Feb 21, 2018

Enable administrators to edit all text on source interface #3044

Open

redshiftzero modified the milestones: 0.6, 0.7 Feb 27, 2018

heartsucker mentioned this issue Mar 31, 2018

Use alembic to version control sqlite database #3211

Merged

eloquence modified the milestones: 0.7, Long Term Product Backlog Apr 5, 2018

eloquence modified the milestones: Long Term Product Backlog, 0.8 Apr 6, 2018

eloquence added the epic Meta issue tracking child issues label Apr 6, 2018

heartsucker mentioned this issue Apr 7, 2018

Add test harness to ensure alembic up/downgrades work as intended #3244

Closed

redshiftzero closed this as completed in #3211 Jun 7, 2018
