Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support CHECK constraints for autogenerate #508

Open
sqlalchemy-bot opened this issue Sep 25, 2018 · 42 comments
Open

support CHECK constraints for autogenerate #508

sqlalchemy-bot opened this issue Sep 25, 2018 · 42 comments

Comments

@sqlalchemy-bot
Copy link

Migrated issue, originally created by Alex Rothberg

If I start with the following and then uncomment the CheckConstraint, alembic does not handle the migration adding it or even print a warning. If I just have alembic add the entire class the migration to add the class DOES have the CheckConstraint.

class TitleDepartmentYear(db.Model):
    id = db.Column(UUID, default=uuid.uuid4, primary_key=True)
    name = db.Column(db.String(), nullable=False)

    __table_args__ = (
#        db.CheckConstraint(
#           sqa.func.char_length(name) > 0,
#          name="title_department_year_name_min_length",
#      ),
    )
@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

note this is a documented limitation:

http://alembic.zzzcomputing.com/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect

@sqlalchemy-bot
Copy link
Author

Changes by Michael Bayer (@zzzeek):

  • added labels: low priority

@sqlalchemy-bot
Copy link
Author

Changes by Michael Bayer (@zzzeek):

  • removed labels: bug
  • added labels: feature

@sqlalchemy-bot
Copy link
Author

Changes by Michael Bayer (@zzzeek):

  • added labels: autogenerate - detection

@sqlalchemy-bot
Copy link
Author

Changes by Michael Bayer (@zzzeek):

  • changed title from "CheckConstraint not detected" to "support CHECK constraints for autogenerate"

@sqlalchemy-bot
Copy link
Author

Alex Rothberg wrote:

I started taking a stab at this. Does it looks like I am on the right track? Is there some missing feature in sqla, etc that will lead to a dead end here?

@comparators.dispatch_for("table")
def compare_table_level(autogen_context, modify_ops,
    schemaname, tablename, conn_table, metadata_table):

    metadata_check_constraints = set(
        ck for ck in metadata_table.constraints
        if isinstance(ck, sa_schema.CheckConstraint)
    )
    conn_check_constraints = set(
        ck for ck in conn_table.constraints
        if isinstance(ck, sa_schema.CheckConstraint)
    )

    for metadata_check_constraint in metadata_check_constraints:
        # barfs on something about quoting:
        metadata_check_constraint_name = sqla_compat._get_index_final_name(autogen_context.dialect, metadata_check_constraint)
        # i'm lazy; clearly this is not efficient:
        if metadata_check_constraint_name not in (conn_check_constraint.name for conn_check_constraint in conn_check_constraints):
            modify_ops.ops.append(ops.CreateCheckConstraintOp.from_constraint(metadata_check_constraint))

# needed to do this since I am writing this as a plugin and there is already a method registered:
del renderers._registry[(ops.CreateCheckConstraintOp, 'default')]

@renderers.dispatch_for(ops.CreateCheckConstraintOp)
def _add_check_constraint(autogen_context, constraint):
    # not quite right as I need to sub in query params:
    args = ['"%s"' % constraint.constraint_name, '"%s"' % constraint.table_name, '"%s"' % constraint.condition]
    return "%(prefix)screate_check_constraint(%(args)s)" % {
        'prefix': _alembic_autogenerate_prefix(autogen_context),
        'args': ", ".join(args)
    }

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

that's a fragment of a beginning, sure, it would need to be in the context of the source code in compare.py to really be reviewable.

With this feature, the much bigger job is writing out the tests, which is because the constraint autogen use case has an enormous number of edge cases that are also showstoppers if they are not working correctly, because within autogen, a failed comparison means false positives. A lot of applications use Alembic's comparison logic to generate schema diffs which need to come up clean if nothing has changed, so any area where a backend either doesn't provide CHECK constraint reflection, or the content of the constraint doesn't exactly match what is in the Python model, show-stopping bug. Additionally, SQLAlchemy has datatypes Enum and Boolean which automatically generate CHECK constraints and I'm not sure of how we will will handle accommodating for this, detecting which CHECK constraints need to be explicit and which do not.

When the compare indexes and uniques feature was introduced, I spent literally years fixing bugs of this nature, which is why you will see a very large number of tests in https://github.com/zzzeek/alembic/blob/master/tests/test_autogen_indexes.py. I just did a grep, found 31 distinct issues involving unique and index autogenerate in changelog.rst, see https://gist.github.com/zzzeek/530634270e5631b973fa1d70885aba5a.

So for this feature, writing a new test suite test_autogen_check.py is a big part of getting this to work. If the "CHECK" constraint autogen feature is added as an optional feature that's enabled in the environment, that would make it easier to release without as much risk of regressions.

@sqlalchemy-bot
Copy link
Author

Alex Rothberg wrote:

Fair. In the meantime I might start working on this as a plugin with the plan to get it mature enough that it can be submitted for addition to alembic proper.

I was wondering if there is a work around to:

del renderers._registry[(ops.CreateCheckConstraintOp, 'default')]

and what is the best way to get the "real" name for the check constraint? (ie sqla_compat._get_index_final_name doesn't seem to work for check constraints).

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

The rendering of a CHECK constraint object is separate from the detection which is the hard part, you can submit a rendering PR right now. additionally, the renderers.dispatch_for should be able to replace the previous entry, if it doesn't do that right now I can take a PR for that too.

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

sure we can reverse that

@adrianschneider94
Copy link

There would be one way to detect the changed check constraint, that would be quite easy to implement and would work in the most environments:
As far as I can see, the problem is, that for example in postgres 'consrc' is not available anymore. But instead, you could add a further alembic table or extend schema_version to hold the source of the check constraint. Then you can just compare the source in the database to the one of the SQLAlchemy metadata and update (drop and recreate) it when it changes. I'm currently doing that in a project.
However, it would require the developers/admins not to mess with that table, so I think this should be a optional behaviour.

@zzzeek
Copy link
Member

zzzeek commented May 14, 2020

I think you have a good idea there, however that's the beginning of a django-like system that is going to serialize the entire model that's being built so that it doesnt even have to use database reflection for autogenerate. I'd rather someone implement the real thing rather than a tiny piece of it for some specific use case.

@adrianschneider94
Copy link

I totally see that point.

I looked a bit further into this, for postgres it should be easy to implement in the 'alembic' way:

select pgc.conname as constraint_name,
       ccu.table_schema as table_schema,
       ccu.table_name,
       ccu.column_name,
       pg_get_constraintdef(pgc.oid)
from pg_constraint pgc
join pg_namespace nsp on nsp.oid = pgc.connamespace
join pg_class  cls on pgc.conrelid = cls.oid
left join information_schema.constraint_column_usage ccu
          on pgc.conname = ccu.constraint_name
          and nsp.nspname = ccu.constraint_schema
where pgc.contype ='c'
order by pgc.conname;

gives the check constraint definitions associated with tables and columns.

I don't know about the internals of alembic. Could you tell me where to dig into it, if I wanted to implement it (for postgres)?

@adrianschneider94
Copy link

And for the comparison there is also an easy approach if you use an external library (pglast):

from pglast import parse_sql, _remove_stmt_len_and_location

statement_1 = "check article_number is null or company_id is not null" # what I wrote in SQLAlchemy
statement_2 = "CHECK (((article_number IS NULL) OR (company_id IS NOT NULL)))" # what I got from the query above

ast1 = parse_sql("SELECT " + statement_1[6:])
ast2 = parse_sql("SELECT " + statement_2[6:])

_remove_stmt_len_and_location(ast1)
_remove_stmt_len_and_location(ast2)

assert ast1 == ast2

@zzzeek
Copy link
Member

zzzeek commented May 14, 2020

sqlalchemy does reflect the constraint definition already, it's just that it doesn't necessarily match letter for letter, so if the pglast thing is useful that may be an option. seems like it is only for postgresql ?

@adrianschneider94
Copy link

Yes, unfortunately it only targets postgres.
But I guess that:

  • one could find AST parsers for the other dialects, at least in C
  • pglast is compatible to most statements that would be used in a check constraint. And otherwise it would raise an exception and let the user implement it manually

@zzzeek
Copy link
Member

zzzeek commented May 14, 2020

oh @lelit wrote this. friend of the show. We can try sqlparse here as well, and pglast optional plugin?

@adrianschneider94
Copy link

I looked into sqlparse for some minutes and it does not build a proper AST, so I'm sceptical that it would work. @lelit started pglast because of the problems with sqlparse as far as I can see from the README, so I think a postgres dedicated solution (probably as a plugin) that works for other dialects in most cases (as long as the queries are easily accessible) would be the way I would personally take.

@zzzeek
Copy link
Member

zzzeek commented May 14, 2020

well if it works it's a start and maybe @lelit wants to expand the scope of it :)

@lelit
Copy link

lelit commented May 14, 2020

I just quickly read the above, but sure, I'm willing to expand it's scope, if I can... obviously once I have a better idea of what you mean with that 😄
Feel free to ask/explain/suggest anything.

@adrianschneider94
Copy link

Hi @lelit,

what I have seen so far from pglast looks great, thanks for that!

@zzzeek is talking about extending the scope from postgres to other dialects. I guess that's quite a big task, but it would solve the check constraint issue completely if it works.

Do you see any problems ahead (for postgres only)? The task is to compare SQL statements to see if they are identical. The few examples I tried worked flawlessly.

@lelit
Copy link

lelit commented May 14, 2020

I spent considerable time trying to use sqlparse for my need, but although it does quite a good job considering it handles a wide spectrum of SQL syntaxes, I didn't find it "precise" enough, and as said by Adrian the internal model its "parser" builds is not malleable enough.
OTOH, yes, pglast is very PG specific: the underlying libpgquery is basically the very same parser used by PG itself, so it has access to everything comes out from that.

@zzzeek is talking about extending the scope from postgres to other dialects. I guess that's quite a big task, but it would solve the check constraint issue completely if it works.

No, that's definitely not doable with (current) pglast: one would need to implement the equivalent of libpgquery for other dialects, and once there, implement a new set of "printers" (in pglast parlance, a function that takes an AST and serialize it into a corresponding SQL statement). While I can easily do the latter, I'm afraid that "wrapping" other dialects parser into something usable is beyond my current cycles availability, sorry.

Do you see any problems ahead (for postgres only)? The task is to compare SQL statements to see if they are identical. The few examples I tried worked flawlessly.

No, I would say that's totally doable: pglast already supports a lot of PG syntax, and expanding it to cover what is currently missing is quite easy.

@olirice
Copy link
Contributor

olirice commented May 27, 2020

alembic_utils handles the "comparing sql stored in strings" problem by rendering the entity in a dummy schema, and comparing the reflected definition from the dummy schema to the reflected definition in the target schema.

It avoids having to parse every dialect of sql and would work on any database that supports constraint inspection. It is a kinda slow for larger projects but I'm not aware of a use case where autogenerating migrations is done in a tight loop :)

@adrianschneider94
Copy link

Ahh, that's smart 👌🏼

@zzzeek
Copy link
Member

zzzeek commented May 28, 2020

that is an interesting idea.

@zzzeek
Copy link
Member

zzzeek commented May 28, 2020

we sort of do a bit of that for datatypes now, rather than comparing each element of a reflected datatype with what the model has we compare the rendered strings. but that doesn't include adding an extra round trip to the DB. the "dummy schema" part makes me a little nervous. if you are running on your company's giant DB2 staging system it's likely not that simple to create a new schema on the fly and then drop it.

@olirice
Copy link
Contributor

olirice commented May 28, 2020

I'm not familiar with DB2, but if the dummy schema is the only part that makes you squeamish, a rolled back transaction would work too.

The example below tests if the local definition of a function matches the database's definition.

###
# SETUP
###

def get_sql_function_def(sess, schema, name):
    """Look up the SQL definition of a function from a database"""

    return sess.execute("""
        select
            pg_get_functiondef(p.oid)
        from
            pg_proc p
            join pg_namespace n
                on p.pronamespace = n.oid
        where
            n.nspname = 'public'
            and p.proname = 'to_upper'
        limit 1    
    """).fetchone()[0]

# Create an "existing" function
sess.execute("""
    create or replace function to_upper(text)
    returns text
    as
        $$ select upper($1); $$
    language sql;
""")
sess.commit()


###
# CHECK IF UPDATE IS NECESSARY
###

sess.begin(subtransactions=True)

try:
    # Definition of the entity in the database 
    existing_definition = get_sql_function_def(sess, 'public', 'to_upper') 

    # Render render the local definition and execute in a transaction
    # Note that the function's body was changed
    sess.execute("""
        create or replace function to_upper(text)
        returns text
        as
            $$ select upper($1) || 'abc' ; $$
        language sql;
    """)

    # Get the new definition
    local_definition = get_sql_function_def(sess, 'public', 'to_upper') 

    # Check if the existing definition and local definition match
    requires_update: bool = existing_definition != local_definition

finally:
    sess.rollback()

print(requires_update) # True

It does require that the target database supports a read your own writes transaction isolation level

@zzzeek
Copy link
Member

zzzeek commented May 28, 2020

I'm not familiar with DB2, but if the dummy schema is the only part that makes you squeamish, a rolled back transaction would work too.

only if there's transactional DDL available which is also not a given. Oracle for example does not support it.

@olirice
Copy link
Contributor

olirice commented May 28, 2020

wrt to your comment

if you are running on your company's giant DB2 staging system it's likely not that simple to create a new schema on the fly and then drop it

Did you mean that a company wouldn't be happy that it was happening, the end user likely wouldn't have the permissions required, or that there's something specific to DB2 that makes short lived schemas particularly dangerous?

@zzzeek
Copy link
Member

zzzeek commented May 28, 2020

I mean you wouldnt have privileges to CREATE SCHEMA. that's a less-often granted privilege.

@GuillaumeDesforges
Copy link

Any news? That would be an awesome feature.

@zzzeek
Copy link
Member

zzzeek commented Apr 22, 2021

alembic is reliant upon external contributors to keep features like these going. we don't have many resources to work on alembic outside of regressions.

@GuillaumeDesforges
Copy link

That's understandable ofc

About @olirice suggestion above, I understand that it is an implementation which makes the feature require a certain set of privileges.

Still, would alembic accept a PR implemented like that?

@zzzeek
Copy link
Member

zzzeek commented Apr 22, 2021

the thing with "dummy schema"? no, i think that's good for alembic-utils but over here that would dramatically change the workflow we put onto users and I'm not in a position to support that. also users already complain about autogenerate needing a database at all (#792)

the long term "migrations work great" idea is to emulate more of what django does with the kind of thing sketched out in #117 which does not need any database to do autogenerate, it compares the model against itself, essentially.

@ManuelBuri
Copy link

Any news? That would be an awesome feature 💪

@Tishka17
Copy link

Tishka17 commented Nov 8, 2023

From my perspective it is better to detect added/removed check constraints than detect nothing.

I tried to do it on my own and would like to know how to improve this PR so it can be merged. There no tests yet as I've just started working on it and want to clarify if the overall direction is acceptable.

https://github.com/Tishka17/alembic/pull/1/files

Currently I see, that there is no deferrable/initially support in CreateCheckConstraintOp and Inspector.get_check_constraints. Probably they should be added so we can check them too.

@CaselIT
Copy link
Member

CaselIT commented Nov 8, 2023

Hi, you should probably do a pull request to this repository.

Just had a quick glance there, and I don't think you can assume that check constraint have a name, especially the ones that are defined in the metadata

@Tishka17
Copy link

I see 3 cases for the names:

  1. Name is explicitly set
  2. Name is generated according to the naming convention
  3. Neither convention nor name are set.

In cases 1,2 constraint has the name in migrations CreateTable, but in the last one database generates the name on its own.

If we are checking only names, but not SQL expression, we cannot work in case 3. Should we log a warning and skip all the comparison?

@CaselIT
Copy link
Member

CaselIT commented Nov 14, 2023

in case 2 if I'm not mistaken the name of the constraint object in the metadata remains None, so it's not obvious that name would be used by a constraint also in this case.

Other than that, I think those are the cases to handle. For case 3 there is also the option that also in db the constraint has no name (IIRC sqlite can have unnamed constraints).

@m-s-abeer
Copy link

m-s-abeer commented Jan 24, 2024

Is anyone working on it? It will be a nice feature.
I was trying to add a check for string matching from a list. Alembic couldn't detect anything regarding this.

@valq7711
Copy link

Why not implement at least a half-solution: if it's a new table (op.create_table) - add sa.CheckConstraint along with others.
It doesn't even need to deal with diff-stuff at constraints level.
Something is better than nothing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests