Fix airflow db check-migrations #19597

Merged
merged 1 commit into from
Nov 18, 2021

Conversation

ashb
Member

@ashb ashb commented Nov 15, 2021

This command broke after "Define datetime and StringID column types centrally in migrations" (#19408) was merged, but we didn't notice because GitHub styles timed-out checks badly.



@ashb ashb requested review from kaxil, uranusjr and jedcunningham and removed request for kaxil November 15, 2021 19:39
Member

@jedcunningham jedcunningham left a comment

This fixes the check in my environment.

@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 15, 2021
@potiuk
Member

potiuk commented Nov 15, 2021

Should we add some test for that one? It might be similarly broken in the future.

airflow/utils/db.py (outdated review thread, resolved)
@ashb
Member Author

ashb commented Nov 16, 2021

Should we add some test for that one? It might be similarly broken in the future.

Yes good call. Added a simple happy-path test for now.

@ashb ashb force-pushed the fix-db-check-migrations-command branch from 034f4ca to 85799a0 Compare November 16, 2021 14:45
@potiuk
Member

potiuk commented Nov 16, 2021

Yes good call. Added a simple happy-path test for now.

That's exactly what I had in mind :)

This command broke after "Define datetime and StringID column types
centrally in migrations" was merged, but we didn't notice as GitHub
styles timed-out checks badly.
@ashb ashb force-pushed the fix-db-check-migrations-command branch from 85799a0 to ed09464 Compare November 17, 2021 18:04
@potiuk potiuk closed this Nov 17, 2021
@potiuk potiuk reopened this Nov 17, 2021
@potiuk
Member

potiuk commented Nov 17, 2021

Looks like the prod image build was broken by a failed runner or something (and it was needed to test the Helm Chart). Closed/reopened to re-run it.

@uranusjr
Member

Helm Chart tests have been failing consistently for a while now.

@uranusjr uranusjr merged commit 5763065 into apache:main Nov 18, 2021
@uranusjr uranusjr deleted the fix-db-check-migrations-command branch November 18, 2021 05:55
@ashb
Member Author

ashb commented Nov 18, 2021

Right -- I didn't think it was likely.

@potiuk
Member

potiuk commented Nov 18, 2021

I am able to reproduce it easily locally with breeze:

  1. ./breeze kind-cluster start
  2. ./breeze kind-cluster deploy
    it will fail eventually while waiting for webserver
  3. ./breeze kind-cluster k9s:

Screenshot from 2021-11-18 11-42-52

@potiuk
Member

potiuk commented Nov 18, 2021

(wait - it was without the latest PR merged).

@potiuk
Member

potiuk commented Nov 18, 2021

Retrying

@ashb
Member Author

ashb commented Nov 18, 2021

My kind cluster is now working (https://kind.sigs.k8s.io/docs/user/known-issues/#failure-to-create-cluster-with-cgroups-v2 was my problem), also trying.

@potiuk
Member

potiuk commented Nov 18, 2021

Yeah. Reproduced it with latest PR merged too (same sequence as above):

image

@potiuk
Member

potiuk commented Nov 18, 2021

Confirmed I have the PR in the airflow installed in the image, so this is a "legit" error

@potiuk
Member

potiuk commented Nov 18, 2021

AAAAAAHHHHH

{{ define "wait-for-migrations-command" }}
  {{/* From Airflow 2.0.0 this can become [airflow, db, check-migrations] */}}
  - python
  - -c
  - |
        import airflow
        import logging
        import os
        import time

        from alembic.config import Config
        from alembic.runtime.migration import MigrationContext
        from alembic.script import ScriptDirectory

        from airflow import settings

        package_dir = os.path.abspath(os.path.dirname(airflow.__file__))
        directory = os.path.join(package_dir, 'migrations')
        config = Config(os.path.join(package_dir, 'alembic.ini'))
        config.set_main_option('script_location', directory)
        config.set_main_option('sqlalchemy.url', settings.SQL_ALCHEMY_CONN.replace('%', '%%'))
        script_ = ScriptDirectory.from_config(config)

        timeout=60

        with settings.engine.connect() as connection:
            context = MigrationContext.configure(connection)
            ticker = 0
            while True:
                source_heads = set(script_.get_heads())

                db_heads = set(context.get_current_heads())
                if source_heads == db_heads:
                    break

                if ticker >= timeout:
                    raise TimeoutError("There are still unapplied migrations after {} seconds.".format(ticker))
                ticker += 1
                time.sleep(1)
                logging.info('Waiting for migrations... %s second(s)', ticker)
{{- end }}
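
For readers skimming the template above, the waiting logic can be isolated from Alembic as a small polling loop. This is only a sketch: `wait_for_heads`, `get_source_heads`, `get_db_heads`, and the injectable `sleep` parameter are hypothetical names introduced here for illustration, not part of Airflow or the chart.

```python
import time
from typing import Callable, Iterable


def wait_for_heads(
    get_source_heads: Callable[[], Iterable[str]],
    get_db_heads: Callable[[], Iterable[str]],
    timeout: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until both callables report the same set of heads.

    Returns the number of one-second waits that were needed, or raises
    TimeoutError once `timeout` waits have passed without a match.
    """
    ticker = 0
    while True:
        # Compare as sets, like the inline script does, so order is irrelevant.
        if set(get_source_heads()) == set(get_db_heads()):
            return ticker
        if ticker >= timeout:
            raise TimeoutError(
                f"There are still unapplied migrations after {ticker} seconds."
            )
        ticker += 1
        sleep(1)
```

In the chart's script the two callables correspond to `script_.get_heads()` and `context.get_current_heads()`; passing a no-op `sleep` makes the loop easy to exercise in a unit test.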

@potiuk
Member

potiuk commented Nov 18, 2021

Not the best idea to have it in the helm chart.

@kaxil
Member

kaxil commented Nov 18, 2021

Whhhoooopsss yeah -- I think it was in the Helm Chart because previous Airflow versions didn't support the "airflow db check-migrations" command

@ashb
Member Author

ashb commented Nov 18, 2021

OH!

@ashb
Member Author

ashb commented Nov 18, 2021

      initContainers:
      - args:
        - python
        - -c
        - |
          import airflow
          import logging
          import os
          import time

          from alembic.config import Config
          from alembic.runtime.migration import MigrationContext
          from alembic.script import ScriptDirectory

          from airflow import settings

          package_dir = os.path.abspath(os.path.dirname(airflow.__file__))
          directory = os.path.join(package_dir, 'migrations')
          config = Config(os.path.join(package_dir, 'alembic.ini'))
          config.set_main_option('script_location', directory)
          config.set_main_option('sqlalchemy.url', settings.SQL_ALCHEMY_CONN.replace('%', '%%'))
          script_ = ScriptDirectory.from_config(config)

          timeout=60

          with settings.engine.connect() as connection:
              context = MigrationContext.configure(connection)
              ticker = 0
              while True:
                  source_heads = set(script_.get_heads())

                  db_heads = set(context.get_current_heads())
                  if source_heads == db_heads:
                      break

                  if ticker >= timeout:
                      raise TimeoutError("There are still unapplied migrations after {} seconds.".format(ticker))
                  ticker += 1
                  time.sleep(1)
                  logging.info('Waiting for migrations... %s second(s)', ticker)

@ashb
Member Author

ashb commented Nov 18, 2021

Yeah, that's the problem. The helm chart isn't using that command.

5 mins too slow.

@potiuk
Member

potiuk commented Nov 18, 2021

So eventually the Helm errors ARE real errors :). Happy we have the tests.

@potiuk
Member

potiuk commented Nov 18, 2021

I think for now reverting the change is the only choice; otherwise it will be a breaking change for anyone using Helm Chart < 1.4.0, even if we fix it there (and run airflow db check-migrations for 2+).

@ashb
Member Author

ashb commented Nov 18, 2021

Something like this will do the job:

try:
   from airflow.cli.commands.db_command import check_migrations
except ImportError:
   # existing code
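
The snippet above leaves the `except` branch as a placeholder, so it will not parse as written. A runnable sketch of the same import-with-fallback idea follows. The helper name `load_optional` is hypothetical, and the module path is simply the one quoted in the comment above; nothing here assumes what signature `check_migrations` has.

```python
import importlib
from typing import Any, Optional


def load_optional(module_name: str, attr: str) -> Optional[Any]:
    """Return `module_name.attr` if it can be imported, otherwise None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too: the package (or submodule) is absent.
        return None
    # The module exists but may predate the attribute we are looking for.
    return getattr(module, attr, None)


def pick_strategy(module_name: str = "airflow.cli.commands.db_command") -> str:
    """Report which code path a caller would take (illustration only)."""
    if load_optional(module_name, "check_migrations") is not None:
        return "builtin"
    return "inline-fallback"
```

A caller would run the existing inline Alembic loop only when `pick_strategy()` returns `"inline-fallback"`.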

@potiuk
Member

potiuk commented Nov 18, 2021

Yeah. But the problem is for those who already use the chart

@ashb
Member Author

ashb commented Nov 18, 2021

We could (somehow) say that Airflow 2.3 will need Helm Chart >= 1.4, I guess? Far from ideal.

@kaxil
Member

kaxil commented Nov 18, 2021

Is it not possible to make it backwards compatible somehow 😬? #19408 is mainly just to centralize the IDs and types that are used in migrations

@potiuk
Member

potiuk commented Nov 18, 2021

Another option is a smart workaround. For example:

in settings (from airflow import settings) we could check whether this import is run from the "wait-for-db-migrations" container and, if so, run the check using "check_migrations" and then call sys.exit(0).

Not the "cleanest" solution but should work.

@potiuk
Member

potiuk commented Nov 18, 2021

or even in "__init__.py" of the "airflow" package - it is imported in the first line

@potiuk
Member

potiuk commented Nov 18, 2021

Should not be too difficult - we could check what was the command python was executed with :)
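
As a rough illustration of that idea: when the interpreter is started as `python -c "<code>"`, `sys.argv[0]` is the string `"-c"`. The helper below is a hypothetical sketch of such a check, not what was eventually merged.

```python
import sys
from typing import List, Optional


def started_with_dash_c(argv: Optional[List[str]] = None) -> bool:
    """Return True when the process looks like `python -c ...`.

    `argv` defaults to sys.argv; it is a parameter only so the check
    can be exercised without spawning a new interpreter.
    """
    argv = sys.argv if argv is None else argv
    return len(argv) > 0 and argv[0] == "-c"
```

One caveat: the source text passed to `-c` is not present in `sys.argv`, so this check alone cannot tell the chart's script apart from any other `python -c` invocation.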

@potiuk
Member

potiuk commented Nov 18, 2021

And as of chart 1.4.0 we would do it "properly".

@ashb
Member Author

ashb commented Nov 18, 2021

Here is the "proper" fix anyway which should make tests go green #19676

@ashb
Member Author

ashb commented Nov 18, 2021

And back-compat fix incoming.

@ashb
Member Author

ashb commented Nov 18, 2021

#19677

@ashb
Member Author

ashb commented Nov 18, 2021

This is fixed in main now, and 2.3 won't need a particular version of the Helm chart either.

@jedcunningham jedcunningham added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 19, 2021
@jedcunningham jedcunningham added this to the Airflow 2.3.0 milestone Dec 7, 2021
Labels
changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge
5 participants