Improve message and documentation around moved data (#19453)
* Improve message and documentation around moved data

In Airflow 2.2.2 we introduced a fix in #18953 that moves corrupted
data to a separate table. However, some of our users
(understandably) might not have the context. We've never had anything
like that before, so users who treat the Airflow DB as a
black box might be confused about what the error means and what
they should do in this case.

Issue #19440 (converted into discussion #19444) and #19421
indicate that the message is a bit unclear for users. This PR attempts to
improve that: it adds an `upgrading` section to our documentation and makes the
message link to it so that, rather than asking questions in the issues,
users can find context and answers about what they should do in our docs.

It also guides users who treat the Airflow DB as a "black box" on how they
can use their own tools or `airflow db shell` to fix the problem.

(cherry picked from commit de43fb3)
potiuk authored and kaxil committed Nov 11, 2021
1 parent 2aed7c1 commit d8bb3cb
Showing 5 changed files with 101 additions and 1 deletion.
3 changes: 2 additions & 1 deletion airflow/www/templates/airflow/dags.html
@@ -56,7 +56,8 @@
Airflow found incompatible data in the <code>{{ original_table_name }}</code> table in the
metadatabase, and has moved them to <code>{{ moved_table_name }}</code> during the database migration
to upgrade. Please inspect the moved data to decide whether you need to keep them, and manually drop
the <code>{{ moved_table_name }}</code> table to dismiss this warning.
the <code>{{ moved_table_name }}</code> table to dismiss this warning. Read more about it
in <a href={{ get_docs_url("installing/upgrading.html") }}><b>Upgrading</b></a>.
{% endcall %}
{% endfor %}
{{ super() }}
1 change: 1 addition & 0 deletions docs/apache-airflow/installation/index.rst
@@ -32,6 +32,7 @@ Installation
Installing from sources <installing-from-sources>
Installing from PyPI <installing-from-pypi>
Setting up the database <setting-up-the-database>
Upgrading <upgrading>

This page describes installation options that you might use when considering how to install Airflow.
Airflow consists of many components, often distributed among many physical or virtual machines, therefore
3 changes: 3 additions & 0 deletions docs/apache-airflow/installation/setting-up-the-database.rst
@@ -30,3 +30,6 @@ not running while the upgrade is being executed.

In some deployments, such as :doc:`helm-chart:index`, both initializing the database and running the
database migration are executed automatically when Airflow is upgraded.

Some upgrades also require post-migration actions.
See :doc:`/installation/upgrading` for more details about upgrading and performing post-migration actions.
94 changes: 94 additions & 0 deletions docs/apache-airflow/installation/upgrading.rst
@@ -0,0 +1,94 @@
 .. Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

 ..   http://www.apache.org/licenses/LICENSE-2.0

 .. Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

Upgrading Airflow to a newer version
------------------------------------

Why you need to upgrade
=======================

Newer Airflow versions can contain database migrations, so you should run
``airflow db upgrade`` to update your database with the schema changes of the Airflow version
you are upgrading to.

When you need to upgrade
========================

If you have a custom deployment based on virtualenv or Docker containers, you usually need to run
the DB upgrade manually as part of the upgrade process.

In some cases the upgrade happens automatically, depending on whether your deployment has the upgrade
built in as a post-install action. For example, when you are using :doc:`helm-chart:index` with
post-upgrade hooks enabled, the database upgrade happens automatically right after the new software
is installed. Similarly, Airflow-as-a-Service solutions perform the upgrade automatically for you
when you choose to upgrade Airflow via their UI.

How to upgrade
==============

To upgrade the database manually, run the ``airflow db upgrade`` command in your
environment. You can run it either in your virtual environment or in a container that gives
you access to the Airflow CLI (see :doc:`/usage-cli`) and the database.

Migration best practices
========================

Depending on the size of your database and the nature of the migration, migrating might take quite some time,
so if you have a long history and a big database, it is recommended to make a copy of the database first and
perform a test migration to assess how long the migration will take. Typically, "major" upgrades take
longer, because adding new features sometimes requires restructuring the database.
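
The test migration described above can be timed with a small helper. The following is only a minimal
sketch: ``timed_run`` is a hypothetical helper (not part of Airflow), and the harmless stand-in command
is used so that the sketch runs anywhere.

```python
import os
import subprocess
import sys
import time


def timed_run(cmd, extra_env=None):
    """Run ``cmd``, raise if it fails, and return the elapsed wall-clock seconds.

    ``timed_run`` is a hypothetical helper for illustration, not an Airflow API.
    """
    env = dict(os.environ)
    env.update(extra_env or {})  # optional overrides, e.g. a connection to the DB copy
    start = time.perf_counter()
    subprocess.run(cmd, check=True, env=env)
    return time.perf_counter() - start


# Harmless stand-in command so the sketch runs without Airflow installed.
elapsed = timed_run([sys.executable, "-c", "pass"])
print(f"command took {elapsed:.2f}s")
```

For a real test you would pass ``["airflow", "db", "upgrade"]`` as the command and point Airflow at the
copy of your database through ``extra_env``, for example with the ``AIRFLOW__CORE__SQL_ALCHEMY_CONN``
environment variable (assuming your Airflow version reads the connection string from that variable).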

Post-upgrade warnings
=====================

Typically you just need to run the ``airflow db upgrade`` command successfully and that is all. However, in
some cases the migration might find old, stale, and probably wrong data in your database and move it
aside to a separate table. In this case you might get a warning in your webserver UI about the data found.

A typical message that you might see:

  Airflow found incompatible data in the <original table> table in the
  metadatabase, and has moved them to <new table> during the database migration to upgrade.
  Please inspect the moved data to decide whether you need to keep them,
  and manually drop the <new table> table to dismiss this warning.

When you see such a message, it means that some of your data was corrupted and you should inspect it
to determine whether you would like to keep or delete it. Most likely the data is
left over from earlier bugs and can be safely deleted, because it would not be visible
or useful in Airflow anyway. However, if you need it for auditing or historical reasons, you might
choose to store it somewhere. Unless you have a specific reason to keep the data, deleting it
is most likely your best option.

There are various ways you can inspect and delete the data. If you have direct access to the
database using your own tools (often graphical tools showing the database objects), you can drop the
table, rename it, or move it to another database using those tools. If you don't have such tools, you
can use the ``airflow db shell`` command. It drops you into the db shell tool for your database, where you
can both inspect and delete the table.

Please replace ``<table>`` in the examples with the actual table name as printed in the warning message.

Inspecting a table:

.. code-block:: sql

   SELECT * FROM <table>;

Deleting a table:

.. code-block:: sql

   DROP TABLE <table>;
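
If you prefer a script to an interactive shell, the same inspect-then-drop steps can be sketched in
Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a
stand-in for the metadatabase, and the table name ``moved_example`` and the helper functions are
hypothetical. In a real deployment, use the table name printed in the warning message.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Airflow metadatabase (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical moved table with one stale row, created only for this demonstration.
conn.execute('CREATE TABLE "moved_example" (dag_id TEXT, task_id TEXT)')
conn.execute("INSERT INTO moved_example VALUES ('demo_dag', 'demo_task')")


def inspect_table(conn, table):
    """Return all rows of ``table`` (equivalent to SELECT * FROM <table>)."""
    # The table name comes from the warning message, not from untrusted input.
    return conn.execute(f'SELECT * FROM "{table}"').fetchall()


def drop_table(conn, table):
    """Drop ``table`` (equivalent to DROP TABLE <table>)."""
    conn.execute(f'DROP TABLE "{table}"')


def table_exists(conn, table):
    """Check SQLite's catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


rows = inspect_table(conn, "moved_example")  # look at the data before deciding
print(rows)
drop_table(conn, "moved_example")  # dropping the table dismisses the warning
print(table_exists(conn, "moved_example"))
```

Inspecting before dropping mirrors the order recommended above: decide whether any of the moved rows
are worth archiving first, because ``DROP TABLE`` cannot be undone.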
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
@@ -956,6 +956,7 @@ mediawiki
memberOf
mesos
metaclass
metadatabase
metarouter
metastore
mget