Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Mysql 5.7 id utf8mb3 #14535

Merged
merged 1 commit into from
Aug 24, 2021
Merged

Fix: Mysql 5.7 id utf8mb3 #14535

merged 1 commit into from
Aug 24, 2021

Conversation

hbrls
Copy link
Contributor

@hbrls hbrls commented Mar 1, 2021

This may not be a ready PR. The code diff best expresses my ideas.

I'm initializing a mysql 5.7 db from empty using hub.docker.com, apache/airflow:2.0.1-python3.8. The migration scripts throw errors of (1071, 'Specified key was too long; max key length is 767 bytes'). AIRFLOW__CORE__SQL_ENGINE_COLLATION_FOR_IDS: utf8mb3_general_ci does not work. (ref #7570).

After go through the issues and code, I had an assumption that utf8mb3 config was not applied to all places that it should. I applied those migration scripts manually, then it works.

I'm not confident about how the migration scripts work.

  1. Should it be generated by some tools and not modified manually?
  2. Some scripts modify existed columns. Can I directly fix the create table scripts and skip the modify table ones? How does this affect the old users wants to upgrade?

Another question.

I cannot fix the line bop.create_primary_key('pk_xcom', ['dag_id', 'task_id', 'key', 'execution_date']). My DBA always told me not to use too many columns for keys. Why in this case dag_id and task_id is not enough?

EDIT: Fixed in master.

@boring-cyborg
Copy link

boring-cyborg bot commented Mar 1, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@kurtqq
Copy link
Contributor

kurtqq commented Mar 22, 2021

migration scripts are created with alembic.
you should change the target branch to Master.

@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch from 22e8736 to 355a8fb Compare April 10, 2021 16:22
@hbrls hbrls changed the base branch from v2-0-stable to master April 10, 2021 16:24
@hbrls
Copy link
Contributor Author

hbrls commented Apr 11, 2021

The docker image apache/airflow:2.0.1-python3.8 on dockerhub.com is quite behind master, which fixed some of the sql. Would you please release a newer tag?

@uranusjr
Copy link
Member

There is a schedule to those; the next release will happen next week, if I understand correctly.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label May 27, 2021
@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch from 355a8fb to 5329ecf Compare May 31, 2021 02:43
@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch 3 times, most recently from 987884b to 66f85dc Compare May 31, 2021 02:49
@hbrls
Copy link
Contributor Author

hbrls commented May 31, 2021

I'm working on the rebase to master. Sorry for false triggers.

@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch 4 times, most recently from be3a385 to ca910df Compare May 31, 2021 05:46
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please rebase it and fix my comments? I think this should be fine with utf8mb4

@@ -110,7 +110,8 @@ def upgrade():
bop.drop_index('idx_xcom_dag_task_date')
# mssql doesn't allow primary keys with nullable columns
if conn.dialect.name != 'mssql':
bop.create_primary_key('pk_xcom', ['dag_id', 'task_id', 'key', 'execution_date'])
#bop.create_primary_key('pk_xcom', ['dag_id', 'task_id', 'key', 'execution_date'])
bop.create_primary_key('pk_xcom', ['dag_id', 'task_id', 'key'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mysql throws when creating the PK longer than 700. I have no idea how to fix it properly.

But since it's not related to utf8mb3,I will revert this line.

@@ -38,6 +38,8 @@
depends_on = None


print(COLLATION_ARGS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we shoudl remove it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's for debug. I will revert the line.

@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch from ca910df to 7ef8f47 Compare June 1, 2021 02:16
@mik-laj mik-laj reopened this Jun 8, 2021
@mik-laj
Copy link
Member

mik-laj commented Jun 8, 2021

It looks like a bug. I reopened this PR.

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 9, 2021
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 11, 2021
@@ -29,6 +29,8 @@
from sqlalchemy.engine.reflection import Inspector

# revision identifiers, used by Alembic.
from airflow.models.base import COLLATION_ARGS

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should go above the comment. In other files as well.

@github-actions github-actions bot closed this Aug 21, 2021
@siddharthvp
Copy link

Should this have been closed? This does indeed seem to be an issue and the patch would likely help with MariaDB compatibility as well (#16907).

@potiuk potiuk reopened this Aug 21, 2021
@potiuk
Copy link
Member

potiuk commented Aug 21, 2021

@hbrls - do you still want to rebase and merge this one? We are also just about to merge the #17729 to automatically default to utf8mb3 for MySQL (and mariadb) and I think it would be worthwile to add the collation to ids. or if not @siddharthvp maybe you would like to take that over and lead to completion?

@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch from 7ef8f47 to c16bece Compare August 21, 2021 11:39
@hbrls
Copy link
Contributor Author

hbrls commented Aug 21, 2021

@potiuk I've fixed the comment position. And rebased to latest main.

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 21, 2021
@potiuk
Copy link
Member

potiuk commented Aug 21, 2021

Some static check failed. Highly recommend installing pre-commits: https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#pre-commit-hooks

@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch 3 times, most recently from 4bf517e to ccffe18 Compare August 21, 2021 15:25
@potiuk
Copy link
Member

potiuk commented Aug 21, 2021

I think you still have static checks problem. Again, I hearltily recommend installing pre-commits and running them locally on your changes before pushing (if you install and use pre-commit it will happen automatically - you won't be able to commit until the tests pass. It saves a lot of time for iteration - for both you, and committers looking at your change)

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 22, 2021
@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch from ccffe18 to 9fad619 Compare August 22, 2021 03:52
@hbrls
Copy link
Contributor Author

hbrls commented Aug 22, 2021

I've done the pre-commit except _build_images.sh. I guess it's ok now.

@potiuk
Copy link
Member

potiuk commented Aug 22, 2021

Would you please rebase to latest main @hbrls ?

@hbrls hbrls force-pushed the fix-mysql-utf8mb3 branch from 9fad619 to dc96815 Compare August 23, 2021 01:13
@hbrls
Copy link
Contributor Author

hbrls commented Aug 23, 2021

Would you please rebase to latest main @hbrls ?

Done.

@potiuk potiuk merged commit 5019916 into apache:main Aug 24, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 24, 2021

Awesome work, congrats on your first merged pull request!

@potiuk potiuk mentioned this pull request Jul 20, 2022
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants