Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs in URI constructor for MySQL connection #24320

Merged
merged 2 commits into from
Jun 19, 2022

Conversation

MaksYermak
Copy link
Contributor

In this PR I have fixed several bugs in the get_uri function in connection.py file. And also I have created unit tests for most of the use cases for this function.

Co-authored-by: Maksim Yermakou [email protected]
Co-authored-by: Wojciech Januszek [email protected]
Co-authored-by: Lukasz Wyszomirski [email protected]


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member

potiuk commented Jun 8, 2022

You do realize that "model/connection" is used by more than MySQL?

Are you sure you are not breaking other usages?

It seems it does break a lot of tests at least. I'd be really cautious to modify "airflow core" connection method to fix MySQL.. This looks (without going into details) like a need to override the method in MySQL connection to do mysql-specific stuff rather than change behaaviours of all other connections.

@potiuk
Copy link
Member

potiuk commented Jun 8, 2022

I am not saying it's wrong - but just - that - be cautions and try to understand how this "shared" method is used elsewhere before attempting to change it.

@MaksYermak
Copy link
Contributor Author

You do realize that "model/connection" is used by more than MySQL?

Are you sure you are not breaking other usages?

It seems it does break a lot of tests at least. I'd be really cautious to modify "airflow core" connection method to fix MySQL.. This looks (without going into details) like a need to override the method in MySQL connection to do mysql-specific stuff rather than change behaaviours of all other connections.

@potiuk I think my change in get_uri method shouldn’t break something, conversely this change will fix current issue. Our get_uri function should return DB connection URL for create_engine function in SQLAlchemy in this format dialect[+driver]://user:password@host/dbname[?key=value..]. Currently Airflow has a problem, when user tries to create DB connection without dbname, but with extras user meets error. It happens, because get_uri function adds extras to the host without division by / e.g. conn-type://login:password@:3306?charset=utf-8 is wrong URL the correct URL should look like this conn-type://login:password@:3306/?charset=utf-8.

In last commit I have moved unit tests from test_mysql to test_dbapi, because my change relates to all DB providers.

@potiuk
Copy link
Member

potiuk commented Jun 15, 2022

@potiuk I think my change in get_uri method shouldn’t break something, conversely this change will fix current issue. Our get_uri function should return DB connection URL for create_engine function in SQLAlchemy in this format dialect[+driver]://user:password@host/dbname[?key=value..]. Currently Airflow has a problem, when user tries to create DB connection without dbname, but with extras user meets error. It happens, because get_uri function adds extras to the host without division by / e.g. conn-type://login:password@:3306?charset=utf-8 is wrong URL the correct URL should look like this conn-type://login:password@:3306/?charset=utf-8.

Could you please point me to the docs/specs where conn-type://login:password@:3306?charset=utf-8 is wrong ? Is it a mysql sqlalchemy or "generic" sqlachemy problem? I am afraid that this might break something if this is mysql specific limitation rather than generic sqlalchemy one.

FWIW the URI specification https://datatracker.ietf.org/doc/html/rfc3986#section-3.3 is quite clear that the path component of the URI can be empty, and empty path is different than "root path" - (indicated by "/" ) - so I wonder if this change will not have impact on any other sqlalchemy URIs (especially with schema-less databases).

According to the RFC - conn-type://login:password@:3306?charset=utf-8 is perfectly good URI with "empty" path.

Can you trace down where the limitation of "require / path if there is a query part" comes from?

@MaksYermak
Copy link
Contributor Author

@potiuk I think my change in get_uri method shouldn’t break something, conversely this change will fix current issue. Our get_uri function should return DB connection URL for create_engine function in SQLAlchemy in this format dialect[+driver]://user:password@host/dbname[?key=value..]. Currently Airflow has a problem, when user tries to create DB connection without dbname, but with extras user meets error. It happens, because get_uri function adds extras to the host without division by / e.g. conn-type://login:password@:3306?charset=utf-8 is wrong URL the correct URL should look like this conn-type://login:password@:3306/?charset=utf-8.

Could you please point me to the docs/specs where conn-type://login:password@:3306?charset=utf-8 is wrong ? Is it a mysql sqlalchemy or "generic" sqlachemy problem? I am afraid that this might break something if this is mysql specific limitation rather than generic sqlalchemy one.

FWIW the URI specification https://datatracker.ietf.org/doc/html/rfc3986#section-3.3 is quite clear that the path component of the URI can be empty, and empty path is different than "root path" - (indicated by "/" ) - so I wonder if this change will not have impact on any other sqlalchemy URIs (especially with schema-less databases).

According to the RFC - conn-type://login:password@:3306?charset=utf-8 is perfectly good URI with "empty" path.

Can you trace down where the limitation of "require / path if there is a query part" comes from?

@potiuk yes of course in sqlalchemy doc https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls has link to RFC-1738 spec. This spec in point 3.3 has this string If neither <path> nor <searchpart> is present, the "/" may also be omitted.. For me this string means that we should omitt / only in case when we don't have both of parameters <path> and <searchpart> if we have one of them the / should be. Correct me please if I make mistake?

@potiuk
Copy link
Member

potiuk commented Jun 15, 2022

@potiuk yes of course in sqlalchemy doc https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls has link to RFC-1738 spec. This spec in point 3.3 has this string If neither nor is present, the "/" may also be omitted.. For me this string means that we should omitt / only in case when we don't have both of parameters and if we have one of them the / should be. Correct me please if I make mistake?

Ah OK. Thanks for pointing it out. Indeed '/' was obligatory in 1738. SQL alchemy follow the older URL RFC (1738) which has actually been deprecated by URI (3986) - even though URI would be more approprite :D :

This document obsoletes [RFC2396], which merged "Uniform Resource
Locators" [RFC1738] and "Relative Uniform Resource Locators"
[RFC1808] in order to define a single, generic syntax for all URIs.
It obsoletes [RFC2732], which introduced syntax for an IPv6 address.
It excludes portions of RFC 1738 that defined the specific syntax of
individual URI schemes; those portions will be updated as separate
documents. The process for registration of new URI schemes is
defined separately by [BCP35]. Advice for designers of new URI
schemes can be found in [RFC2718]. All significant changes from RFC
2396 are noted in Appendix D.

@potiuk
Copy link
Member

potiuk commented Jun 15, 2022

It's ok but we should change the title (as it is not MySQL -specific any more - but we can fix it at merge) and since it touches the core, I will need another approval (@uranusjr maybe?).

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 15, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit ea54faf into apache:main Jun 19, 2022
@ephraimbuddy ephraimbuddy added this to the Airflow 2.3.3 milestone Jul 5, 2022
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Jul 5, 2022
ephraimbuddy pushed a commit that referenced this pull request Jul 5, 2022
* Fix bugs in URI constructor for MySQL connection

* Update unit tests

(cherry picked from commit ea54faf)
ephraimbuddy pushed a commit that referenced this pull request Jul 5, 2022
* Fix bugs in URI constructor for MySQL connection

* Update unit tests

(cherry picked from commit ea54faf)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants