Task instance log_url overwrites existing path in base_url #32996
Comments
yes, I think how Additionally the relative URL should not contain a leading slash. There are some examples mentioned here: python/cpython#96015
Hi @wolfier, where exactly are we seeing the effect of this in the UI? I tried setting a custom path and was able to fetch logs for the task in the UI (it looks like it is able to get the right path).
I am seeing this in places that can reference task instances and their attributes. This includes, but is not limited to, the following places:
While working with the `log_url` property of task instances, it was observed that `urljoin` drops the part of the base_url path after the last slash when base_url does not end with a trailing slash, and the Airflow webserver does not allow setting base_url with a trailing slash. Additionally, if the relative URL has a leading slash, `urljoin` ignores the path of the base URL entirely and keeps only the relative URL's path. Hence, we add a new utility method `safe_urljoin` to handle these cases. closes: apache#32996
It is observed that urljoin is not yielding expected results for the task instance's log_url which needs to be a concatenation of the webserver base_url and specified relative url. The current usage of urljoin does not seem to be the right way to achieve this based on what urljoin is meant for and how it works. So, we use simple string concatenation to yield the desired result. More context in the comment apache#31833 (comment) closes: apache#32996
It is observed that urljoin is not yielding expected results for the task instance's log_url which needs to be a concatenation of the webserver base_url and specified relative url. The current usage of urljoin does not seem to be the right way to achieve this based on what urljoin is meant for and how it works. So, we use simple string concatenation to yield the desired result. More context in the comment #31833 (comment) closes: #32996 (cherry picked from commit baa1bc0)
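The string concatenation described in the commit messages above can be sketched roughly as follows. This is only an illustration, not the code from the linked change: the helper name `build_log_url`, the example base URL, and the query string are all hypothetical.

```python
def build_log_url(base_url: str, relative: str) -> str:
    """Hypothetical helper: append `relative` to `base_url` by plain concatenation.

    Stripping the slashes on both sides guarantees exactly one "/" between
    the two parts, whether or not the inputs carry one.
    """
    return f"{base_url.rstrip('/')}/{relative.lstrip('/')}"


# Assumed example values, not taken from the issue.
example_url = build_log_url("https://example.com/airflow", "log?dag_id=example")
print(example_url)  # https://example.com/airflow/log?dag_id=example
```

Unlike `urljoin`, this never resolves the relative part against the base, so any path already present in base_url is kept.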
Apache Airflow version
2.6.3
What happened
A task instance's log_url does not contain the full URL defined in base_url.
What you think should happen instead
The base_url may contain a path, which should be preserved when building the log_url.
The log_url is built with urljoin. Due to how urljoin builds URLs, any existing paths are ignored leading to a faulty URL.
How to reproduce
This snippet showcases how urljoin ignores existing paths when building the url.
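The original snippet is not preserved here. A minimal sketch of the behavior, assuming a hypothetical base_url of `https://example.com/airflow` and a made-up query string, might look like this:

```python
from urllib.parse import urljoin

# Hypothetical base_url with a path component and no trailing slash.
base_url = "https://example.com/airflow"

# Without a trailing slash on the base, the last path segment ("airflow")
# is replaced by the relative reference.
no_leading_slash = urljoin(base_url, "log?dag_id=example")

# A leading slash makes the reference an absolute path, so the whole
# base path is dropped.
leading_slash = urljoin(base_url, "/log?dag_id=example")

# Only a base that ends with a slash keeps its path.
with_trailing_slash = urljoin(base_url + "/", "log?dag_id=example")

print(no_leading_slash)     # https://example.com/log?dag_id=example
print(leading_slash)        # https://example.com/log?dag_id=example
print(with_trailing_slash)  # https://example.com/airflow/log?dag_id=example
```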
Operating System
n/a
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else
This was introduced by #31833.
One way to fix this is to use urlsplit and urlunsplit so that the existing path in base_url is accounted for.
Here is the fix in action.
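The snippet that originally accompanied this is not preserved here. The sketch below shows one way the urlsplit/urlunsplit idea could work; the helper name `join_preserving_path` and the example URLs are assumptions made for illustration, not the reporter's code.

```python
from urllib.parse import urlsplit, urlunsplit


def join_preserving_path(base_url: str, relative: str) -> str:
    """Hypothetical helper: join `relative` onto `base_url`, keeping the base path."""
    base = urlsplit(base_url)
    rel = urlsplit(relative)
    # Concatenate the two paths with exactly one slash between them.
    path = base.path.rstrip("/") + "/" + rel.path.lstrip("/")
    # Scheme and host come from the base; query and fragment from the relative part.
    return urlunsplit((base.scheme, base.netloc, path, rel.query, rel.fragment))


# Assumed example values, not taken from the issue.
fixed_url = join_preserving_path("https://example.com/airflow", "/log?dag_id=example")
print(fixed_url)  # https://example.com/airflow/log?dag_id=example
```

Splitting first means the query string of the relative part is carried over separately, rather than being treated as part of the path.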
Are you willing to submit PR?
Code of Conduct