Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a check for trailing slash in webserver base_url #31833
Add a check for trailing slash in webserver base_url #31833
Changes from 3 commits
601649c
0480977
e75175f
f697a33
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering here that why can the base URL not have a trailing
/
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to how web servers rewrite URLs when you deploy a webapp under a prefix. Say you deploy under
https://mydomain/airflow
. When you visithttps://mydomain/airflow/home
in the browser (as an example; any client applies), the proxy server (e.g. NGINX) would strip the prefix, and forward the rest of the URL to the app, so in Python code (e.g. Airflow) we can just handle/home
without needing to consider what prefix the entirety of Airflow is deployed under.Now you may say, hey, why can’t either the proxy server, the gateway protocol (WSGI in Python’s case), or the web framework (Flask for Airflow’s case) be smarter and just strip or add the slash in between as appropriate? And you would be right! Some of them actually can do this (NGINX has
merge_slashes
, for example), but not everyone does since the slash is technically significant and does change the URL’s meaning (even if that meaning can be nonsensical in the prefix), and that extra check is not free. So it’s better to avoid fighting the tools and just stick to the technically correct configuration.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I am coming from here on how the python
urllib.parse.urljoin
handles joining base URL and relative URLs.Wouldn't
https://my.astronomer.run/path
qualify as a valid base URL? But looks like theurljoin
just ignores the path when it does not end with a trailing/
. If it washttps://my.astronomer.run/path/
,urljoin
does not strip the path and makes the join as expected.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can handle more cases if users need it. This PR was a quick fix to prevent circular redirections.
I’m sure we can make this work with a trailing slash but requires a bit more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sorry, forgot to reply.)
Python’s
urljoin
implements the logic in<a href="...">
tags; I don’t recall the correct term for this, but it is the URL resolution logic when you click on links in a browser. URLs have two kinds, folder and file (not the right terms, but easier to understand). The URL would not have a trailing slash if you are in a file view; the trailing slash indicates a folder view. When you’re in a file, a relative path likefoo
indicates another file in the same folder. When you’re in a folder view, however,foo
means the foo entry in the folder. This means the resolution logic would change depending on whether the URL has a trailing slash or not.Note that Python’s local path joining logic (both pathlib and os.path) is different and does not consider the trailing slash; it instead matches the common logic used to join format in shell scripts.
But the prefix is an entirely different thing, and is intended to only be joined with simple string concatenation and skips all the folder/file logic because it is not semantically possible to support combining the prefix with absolute paths and relative paths. The only proper way to prepend a prefix is
prefix + path
(or other string concatenation methods such asf"{prefix}{path}"
, of course).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understood, thank you @uranusjr for the detailed explanation and propsoal.