Avoid requirement that AWS Secret Manager JSON values be urlencoded. #25432
…encoding-requirement
Can you please make tests pass before anyone deeply dives?
Also doc build fail
Sorry about that, I'll get to this later today.
f78b016 to 43286d4
@potiuk Tests passing. Sorry about that; I had a typo in one of the tests I added, and I'm still not entirely sure why the Sphinx issue happened, but it's fixed.
@o-nikolas @ferruzzi @vincbeck -> I'd also love to hear your opinion on that one.
docs/apache-airflow-providers-amazon/secrets-backends/aws-secrets-manager.rst
7ef3e3e to eb25b18
LGTM
Let me know @vincbeck if you have more comments, otherwise I merge it before releasing providers.
I know the PR is already merged but LGTM :)
This PR addresses #25104
Changes
1. JSON secrets do not need to be URL-encoded when `full_url_mode=False`.
^ The whole reason for this PR. This is the main behavior that is being implemented.
Specifically, this behavior is implemented for when `get_connection()` is called. This method returns an `Optional[Connection]` object. The `Connection()` can be built either using a `uri=?`, or with kwargs corresponding with the parts of the URI, e.g. `host=?`, `port=?`, etc. In the former case, URL-encoding is required; in the latter case, it is not.
Users will receive a DeprecationWarning if the code detects that their secrets are URL-encoded (more on that in section 2 below). In most cases, the user should be able to simply decode their secret and everything will continue working normally.
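The difference between the two construction styles can be illustrated without Airflow itself. The sketch below uses only `urllib.parse`; the host, login, and password are made-up values, and the `conn_kwargs` dict merely stands in for keyword arguments to `Connection()` rather than calling Airflow.

```python
from urllib.parse import quote, unquote, urlsplit

# Made-up credentials; the password contains URI-reserved characters.
password = "p@ss/word"

# URI style: the password must be percent-encoded to survive parsing.
uri = f"postgres://user:{quote(password, safe='')}@db.example.com:5432/mydb"
parts = urlsplit(uri)
decoded_password = unquote(parts.password)

# Without encoding, the "/" and "@" in the password confuse the parser,
# so the host is no longer recovered correctly.
broken = urlsplit(f"postgres://user:{password}@db.example.com:5432/mydb")

# Kwargs style: each part is passed separately, so no encoding is needed.
conn_kwargs = {
    "conn_type": "postgres",
    "login": "user",
    "password": password,
    "host": "db.example.com",
    "port": 5432,
    "schema": "mydb",
}

print(parts.hostname, decoded_password == password)
print(broken.hostname)
```

In the kwargs style the raw password is used as-is, which is why a JSON secret holding separate fields does not need URL-encoding.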
I tried to make this change, as well as the other changes, as quietly as possible for typical Airflow runtimes. Basically, if the user isn't doing anything weird, the only thing they will be required to do is change their secrets from being URL-encoded to decoded.
To support this behavior, a few additional things either needed to change or made a lot of sense to change. The rest of this message describes those additional changes.
2. Added `secret_values_are_urlencoded=?` kwarg for `SecretsManagerBackend.__init__`, although most users do not need to touch this.
@potiuk suggested adding a kwarg for assisting in migration. I wanted to avoid this if possible because it is very obtrusive and causes a negative user experience.
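For orientation, keyword arguments reach `SecretsManagerBackend.__init__` through Airflow's `backend_kwargs` option, which holds a JSON object. The sketch below only builds the two environment variables Airflow reads for the `[secrets]` section; the `connections_prefix` value is an illustrative assumption, and `secret_values_are_urlencoded` is the kwarg proposed in this PR.

```python
import json

# Keyword arguments for SecretsManagerBackend.__init__.
backend_kwargs = {
    "connections_prefix": "airflow/connections",  # assumed prefix, adjust as needed
    "full_url_mode": False,
    "secret_values_are_urlencoded": False,  # kwarg proposed in this PR
}

# Environment-variable form of the [secrets] backend and backend_kwargs options.
airflow_env = {
    "AIRFLOW__SECRETS__BACKEND": (
        "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend"
    ),
    "AIRFLOW__SECRETS__BACKEND_KWARGS": json.dumps(backend_kwargs),
}

for name, value in airflow_env.items():
    print(f"{name}={value}")
```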
Is it possible to avoid needing to reconfigure the secrets manager backend? Yes, in most cases!
What I discovered is that if decoding the URL-encoded values is idempotent, the user needs to decode their secrets, but the `backend_kwargs` config option does not need to be adjusted. Idempotency obviates the need to touch the config because it means the intended value of the secret is unambiguous once the values are decoded. This is a pretty big win from a user experience perspective because adjusting the Airflow configuration can be a nuisance in practice (e.g. a developer might need to get a system administrator involved).
I added a longer explanation in the code comments.
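The idempotency argument can be sketched independently of the PR's actual code. The helpers `decode_is_noop` and `read_secret` below are hypothetical names used only for illustration, and the sample passwords are made up.

```python
from urllib.parse import quote, unquote


def decode_is_noop(decoded_secret: str) -> bool:
    """True when URL-decoding an already-decoded value leaves it unchanged."""
    return unquote(decoded_secret) == decoded_secret


def read_secret(stored: str) -> str:
    """Always decode; harmless whenever decode_is_noop(intended value) holds."""
    return unquote(stored)


# Typical case: the intended value has no percent escapes, so it does not
# matter whether the user stored it encoded or decoded.
intended = "p@ss word/1"
assert decode_is_noop(intended)
assert read_secret(quote(intended, safe="")) == intended
assert read_secret(intended) == intended

# Rare case: the literal value itself contains a percent escape, so blindly
# decoding a stored, already-decoded secret corrupts it.
ambiguous = "100%25"
assert not decode_is_noop(ambiguous)
assert read_secret(ambiguous) == "100%"
```

When the second situation applies, the stored value cannot be interpreted without extra information, which is the gap the kwarg fills.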
The kwarg is needed in the very rare case that decoding is not idempotent, i.e. the string literal for the decoded secret contains a `%`. In this case, the user will need to manually set `secret_values_are_urlencoded` to `False`.
3. Send DeprecationWarning in a niche situation for `get_conn_value()`.
`get_conn_value()` is allowed to return a JSON as of 2.3.0.
However, in the unlikely case that someone is both (1) using `get_conn_value()` directly, and (2) using `full_url_mode=False`, we want to warn them that they will no longer receive a URL in the future.
The base implementation of `get_connection()` will generate a `Connection` object when `get_conn_value()` returns a JSON, or more conceptually, when the secret is a valid JSON.
When `get_conn_value()` returns a JSON, `get_connection()` is able to create a `Connection` object from the JSON.
a. Crucially, this does not require URL-encoding for the base Airflow APIs.
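A rough sketch of the transitional behavior described in this section, assuming a simplified stand-in function: `conn_value_sketch` is hypothetical and is not the provider's implementation. It keeps returning a URI for a JSON secret when `full_url_mode=False`, while warning that this will change.

```python
import json
import warnings
from urllib.parse import quote


def conn_value_sketch(secret: str, full_url_mode: bool) -> str:
    """Hypothetical stand-in for get_conn_value(); not the provider's code."""
    if full_url_mode:
        # The secret is already stored as a URI.
        return secret
    fields = json.loads(secret)
    warnings.warn(
        "A future release will return the JSON secret instead of a URI.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Rebuild a URI from the JSON fields, encoding the credentials.
    login = quote(fields.get("login", ""), safe="")
    password = quote(fields.get("password", ""), safe="")
    uri = f"{fields['conn_type']}://{login}:{password}@{fields['host']}"
    if "port" in fields:
        uri += f":{fields['port']}"
    return f"{uri}/{fields.get('schema', '')}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = conn_value_sketch(
        '{"host": "foo", "conn_type": "postgres", "schema": "mydb"}',
        full_url_mode=False,
    )

print(value)
print([w.category.__name__ for w in caught])
```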
This is really challenging to do elegantly if the method `SecretsManagerBackend.get_conn_value()` needs to retain 100% backwards compatibility. By that I mean: if it is never allowed to return a JSON string representation of the secret.
For example, if `_get_secret()` returns a string `'{"host": "foo", "conn_type": "postgres", "schema": "mydb"}'`, then the current behavior is that `SecretsManagerBackend.get_conn_value()` will return a string `'postgres://:@foo/mydb'`.
Under the base class's implementation, `BaseSecretsBackend.get_conn_value()` is allowed to return a JSON string. But `SecretsManagerBackend` never does that. If the behavior of the overridden method is relaxed to allow for JSON strings as per the base implementation, then this becomes a little easier to do without writing complete spaghetti.
For "typical" Airflow API usage there is no harm because `get_conn_value()` is not typically called directly. However, this is a pretty big package, and lots of people do lots of things you might not expect, so I opted to go with smoothly transitioning away from returning a URI.
4. Deprecate `ast.literal_eval` (i.e. support for dict reprs) for JSON decoding.
I want to be diplomatic, but I also want to get to the point, so please do not interpret this negatively. Here is the original code:
^ This is speculation, but upon review, it appears that `ast.literal_eval` may have been added to the PR because one of the test cases contains a trailing comma in a JSON, and the author forced the test to pass using `ast.literal_eval` instead of removing the trailing comma and using `json.loads`, which would be the more typical thing to do.
There are two reasons to remove `ast.literal_eval` and just use `json.loads`. The first reason is a bit more philosophical, which is that the API should not support an odd implementation and should not hand-hold for mistakes at this level of simplicity.
The second reason is to provide a little more consistency across the Airflow API:
`{'conn_type': 'postgres', 'host': 'postgres'}` is not a valid secret elsewhere in Airflow, but `{"conn_type": "postgres", "host": "postgres"}` is.
`get_conn_value()` is allowed to return a JSON string, but not a dict repr. When we migrate toward `get_conn_value()` returning a JSON string, we should discourage use of dict reprs.
The original PR that implemented the `ast.literal_eval()` approach (#15104) was primarily focused on adding more support for various aliases for keys. Overall, this is a fine addition to the code. For various reasons, maintainers should be committed to retaining this feature. However, the use of `ast.literal_eval` was never part of the discussion for this specification; there was not a PR specifically devoted to migrating `json.loads` to `ast.literal_eval`.
Removing it also should not be disruptive to the vast majority of people, since:
`{'foo': 'bar'}`.)
5. Support `extra` being either a JSON string or a dict
AWS Secrets Manager allows for storage of arbitrary strings. For example, both of these are valid secrets to store within the Secrets Manager:
and
Right now, the former is supported but not the latter. There doesn't seem to be a good reason why the latter should not be supported, other than that the AWS UI will fail to parse key-value pairs. But it's still a valid secret!
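One way to accept both shapes is to normalize `extra` before building the connection. The sketch below is an assumption-laden illustration: `normalize_extra` is a hypothetical helper, and the two sample secrets are made up rather than taken from this PR.

```python
import json
from typing import Any, Dict


def normalize_extra(secret: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the secret whose 'extra' value is always a dict."""
    normalized = dict(secret)
    extra = normalized.get("extra")
    if isinstance(extra, str):
        # 'extra' was stored as a JSON string; parse it (empty string -> {}).
        normalized["extra"] = json.loads(extra) if extra else {}
    elif extra is None:
        normalized["extra"] = {}
    return normalized


# Made-up secrets: one stores 'extra' as a JSON string, the other as a nested object.
as_string = json.loads('{"host": "foo", "extra": "{\\"sslmode\\": \\"require\\"}"}')
as_dict = json.loads('{"host": "foo", "extra": {"sslmode": "require"}}')

assert normalize_extra(as_string) == normalize_extra(as_dict)
print(normalize_extra(as_string)["extra"])
```

Normalizing to a dict at one point keeps the rest of the code indifferent to how the secret was stored.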
6. Added some type annotations.
Some function signatures were missing type annotations, so I added them in. All of the type annotations for overridden methods adhere to the signatures from the base implementation of the class.