Enable Connection creation from Vault parameters #15013
That's not true. Or at least I don't think that's true. We have a convenience method. Try creating your connection as desired with init params, including your JSON. Use this URI in Vault. If you have a JSON value for extra that cannot be serialized to an Airflow URI, please post it here. Edit: it is true. Currently the URI can only store key-value pairs. PR to add support for this: #15100
@dstandish Even so, might not this be a good feature anyway? A |
There is a benefit inherent in having only one perfectly good way to do things. Sometimes when you want to do something another way it's best kept in a private repo for your company or posted on a blog or your own github. I would be in favor of only using the airflow conn uri format. But let's see what others think -- I'm just one voice here so don't be too discouraged. In the meantime I encourage you to try the uri generation function we have. There is documentation on generating the uri in the managing connections doc. |
I see you added an example where extra is not valid JSON. Why do you want to do that? Why not just use JSON? E.g. if you put fubar in quotes, it probably works. Then when you call extra_dejson it should return just the string fubar.
Update: I tried this and it doesn't work. It has to be URL-encodable JSON:
c = Connection(extra='"hello"')
c.extra_dejson # works
c.get_uri() # does not
@dstandish Sure, I take your point but it is also nice to have options. For the record, I have tried the |
@dstandish If you read the description of PR you can see an example of It was then required to go in and manually change the application code for tenants that used their
There does not currently exist one perfectly good way to do things. There are two incompatible ways, and the main interface users use is the Just to clarify again, the connection URI format doesn't even handle all valid JSON. It only handles unnested JSON. If you want |
Yeah you make a good point that it can't store arbitrary json (I do imagine we could add support for this within the URI format). Based on that assumption, we do have fairly comprehensive tests that verify that URI parsing and URI generation produce consistent results. But it's true we don't test the case where you store an arbitrary value in extra such as Currently it is assumed dict, so like Anyway I'll let others chime in at this point. |
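To make the limitation discussed above concrete, here is a minimal stdlib-only sketch of why query parameters only round-trip flat key-value pairs. It does not use Airflow's actual code: `extra_to_query` and `query_to_extra` are hypothetical helpers that stand in for the idea of serializing extra into a URI query string.

```python
from urllib.parse import parse_qsl, urlencode


def extra_to_query(extra: dict) -> str:
    """Hypothetical helper: serialize an extra dict as URI query parameters."""
    return urlencode(extra)


def query_to_extra(query: str) -> dict:
    """Hypothetical helper: parse URI query parameters back into a dict."""
    return dict(parse_qsl(query))


# A flat dict of strings survives the round trip unchanged.
flat = {"region": "us-east-1", "timeout": "30"}
flat_roundtrip = query_to_extra(extra_to_query(flat))

# A nested dict does not: urlencode calls str() on the inner dict, so the
# nested structure comes back as an opaque string, not as a dict.
nested = {"options": {"retries": 3}}
nested_roundtrip = query_to_extra(extra_to_query(nested))

print(flat_roundtrip == flat)
print(nested_roundtrip == nested)
```

The same reasoning applies to any non-string value: once it has been flattened into a query string, the original type is lost unless the format adds its own encoding convention on top.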
In the documentation of the airflow/airflow/models/connection.py Lines 85 to 87 in 6b9b067
However, AFAICT, the constructor will not check this, and will not fail when Is there a good reason why |
I was inspired by this conversation to make a PR that adds support for arbitrary JSON in the conn URI format: #15100. We could also add support for the case of an unquoted arbitrary string value by modifying extra_dejson to return the string rather than an empty dict in this case (see the note at the end of the PR description). It's orthogonal to this PR but obviously a closely related topic.
I think that the reason is to provide a reliable interface so that you know that extra params are accessible in a dictionary at property
Yeah probably it makes sense to enforce that |
OK i was able to add support in get_uri for arbitrary i.e. non-json string in #15100. behavior is still to return |
We expect the extra field to be a dictionary in the Web UI. Using primitive types also makes it hard to extend a given connection, i.e. it is not future-proof. There is also a design misunderstanding here: this field should contain extra fields, not additional data.
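The expectation that extra decodes to a JSON object could be checked with something like the sketch below. This is an illustration only, not the validation that Airflow ships: `validate_extra` is a hypothetical name, and treating an empty value as an empty dict is an assumption made here.

```python
import json
from typing import Optional


def validate_extra(extra: Optional[str]) -> dict:
    """Hypothetical check: extra must be a JSON-encoded object (a dict)."""
    if extra is None or extra == "":
        # Assumption: a missing extra is treated as "no extra fields".
        return {}
    try:
        decoded = json.loads(extra)
    except json.JSONDecodeError as err:
        raise ValueError(f"extra is not valid JSON: {err}") from err
    if not isinstance(decoded, dict):
        # Valid JSON such as '"hello"' or '[1, 2]' is still rejected,
        # because callers expect named fields.
        raise ValueError("extra must be a JSON object, got " + type(decoded).__name__)
    return decoded


print(validate_extra('{"sslmode": "require"}'))
```

Rejecting primitives and lists up front means a key can always be added to the object later without changing how existing values are read.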
This must be as of Airflow 2.0, because this isn't the case in 1.10.x. The additional validation should have been added on the |
I like the idea of supporting other formats. LocalFilesystemBackend already supports object representation as well: airflow/airflow/secrets/local_filesystem.py Lines 188 to 223 in a4aee3f
Yes. I think we should add it. Can you help with it? |
Sure thing, I'll open another PR for that. Curious what the release target will be because it will be a breaking change. So just to summarize where we're at:
This will be the release manager's decision, but I would consider that this is not a breaking change as the design assumption of this feature was to store objects that have a name and a value. Now we only improve this feature by adding validation. So I think, we should release it in Airflow 2.1.
Yes. Exactly. It would be fantastic if we could separate the common part with LocalFilesystemBackend. |
@avieth Do you have any use cases that you can't solve with JSON? Object representation is more future-proof because you can always add a new key and mark the old one as deprecated. It allows for smooth updates.
@mik-laj curious what do you think about deprecating the conn uri format and replacing with JSON, or perhaps allowing both globally? We could implement backward compat with try json parse except conn uri parse |
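The "try JSON parse, except conn URI parse" idea could look roughly like the sketch below. It is stdlib-only and deliberately simplified: `parse_connection_value` is a hypothetical function, the returned field names are chosen for illustration, and Airflow's real URI parsing handles more cases than this.

```python
import json
from urllib.parse import parse_qsl, unquote, urlsplit


def parse_connection_value(value: str) -> dict:
    """Hypothetical parser: accept a JSON object first, else fall back to a URI."""
    try:
        decoded = json.loads(value)
    except json.JSONDecodeError:
        decoded = None
    if isinstance(decoded, dict):
        return decoded

    # Fallback: treat the value as a connection URI.
    parts = urlsplit(value)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "port": parts.port,
        "schema": parts.path.lstrip("/") or None,
        "extra": dict(parse_qsl(parts.query)),
    }


from_json = parse_connection_value('{"conn_type": "postgres", "host": "db"}')
from_uri = parse_connection_value(
    "postgres://user:p%40ss@db.example.com:5432/mydb?sslmode=require"
)
print(from_json)
print(from_uri)
```

Trying JSON first is safe here because a connection URI is never a valid JSON object, so existing URI values keep working unchanged.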
I think we support both. The choice depends on the specific use case. To use the same value as an environment variable and copy it easily to the clipboard, you can use a URI. For more convenient editing, you can use object representation.
No, I mean deprecate it across Airflow, including env vars and the CLI (or support both simultaneously). There's not really anything special about the URI format. We could store JSON in env vars. My support for the conn URI is, like you said, about the convenience of switching between backend, env, and CLI, and the virtue of uniformity. But if we switched to JSON globally, that could be a good thing.
I see no reason to do this. Too many users already use URIs. Besides, for simple connections it is a very good representation. |
from airflow.providers.hashicorp._internal_client.vault_client import _VaultClient # noqa
from airflow.secrets import BaseSecretsBackend
from airflow.secrets.local_filesystem import _create_connection
We should extract the common part to the new package. This is problematic now, so I started discussions on the mailing list. See: https://lists.apache.org/thread.html/r713f180120d0a39b53567812eb5db34f992ec81979818d2175598b71%40%3Cdev.airflow.apache.org%3E
In this particular case I can see this function existing in airflow.models.connections as from_dict() -> Connection. Transforming a dict to the object is the sort of function that usually exists with the class. I agree with the general case that something needs to be done about common functionality between providers.
You can't add a new feature to the core because we want to be backwards compatible with Airflow 2.0.0. I opened the discussion to loosen this restriction, because it does not conform to reality. I think we should maintain backward compatibility with the MINOR release, not the MAJOR.
I invite you to the discussion on the mailing list if you would like to share your thoughts.
It's an additional function on the Connection class so that retains backwards compatibility no?
Secret Backend won't work with Airflow 2.0.0, so this is a breaking change to providers packages. This doesn't make breaking changes to Airflow 2.0.0, but to providers packages it does.
So you are saying that if this Vault provider expects a from_dict method on Connection (say, to be added in 2.1), the provider won't work with 2.0, but we've promised that it will.
Should we perhaps then just merge this as is? Or perhaps duplicate this private function into a Vault provider utils module with a note of some kind to replace it with Connection.from_dict when it is available?
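For readers following along, a dict-to-object constructor of the kind being discussed might look like the sketch below. It uses a stand-in dataclass rather than Airflow's Connection class, and the `from_dict` signature, the field list, and the handling of a dict-valued extra are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SimpleConnection:
    """Stand-in for a connection model; not Airflow's Connection class."""

    conn_id: str
    conn_type: Optional[str] = None
    host: Optional[str] = None
    login: Optional[str] = None
    password: Optional[str] = None
    schema: Optional[str] = None
    port: Optional[int] = None
    extra: Optional[str] = None

    @classmethod
    def from_dict(cls, conn_id: str, data: Dict[str, Any]) -> "SimpleConnection":
        """Hypothetical constructor: build a connection from named fields."""
        allowed = {f.name for f in fields(cls)} - {"conn_id"}
        unknown = set(data) - allowed
        if unknown:
            raise ValueError(f"Unknown connection fields: {sorted(unknown)}")
        kwargs = dict(data)
        # Assumption: a dict-valued extra is stored as its JSON encoding.
        if isinstance(kwargs.get("extra"), dict):
            kwargs["extra"] = json.dumps(kwargs["extra"])
        return cls(conn_id=conn_id, **kwargs)


conn = SimpleConnection.from_dict(
    "my_db",
    {"conn_type": "postgres", "host": "db", "port": 5432, "extra": {"sslmode": "require"}},
)
print(conn)
```

Keeping the constructor on the class means every secrets backend that receives a dict can share the same field validation instead of importing a private helper from another backend.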
Yes. Exactly.
We should discuss the approach on the mailing list and choose the best solution that everyone accepts. I personally think that we should mark this package as only supported by Airflow 2.1, because trying to maintain backward compatibility will limit our development possibilities. It should be normal for users that new packages/library versions may require a core version if we add new features. See: https://lists.apache.org/thread.html/r713f180120d0a39b53567812eb5db34f992ec81979818d2175598b71%40%3Cdev.airflow.apache.org%3E
I started discussions on this subject on the mailing list, but I haven't checked it recently. I encourage you to discuss it yourself because then we will be able to work out a solution to this problem faster.
I have the impression that we have managed to reach a consensus. This change will be in the next provider package after Airflow 2.1 is released. There is a slight problem with the provider package release process. We always release provider packages from the main branch, so we cannot merge this PR until we release Airflow 2.1 and start releasing provider packages that are compatible with this version.
In the meantime, we can create a new public method that will create connections from the dictionary. This method will be released in Airflow 2.1. PR: #15425
I added this PR to the Airflow 2.1 milestone.
Before this commit there was a documented but unenforced limitation that the extra parameter be encoded JSON. In apache#15013 this issue garnered attention and motivated this PR.
We cannot merge this PR until we release Airflow 2.1 and start releasing provider packages that are compatible with this version. More info: #15013 (comment)
@mik-laj are we able to merge this now? Or is there something else we're waiting on? |
@jhtimmins Unfortunately not, as #15425 didn't get finished (or possibly reviewed! Sorry) in time to make it for 2.1 |
marked it as 2.2 since it is not a bug fix |
@natanweinberger Could you take a look and review this PR/tell us what changes are needed now your other change has landed?
(Code on this looks good now! A nice simple change too!) |
Currently using the Vault secrets backends requires that users store the secrets in connection URI format:
https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#connection-uri-format

Unfortunately the connection URI format is not capable of expressing all values of the Connection class. In particular the Connection class allows for arbitrary string values for the `extra` parameter, while the URI format requires that this parameter be unnested JSON so that it can serialize into query parameters.

```
>>> Connection(conn_id='id', conn_type='http', extra='foobar').get_uri()
[2021-03-25 13:31:07,535] {connection.py:337} ERROR - Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/Users/da.lum/code/python/airflow/airflow/models/connection.py", line 335, in extra_dejson
    obj = json.loads(self.extra)
  File "/nix/store/8kzdflq0v06fq0mh9m2fd73gnyqp57xr-python3-3.7.3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/nix/store/8kzdflq0v06fq0mh9m2fd73gnyqp57xr-python3-3.7.3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/nix/store/8kzdflq0v06fq0mh9m2fd73gnyqp57xr-python3-3.7.3/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[2021-03-25 13:31:07,535] {connection.py:338} ERROR - Failed parsing the json for conn_id id
'http://'
```

As shown, the `extra` data is missing from the return value `http://`. Although there is an error logged, this does not help users who were previously able to store other data.