-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Webhook storage (fixes #2835) #3000
Conversation
@jameslamb just letting you know this has not been forgotten. I will take a deeper look at this sometime tomorrow! |
thanks! no problem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really really solid! Allow me to give this a true test with an API that I can upload some flows to. In the meantime I actually think this storage type could be expanded to also support the file-based storage options that the other storage types now also support as of the 0.12.5
release 🤔
ooooo I haven't looked at file-based storage yet! I found #2840 and #2944 and took a look. Is this what you're thinking for
I'm happy to add that if all Storage objects are expected to have it. |
@jameslamb Yep that is exactly it! It doesn't necessarily need to be added to WebHook but I see no reason why it couldn't handle it 🙂 |
cool ok, I can do that! I just looked and there is actually a |
I've added support for I'm proposing this design for file-based storage with
I decided on this approach because, in my experience, most services that read and write files expect binary content, not To test this added |
I just pushed one other small change in 8c0e32d, hope it's ok...I think I prefer That is the spelling used everywhere I looked, including Wikipedia, GitHub help, Atlassian Help, and IFTTT's docs. |
}, | ||
}, | ||
get_flow_http_method="POST", | ||
build_secret_config={ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this kwarg a bit confusing and limiting (it only works on headers? what if you need to set a query parameter or part of the body? what if you need to modify the secret before transmitting (e.g. adding a Bearer
prefix)). I wonder if we might instead support templating in the (recursive) values in get_flow_kwargs
/build_flow_kwargs
. The semantics might be:
- Recursively search the values (not keys) of dicts passed to
build_kwargs
andget_flow_kwargs
for strings - Replace any template strings found in those values, using environment variables first and falling back to secrets.
storage = Webhook(
build_kwargs={
"url": "https://content.dropboxapi.com/2/files/upload",
"headers": {
"Content-Type": "application/octet-stream",
"Dropbox-API-Arg": json.dumps(
{
"path": "/Apps/prefect-test-app/dropbox-flow.flow",
"mode": "overwrite",
"autorename": False,
"strict_conflict": True,
}
),
},
"Authorization": "${DBOX_OAUTH2_TOKEN}",
},
build_http_method="POST",
get_flow_kwargs={
"url": "https://content.dropboxapi.com/2/files/download",
"headers": {
"Accept": "application/octet-stream",
"Dropbox-API-Arg": json.dumps(
{"path": "/Apps/prefect-test-app/dropbox-flow.flow"}
),
},
},
get_flow_http_method="POST",
)
One way of doing this would be to make use of string.Template
and a magic mapping to handle dynamically looking up fields. We'd might want to change the regex to drop the $
prefix to make it similar to str.format
not (or maybe not? not sure what's clearer) but this works. (note that str.format
converts the mapping to a dict before formatting, so we can't use that to dynamically load secrets/environment variables unfortunately).
In [13]: from collections.abc import Mapping
In [14]: class Magic(Mapping):
...: def __getitem__(self, key):
...: print("Could lookup environment variable or secret here")
...: return "hello-world"
...: def __iter__(self):
...: return iter([])
...: def __len__(self):
...: return 0
...:
In [15]: magic_map = Magic()
In [16]: template = string.Template("Bearer ${token}")
In [17]: template.substitute(magic_map)
Could lookup environment variable or secret here
Out[17]: 'Bearer hello-world'
Could also use the regex module directly, which might be simpler 🤷.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find the get_flow
and build
prefixes on these kwargs a bit off. I know they correspond to the requests for the build
and get_flow
methods, but without the word request
in there build_kwargs
looks like kwargs to pass to build
to me. Feels too tied to the interface method names and not tied to what the requests are actually doing (storing and loading bytes). Perhaps?
store_request_kwargs=...,
store_request_method=...,
load_request_kwargs=...,
load_request_method=...,
I'd use put
and get
, except those conflict with the http methods. Not a strong opinion, just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was sufficiently complex that I shouldn't make it part of the first pass, but if you think it's necessary I'm happy to add it!
I'm a little worried about free-form templating everything though...that's going to be a problem if you have JSON jammed in a string, like the DropBox API requires (https://github.com/jameslamb/webhook-storage-integration-tests/blob/3bc93bf2ce4b9a0539306045f2f6a82bc3325c53/test-dropbox.py#L47). That opens you up to needing to know how to escape the right }
, which doesn't sound fun.
maybe it would be simpler to, instead of templating individual string fields, just allow people to replace the entire value of any build_kwarg or get_flow_kwarg with the value of an environment variable / secret?
what if you need to modify the secret before transmitting (e.g. adding a Bearer prefix))
I did think about this specific case. If your API token's literal value is abc
, there's no reason you couldn't put Bearer abc
into an environment variable / secret, right? Without needing to have any code run to add Bearer
to the front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels non-intuitive to me to require storing a full authorization header/url/etc... in a secret to make proper use of it. If we keep the $
prefix requirement that string.Template
uses, that (I believe) avoids the issue of accidentally templating things that just happen to contain {}
characters, since they won't match the regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooooo ook! I like that a lot. I'll add that to this PR.
As for the names, I feel that there's value in coupling to the method names actually. build()
and get_flow()
are important to understand when using a Storage object, I think, and I'd rather couple to those than invent another thing people have to reason about. But I do like adding _request
to make that clearer. How do you feel about build_request_*
and get_flow_request_*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I do like adding
_request
to make that clearer.
Makes sense to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I just attempted the templating thing. Awesome suggestion, I like this a lot better than the secret_config
approach.
Commit: 937dd84
I also updated my integration tests and ran them to be sure it's working as expected: https://github.com/jameslamb/webhook-storage-integration-tests.
Note for reviewers
I think it could be valuable to offer more explicit control over whether environment variables or Prefect secrets are used, to avoid issues caused by conflicting names.
I think that could be done in a backwards-compatible way in the future, by adding a render_preferences
argument that is like {"SOME_VARIABLE": "environment"}
, which changes the behavior for rendering ${SOME_VARIABLE}
from "env --> secret --> error-if-absent" to "env --> error-if-absent".
I thought that complexity wasn't worth it for a first pass, but I'd be happy to add something like it if reviewers think it's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one small tweak and this looks good to me! Thanks for working on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, happy to merge on CI pass.
Thanks for the thorough reviews and for teaching me the |
changes/
directory (if appropriate)docs/outline.toml
for API reference docs (if appropriate)What does this PR change?
This PR proposes a new type of storage,
WebHook
.WebHook
storage is a type of storage wherebuild()
makes an HTTP request to store a flow andget_flow()
makes another HTTP request to retrieve it.This proposal is documented in #2835 and in the Prefect community Slack thread it links to.
Why is this PR important?
#2835 has more detail on why this is valuable, but the basic thread that led to this is as follows:
Storage
objects have to be deserializable from the JSON sent to Prefect Cloud, users cannot write their own arbitrary Python code: all storage has to use a class inprefect
Notes for reviewers
get_flow()
POST service.whatever/upload?apiKey={apikey}
, but I have worked with such services in the past.build_kwargs
and `get_flow_kwargso
andt
keys on my keyboard have some stuff under them so please keep an eye out for extra letters in my docs 😬Thanks for your time and consideration!