-
Notifications
You must be signed in to change notification settings - Fork 54
Add HEAD
requests support to DelayedRequester
#865
Add HEAD
requests support to DelayedRequester
#865
Conversation
180d6d2
to
367118f
Compare
HEAD
requests support to DelayedRequester
@@ -50,23 +49,24 @@ def __init__(self, delay=0, headers=None): | |||
self._last_request = 0 | |||
self.session = requests.Session() | |||
|
|||
def get(self, url, params=None, **kwargs): | |||
def _make_request(self, method, url, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _make_request(self, method, url, **kwargs): | |
def _make_request(self, method: Literal['get', 'head'], url: str, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 Hi @obulat - thanks for the suggestion. Could you help me understand this literal type constraint better? I was under the impression that the calls to _make_request
were passing methods from the Requests module and not string values, but I'm probably missing something.
To experiment I changed the Literal
params to something nonsensical (instead of get
and head
) and just lint
and just test
passed when I was expecting them to fail.
Also, if you're suggesting that we make the Requests module calls only within _make_request
based on the string passed then that makes sense, but I wanted to get confirmation. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gah, that's right! This is actually of type callable
, rather than Literal
. I think the way it is presently is fine, but it might be worth adding the generic callable
type annotation (trying to match the call signatures of both .head
and .get
will be a head-ache 😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for stepping in, @AetherUnbound, and sorry I couldn't reply sooner, @twstokes.
|
||
Required Arguments: | ||
|
||
url: URL to make the request as a string. | ||
params: Dictionary of query string params | ||
**kwargs: Optional arguments that will be passed to `requests.get` | ||
**kwargs: Optional arguments that will be passed to the `requests` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstrings params need to be updated, too: add method
and remove params
(or keep it if it's possible to document a **kwargs
parameter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 8c5d0a2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really love this approach! So nice to see you in Openverse, @twstokes!
Looks like we could fold one more fix into this! WordPress/openverse#1578 |
Thank you @obulat and @AetherUnbound for the suggestions! I've been AFK but will have a chance to address these soon. 🙇 |
Hey @AetherUnbound! 👋 I attempted to:
here, but ran into some exceptions that may be related to what you've dealt with in the past. At that point I didn't identify a quick fix and felt like I was going outside of the original mission of this PR. If you have any guidance I'll be glad to wrap it into these changes. Thanks! |
I wanted to highlight this comment by @krysal and that I'm not totally confident the responses are the same because I wasn't sure of how to test the |
Thank you @twstokes! And ah, Freesound is indeed a special case. I think we can move forward with this PR without that as a fix, since we'll work to address it in WordPress/openverse-catalog#754 anyway. If you're interested in including it still, I think we'd want to keep the decorator but also add the Probably the best way to ensure that we're getting the same content back from the requests is to test running each of the altered provider DAGs locally and making sure we get the appropriate filesizes where expected! I will try to run each of those today and share the results 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran both the Stocksnap and WordPress providers locally (after the suggested changes below) and they looked good! We'll definitely need to address the return issue before this is merged.
If you'd like to try and run this yourself @twstokes, do the following:
- Add
AIRFLOW_VAR_INGESTION_LIMIT=200
to your.env
file - Run
just up
- Login to Airflow using
airflow
/airflow
as user/pass respectively. - Flip the switch on the
stocksnap_workflow
DAG and make sure it runs successfully (you can click on the DAG name itself to get to more useful views).
Co-authored-by: Madison Swain-Bowden <[email protected]>
Thanks for all your help @AetherUnbound 😄. I ran both the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I have one more suggestion, but after that this is good to go. Thanks for all the iteration on this! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great documentation and typing!
Fixes
HEAD
method toDelayedRequester
openverse#1579 by @AetherUnboundWordPressDataIngester::_get_filesize
should not useget_response_json
openverse#1373 by @AetherUnboundDescription
This PR adds a
head
method to theDelayedRequester
class and updates file size calls in the StockSnap and WordPress ingesters to use it.Testing Instructions
See steps @AetherUnbound provided here.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin