-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provider: justtakeitfree.com #2793
Conversation
4cda04e
to
68a860f
Compare
7178faa
to
c149723
Compare
50b6f82
to
4fc5a90
Compare
@obulat What is this new DAG missing for being ready for review? |
4fc5a90
to
e2a8a7b
Compare
e2a8a7b
to
e94fbea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fantastic, no blocking concerns! I see 167 results, data all looks great. Just confirming, were we interested in using the preview_url
from the API for thumbnails?
catalog/tests/dags/providers/provider_api_scripts/test_justtakeitfree.py
Outdated
Show resolved
Hide resolved
That's a great suggestions, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I was able to get the images using the API key and with a few tweaks. There is only one blocking request for it that is related to the last-minute added thumbnails.
catalog/tests/dags/providers/provider_api_scripts/test_justtakeitfree.py
Outdated
Show resolved
Hide resolved
def get_file_info(self, url) -> int | None: | ||
"""Get the image size in bytes.""" | ||
resp = self.delayed_requester.head(url) | ||
if resp: | ||
filesize = int(resp.headers.get("Content-Length", 0)) | ||
return filesize if filesize != 0 else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe get the sizes in the same/a similar way? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find the sizes in any response 😞
Output: TSV file containing the media and the | ||
respective meta-data. | ||
|
||
Notes: https://justtakeitfree.com/api/api.php |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave a note that it needs authentication? Just linking to the GH issue would help as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the other DAG notes, and we usually don't say whether the provider needs authentication or not. Do you think we should add it for every provider?
Do you suggest adding something like:
This API requires an API key. For more details, see #2793
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe that would be helpful to know when running, especially because this comment gets displayed in the Airflow UI as well!
Co-authored-by: Krystle Salazar <[email protected]>
…eitfree.py Co-authored-by: Krystle Salazar <[email protected]>
Co-authored-by: Krystle Salazar <[email protected]>
b817704
to
8b90085
Compare
Output: TSV file containing the media and the | ||
respective meta-data. | ||
|
||
Notes: https://justtakeitfree.com/api/api.php |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe that would be helpful to know when running, especially because this comment gets displayed in the Airflow UI as well!
Fixes
Fixes #1264 by @Aventurier
Description
Adds the script to get all the media from justtakeitfree.com. Currently, there are 167 images.
To collect the filesize, this script makes head requests for individual media items, which make the script slower than expected. But it should be okay considering the number of available images.
Image dimensions are not available, so they will need to be collected separately in the future.
Testing Instructions
Run the
justtakeitfree
DAG and check that it succeeds.See what it saves using
(should return 167).
Check that tags, filesize creator, creator url are saved correctly:
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin