Image source: justtakeitfree.com #1264
Comments
Thank you for the source suggestion, @Aventurier! Do you have an API that Openverse could use to get the images?
Based on a little digging, the site does not have an API, but it does have clean markup that a scraper could walk through. The site provides quite a bit of info (like tags), and all images credit "Justtakeitfree Free Photos" as the author. One thing that's missing is a title. None of the images are titled; they only use a numeric ID as the identifier, including in places like the HTML. I'm not aware of the catalog's scraping policy or whether a REST API is a requirement, but this site has a small collection of very high-quality images that might make a nice addition to our content.
Without a response from @Aventurier regarding the API, I think we should plan to scrape. There are currently 6 pages of results and around 178 results (based on https://justtakeitfree.com/photo/178/ existing and anything beyond that, like https://justtakeitfree.com/photo/179/ and https://justtakeitfree.com/photo/180/, returning a 404, though https://justtakeitfree.com/photo/1/ also 404s). If the DAG requested one photo page every two seconds, it would take only around 6 minutes (178 × 2 s ≈ 356 s) to ingest the entire provider. We could do that monthly to reduce the impact of the scraping. This seems doable, and I think the assumption that we can scrape is safe considering the volume and the lack of a ToS. As for the title to use, the site itself appears to use the first tag in the list of tags for the filename when you click the download link. We can do the same. Regarding the attribution, the author should be, as Dhruv mentioned, "Justtakeitfree Free Photos". Based on the text of the issue ("We host only our photos"), that sounds like an appropriate attribution to credit the creators. I dumped EXIF on one of the images and there is nothing to suggest otherwise. So to clarify the DAG implementation:
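As an illustration only, the conventions discussed in this comment (title taken from the first tag, a fixed creator string, one request every two seconds) might be sketched as below. This is not the actual Openverse DAG: `PhotoRecord`, `build_record`, and `estimated_crawl_seconds` are hypothetical names, and falling back to the numeric ID when an image has no tags is an assumption not stated in the thread.

```python
from dataclasses import dataclass
from typing import List

# URL pattern and attribution values taken from the discussion above.
PHOTO_URL_TEMPLATE = "https://justtakeitfree.com/photo/{photo_id}/"
CREATOR = "Justtakeitfree Free Photos"
LICENSE = "CC BY 4.0"


@dataclass
class PhotoRecord:
    """Hypothetical container for the metadata of one scraped photo."""
    foreign_id: str
    landing_url: str
    title: str
    creator: str
    license: str
    tags: List[str]


def build_record(photo_id: int, tags: List[str]) -> PhotoRecord:
    """Build a record, using the first tag as the title as the site does for filenames."""
    # Assumption: fall back to the numeric ID if the image has no tags.
    title = tags[0] if tags else str(photo_id)
    return PhotoRecord(
        foreign_id=str(photo_id),
        landing_url=PHOTO_URL_TEMPLATE.format(photo_id=photo_id),
        title=title,
        creator=CREATOR,
        license=LICENSE,
        tags=list(tags),
    )


def estimated_crawl_seconds(request_count: int, delay_seconds: float = 2.0) -> float:
    """Rough crawl time when waiting delay_seconds between requests."""
    return request_count * delay_seconds
```

With these definitions, `estimated_crawl_seconds(178)` gives 356 seconds, which matches the "around 6 minutes" estimate for requesting each photo page in turn.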
I'm sorry for the long pause. I've actually built a small API that can search for images by tag and retrieve information about an image.
No worries, Aventurier! Thanks for letting us know. Can you share whether there is a way to paginate through the API? For Openverse's catalogue to be able to get all the images, we'd need to be able to use the API to paginate through all the images rather than for just particular tags. Something like: https://justtakeitfree.com/api/api.php?page=1 etc., without any query terms. Is the email on the privacy policy page the best location to get in touch regarding a key specifically for Openverse (to avoid the secret leaking publicly)?
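To make the pagination request concrete, here is a minimal sketch of how a client could walk such an endpoint page by page until it runs out of results. It rests on several assumptions: the `page` parameter is only the form suggested in the comment above, the `key` parameter name is hypothetical, and the response is assumed to be a list of image dictionaries that comes back empty past the last page. The HTTP call is injected as `fetch_page` so the loop itself makes no claims about the real API.

```python
from typing import Callable, Dict, Iterator, List, Optional
from urllib.parse import urlencode

# Endpoint as suggested in the comment above; its real behaviour is not confirmed.
API_URL = "https://justtakeitfree.com/api/api.php"


def page_url(page: int, key: Optional[str] = None) -> str:
    """Build the URL for one page of results."""
    params = {"page": page}
    if key is not None:
        params["key"] = key  # hypothetical parameter name for the API key
    return f"{API_URL}?{urlencode(params)}"


def iter_all_images(
    fetch_page: Callable[[str], List[Dict]],
    key: Optional[str] = None,
    max_pages: int = 1000,
) -> Iterator[Dict]:
    """Yield every image by requesting pages 1, 2, 3, ... until one is empty.

    fetch_page takes a URL and returns the decoded list of images for that
    page (assumed response shape). max_pages is a safety cap.
    """
    for page in range(1, max_pages + 1):
        batch = fetch_page(page_url(page, key))
        if not batch:
            return
        yield from batch
```

In practice `fetch_page` would wrap an HTTP GET plus JSON decoding, and the key would come from a secret store rather than from source code.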
Done. Please leave me your email address; I will send a new key and then delete this one.
@Aventurier amazing! Thank you so much. You can email us at [email protected] with a new key.
API key received. Thank you, @Aventurier!
Source Site
https://justtakeitfree.com/
Value Provided
It's an independent project from a Ukrainian family. We host only our photos.
Licenses Provided
CC BY 4.0
Implementation