This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Explore the use of Photon as a thumbnail service #979

Closed
1 task
AetherUnbound opened this issue Oct 14, 2022 · 6 comments · Fixed by #1056
Labels
🕹 aspect: interface Concerns end-users' experience with the software 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟩 priority: low Low priority and doesn't need to be rushed

Comments

@AetherUnbound
Contributor

Description

We are presently using a version of imaginary with a custom build process for addressing a few open issues. The build process for this service is infrequent, but can take several hours to run.

WordPress.com offers an open source image acceleration service called Photon which I believe performs all of the requisite actions we would need in order to generate thumbnails for Openverse.

This would allow us to remove one of the services required to run this stack in production. We could also use the service itself locally, or spin up a dockerized version of Photon for dev use.

There may be other reasons which might make Photon unsuitable for us, but it's worth exploring as a potential alternative in order to reduce our infrastructure complexity!

Implementation

  • 🙋 I would be interested in implementing this feature.
@AetherUnbound AetherUnbound added 🕹 aspect: interface Concerns end-users' experience with the software 🟩 priority: low Low priority and doesn't need to be rushed 🧰 goal: internal improvement Improvement that benefits maintainers, not users labels Oct 14, 2022
@sarayourfriend
Contributor

I chatted about this with Zack today 1-on-1. Here are some things that came to mind during that discussion between the two of us:

  1. If we use Photon we should continue to proxy it via the API thumbnails endpoint. Essentially we would be swapping the request to the imaginary service for a request to Photon. While this would necessarily add network latency by introducing an "extra hop", it would (a) maintain the validity of all past API responses, (b) not introduce a new URL for thumbnails that people might be concerned about (the WordPress.com Photon instance domain is a bit cryptic), and finally, most importantly, (c) allow us to manage our own content safety by cache busting and blocking thumbnail responses for content that we remove from Openverse for one reason or another.
  2. We should make the upstream thumbnail endpoint more configurable, perhaps by making the setting for it a template string. For example, THUMBNAIL_PROVIDER = "https://i0.wp.com/{image_url}" or THUMBNAIL_PROVIDER = "http://localhost:52045/?url={image_url}" would cover both use cases and any future use cases. This makes it hypothetically easier for someone else to run an alternative Openverse instance, making the project more inclusive, broadly useful, and maintainable into the future.
  3. When proxying the Photon service via the API, it would be tempting to introduce asyncio to that endpoint to stop the Django service from "waiting" on the upstream request to come back. I don't think this would work unless we converted the entire project to ASGI first: the async_to_sync utility will not prevent the request from blocking, it will just allow the use of async code, which could itself perform non-blocking operations. I don't think this would free up the event loop to work on another request. It would be nice if we could measure the performance difference here to tell whether it actually matters. To do this we should take measurements of the thumbnails endpoint now and again after introducing Photon (perhaps behind a runtime-flippable flag, though that would require a bit of work to make possible) to see whether there are performance considerations we would need to address to make Photon or any other external thumbnail compression provider a viable option for our use case.
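
A minimal sketch of the template-string idea from point 2, not a proposed implementation. The function name `build_upstream_url` and the second placeholder `{image_host_path}` are hypothetical; the latter is an assumption added here because the Photon URLs later in this thread embed the origin host and path without a scheme, whereas a `?url=` style endpoint needs the full URL percent-encoded.

```python
from urllib.parse import quote, urlsplit


def build_upstream_url(template: str, image_url: str) -> str:
    """Fill a thumbnail-provider template with the upstream image URL.

    Placeholders (both illustrative, not an existing Openverse setting):
      {image_url}        the full image URL, percent-encoded so it can be
                         used safely as a query parameter value
      {image_host_path}  host + path with the scheme stripped, for
                         providers that take the origin in the URL path
    """
    parts = urlsplit(image_url)
    return template.format(
        image_url=quote(image_url, safe=""),
        image_host_path=parts.netloc + parts.path,
    )


# Query-parameter style provider (e.g. a local thumbnail service).
local = build_upstream_url(
    "http://localhost:52045/?url={image_url}",
    "https://example.com/photos/cat.jpg",
)

# Path style provider, assuming the origin is given without a scheme.
hosted = build_upstream_url(
    "https://i0.wp.com/{image_host_path}",
    "https://example.com/photos/cat.jpg",
)
```

Keeping the provider as a single format string means switching providers is a settings change rather than a code change, at the cost of the API having to know which placeholder each provider expects.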

Overall I suspect the implementation of this feature would be pretty easy for someone to do in a single PR without too much trouble. For local development, if Photon is easy to spin up in docker-compose then that seems like a good option. Otherwise, continuing to use imaginary for local development also seems fine, though with some inherent risk in the differences in API and clarity of production use-case.

Answering the performance question, on the other hand, may take a bit of forethought and planning. Not too much, mind you, just enough to be able to answer the question of whether using Photon introduces a measurable and significant decrease in performance on the endpoint. Deciding what "significant" means here is probably the non-trivial part of this. To measure the current request times we can use log insights and reference the nginx logs for the endpoint timings. I can whip up a median and p95 (95th percentile) query for this and share it here so anyone can check this if they work on it and have access to the production infrastructure.
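
For anyone without access to the log tooling, the before/after comparison can be approximated offline. The sketch below assumes request times have already been extracted into a list of numbers (for example, milliseconds from the nginx logs); the function names and the sample data are made up for illustration, and the 95th percentile uses the simple nearest-rank definition, which may differ slightly from what a log query engine reports.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """Return the two figures discussed above for one set of request times."""
    return {
        "median": statistics.median(samples),
        "p95": percentile(samples, 95),
    }


# Made-up request times in milliseconds: 10, 20, ..., 200.
before = summarize([10 * i for i in range(1, 21)])
```

Running `summarize` on timings collected before and after the switch gives two small dictionaries to compare; deciding how large a difference counts as "significant" is still a judgment call.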

@AetherUnbound
Contributor Author

These are really great considerations Sara, thank you for sharing. Your points about proxying and endpoint configuration are well thought out. I was thinking we'd set the responses up that way but I hadn't had any particularly strong reasons as to why; these points solidify that in my mind!

We switched to imaginary without doing this performance analysis IIRC. I think it's a good idea, but I don't think we need to add a bunch more logging/metrics in order to make this change. Getting a sense of median & p95 before and after sounds sufficient IMO.

@zackkrida
Member

Thanks for summarizing our chat, Sara. I definitely think it would be great for us to implement and test Photon, and ultimately switch to it if the performance analysis looks promising.

@sarayourfriend
Contributor

I'm going to work on this, since I'm currently blocked on other infrastructure work while I wait for reviews and blockers to clear, and this will be needed to complete the API ECS migration.

@sarayourfriend
Contributor

@AetherUnbound do you know if any of our upstream images have query strings in their URLs? I've been playing around with Photon to see whether it can handle them, but I'm having some trouble figuring it out.

I'm using the existing thumbnail proxy as an example of an image URL with a query parameter. Take this image:

https://api.openverse.engineering/v1/images/89b25e33-46fc-472f-a88a-75fd6328a8fe/thumb/

If we want the full sized version from our current proxy, we pass full_size=True:

https://api.openverse.engineering/v1/images/89b25e33-46fc-472f-a88a-75fd6328a8fe/thumb/?full_size=True

If we want to proxy this image via Photon (the full sized one), I'm not sure how to do it:

https://i0.wp.com/api.openverse.engineering/v1/images/89b25e33-46fc-472f-a88a-75fd6328a8fe/thumb/?full_size=True

That will not forward the full_size query param, which makes some sense.
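
A quick way to see why this happens, using only standard URL parsing (this does not model anything about Photon's own behaviour; `split_proxy_url` is a hypothetical helper name): everything after the `?` is part of the request to i0.wp.com, not part of the origin URL embedded in the path.

```python
from urllib.parse import parse_qs, urlsplit


def split_proxy_url(url):
    """Split a path-style proxy URL into (proxy host, embedded origin, proxy query params)."""
    parts = urlsplit(url)
    origin = parts.path.lstrip("/")  # origin host + path as embedded in the proxy URL
    return parts.netloc, origin, parse_qs(parts.query)


host, origin, params = split_proxy_url(
    "https://i0.wp.com/api.openverse.engineering/v1/images/"
    "89b25e33-46fc-472f-a88a-75fd6328a8fe/thumb/?full_size=True"
)
# `full_size` ends up in `params`, i.e. it is addressed to the proxy host,
# while `origin` carries no query string at all.
```

So for the origin to ever see `full_size=True`, the proxy has to offer some explicit mechanism for passing an origin query string through.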

It does look like Photon should support this based on the code here: https://code.svn.wordpress.org/photon/index.php

If you search for the handling of the 'q' param and origin_domain_exceptions variable you can see the code that supports it. But it could be disabled entirely or we might need to request access to it. I'll make a request to Automattic's systems folks and see if we're allowed to use it.

@dhruvkb
Member

dhruvkb commented Nov 18, 2022

I was just reading up a bit about Photon and it seems like it also supports WEBP conversion like imaginary so that's another point in its favour.


4 participants