Allow querying RSS feeds for items published since some time #7116

Open
alexbecker opened this issue Dec 18, 2019 · 3 comments
Labels: APIs/feeds, awaiting-response, feature request

Comments

@alexbecker

What's the problem this feature will solve?
I run a PyPI mirror that uses the RSS feeds to keep in sync with new packages and releases. The feeds always return the 40 most recent items. Currently I query the feeds every 5 minutes, but sometimes more than 40 releases are published within 5 minutes. To work around this with the current API I would have to query much more frequently, which is unnecessary load on PyPI the vast majority of the time and, depending on how rapidly packages are being published, might still be insufficient.

Describe the solution you'd like
The RSS feeds could accept a pair of query parameters:

  • limit: allow returning up to limit items instead of the current 40, probably up to some cap for performance reasons
  • max_age: return only items published in the last max_age seconds

This would make mirroring much easier, as I could set max_age to my polling interval plus epsilon. I would not be retrieving any more items than necessary, so I think the load on PyPI would be lighter.
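As a rough illustration of the semantics proposed above, the filtering could look like the sketch below. It is not how PyPI implements its feeds: the `FeedItem` type, the `select_items` function, and the `MAX_LIMIT` cap of 200 are hypothetical names and values chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

DEFAULT_LIMIT = 40  # current feed size, as described in this issue
MAX_LIMIT = 200     # hypothetical cap, chosen only for illustration


@dataclass(frozen=True)
class FeedItem:
    title: str
    published: datetime


def select_items(
    items: Iterable[FeedItem],
    now: datetime,
    limit: Optional[int] = None,
    max_age: Optional[int] = None,
) -> List[FeedItem]:
    """Return the newest items, honoring the proposed limit and max_age."""
    # Fall back to the current default and clamp to the (assumed) cap.
    if limit is None:
        effective_limit = DEFAULT_LIMIT
    else:
        effective_limit = max(0, min(limit, MAX_LIMIT))

    newest_first = sorted(items, key=lambda item: item.published, reverse=True)

    # max_age is in seconds: keep only items published within that window.
    if max_age is not None:
        cutoff = now - timedelta(seconds=max_age)
        newest_first = [item for item in newest_first if item.published >= cutoff]

    return newest_first[:effective_limit]
```

With a 5-minute polling interval, a mirror would call this with something like `max_age=305`, that is, the interval plus a small epsilon.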

Additional context
This accomplishes something similar to the deprecated XML-RPC changelog, but the documentation advises using the RSS feeds instead.

This request is probably of interest to the same audience as #1683, but the feature itself is orthogonal.

@di
Member

di commented Jan 2, 2020

Thanks for the feature request! Apologies that we haven't been able to respond before you started work in #7117.

The XML-RPC changelog is deprecated largely because it has this feature, which makes it extremely challenging to cache and thus pretty resource-intensive for PyPI.

Adding limit and max_age parameters would introduce the same problem for our RSS feeds: instead of the single cache entry per feed that we currently have, there would be a huge number of unique cache entries. Each of them would need to be invalidated when a new package affects it, and we'd need to know which entries a new package affects.
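To make the caching concern concrete, the sketch below counts distinct cache keys under a simplified model in which the cache key is the request path plus its query string. The `cache_key` helper, the two feed paths, and the parameter ranges are assumptions made for illustration; this does not describe PyPI's actual cache configuration.

```python
from urllib.parse import urlencode


def cache_key(path: str, **params: int) -> str:
    """Build a cache key from a path and its sorted query parameters."""
    query = urlencode(sorted(params.items()))
    return f"{path}?{query}" if query else path


# Assumed feed paths, used here only as example keys.
feeds = ["/rss/updates.xml", "/rss/packages.xml"]

# Without parameters: one cache entry per feed.
keys_today = {cache_key(path) for path in feeds}

# With parameters: one entry per distinct (feed, limit, max_age) combination.
# Even a small range of values multiplies quickly.
keys_with_params = {
    cache_key(path, limit=limit, max_age=max_age)
    for path in feeds
    for limit in range(10, 101, 10)  # 10 possible limits
    for max_age in range(1, 601)     # any max_age up to 10 minutes
}

print(len(keys_today), len(keys_with_params))
```

In this model, every one of the parameterized entries that overlaps a new release would have to be found and invalidated when that release is published.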

You seem concerned about putting too much load on PyPI, which we definitely appreciate, but I think in this instance the concern is probably unnecessary, since you'll almost always be hitting our cache. You could increase your request rate to 1/min or even 1/s and it would be a small blip in our total traffic.

That would probably be the best solution in the short term, but I understand that it might not be ideal for you or might be too resource-intensive on your end.
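On the mirror side, polling more often mostly means deduplicating items across overlapping fetches. A minimal sketch follows, assuming each RSS item carries a `<link>` element that can serve as its identity; the `FeedTracker` class and `parse_item_links` function are hypothetical names, and fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def parse_item_links(rss_text: str) -> List[str]:
    """Extract the <link> of every <item> in an RSS document, in feed order."""
    root = ET.fromstring(rss_text)
    links = [item.findtext("link") for item in root.iter("item")]
    return [link for link in links if link]


class FeedTracker:
    """Remember which items were already seen across polls."""

    def __init__(self) -> None:
        # Grows without bound in this sketch; a real mirror would prune it.
        self.seen: Set[str] = set()

    def update(self, rss_text: str) -> Tuple[List[str], bool]:
        """Return (new links, possible_gap) for one fetched feed document."""
        links = parse_item_links(rss_text)
        new = [link for link in links if link not in self.seen]
        # If nothing in this fetch overlaps with earlier fetches, items may
        # have scrolled off the feed between polls.
        possible_gap = bool(self.seen) and bool(links) and len(new) == len(links)
        self.seen.update(links)
        return new, possible_gap
```

When `possible_gap` is true, the polling interval was too long for the publish rate at that moment, which is the signal to poll more frequently or to resync by other means.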

I think the best long-term alternative would be to implement the hypermedia-based API discussed in #284. In the short term, we could potentially permanently increase the number of items returned by our RSS feeds (not sure why this is 40 right now, but it does seem low).

@yeraydiazdiaz
Contributor

> Short term, we could potentially permanently increase the number of items returned by our RSS feeds (not sure why this is 40 right now, but it does seem low).

Maybe we should create an issue for this? Maybe returning 100 items?

@di
Member

di commented Jan 15, 2020

I was waiting to see if increasing the request rate works for @alexbecker; I think that's the right solution here.
