Why not implement newest, near and oldest for the CDX Server API as we have for the Availability API #155
Labels
enhancement
New feature or request
Comments
akamhy
changed the title
Why not implement newest, near and oldest fror the CDX Server API as we have for the Availability API
Why not implement newest, near and oldest for the CDX Server API as we have for the Availability API
Feb 16, 2022
Near can be implemented by leveraging the following two commands:
waybackpy --url google.com --cdx --limit 1 --from 201010101010
waybackpy --url google.com --cdx --limit -1 --to 201010101010
Pick the closest one which has a better HTTP status code.
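To make the "pick the closest one" step concrete, here is a minimal sketch. It assumes each of the two commands yields at most one snapshot, that timestamps are full 14-digit Wayback timestamps, and that "better HTTP status code" means preferring a 200 response. `Snapshot` and `pick_near` are hypothetical names used only for illustration, not part of waybackpy.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    """Hypothetical record for one CDX result."""
    timestamp: str   # assumed 14-digit form: YYYYMMDDhhmmss
    statuscode: int


def _parse(timestamp: str) -> datetime:
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def pick_near(target: str,
              before: Optional[Snapshot],
              after: Optional[Snapshot]) -> Optional[Snapshot]:
    """Choose between the snapshot just before and just after `target`.

    Assumption: a 200 response always beats a non-200 one; ties on
    status are broken by distance in time from the target.
    """
    candidates = [s for s in (before, after) if s is not None]
    if not candidates:
        return None
    target_dt = _parse(target)
    return min(
        candidates,
        key=lambda s: (s.statuscode != 200, abs(_parse(s.timestamp) - target_dt)),
    )
```

With this ordering, a slightly more distant 200 snapshot wins over a nearer redirect or error, which is one reading of the comment above; a different weighting would only change the `key` function.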
Use internetarchive/wayback#237 (comment) for near.
Implement near from https://web.archive.org/cdx/search/cdx?url=google.com&limit=1&closest=20101010101010&sort=closest&filter=statuscode:200
akamhy
added a commit
that referenced
this issue
Feb 17, 2022
see https://nla.github.io/outbackcdx/api.html#operation/query
sort takes string input which must be one of the following:
- default
- closest
- reverse
This commit shall help in closing issue at #155
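The constraint on sort could be enforced with a small check like the one below. This is only a sketch: `validate_sort` is a hypothetical helper, and normalizing case and whitespace before the comparison is an assumption, not something the commit message states.

```python
# The three values named in the commit message above.
ALLOWED_SORTS = ("default", "closest", "reverse")


def validate_sort(sort: str) -> str:
    """Return the normalized sort value, or raise if it is not allowed.

    Assumption: input is lower-cased and stripped before checking.
    """
    normalized = sort.strip().lower()
    if normalized not in ALLOWED_SORTS:
        raise ValueError(
            f"sort must be one of {', '.join(ALLOWED_SORTS)}; got {sort!r}"
        )
    return normalized
```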
akamhy
added a commit
that referenced
this issue
Feb 17, 2022
* add sort param support in CDX API class
  see https://nla.github.io/outbackcdx/api.html#operation/query
  sort takes string input which must be one of the following:
  - default
  - closest
  - reverse
  This commit shall help in closing issue at #155
* add BlockedSiteError for cases when archiving is blocked by site's robots.txt
* create check_for_blocked_site for handling the BlockedSiteError for sites that are blocking wayback machine by their robots.txt policy
* add attrs use_pagination and closest, which can be used to use the pagination API and to look up archives close to a timestamp respectively. And now, to get out of the infinite blank pages loop, just check for two successive blank pages and not two blank pages in total while using the CDX server API.
* added cli support for sort, use-pagination and closest
* added tests
* fix codeql warnings, nothing to worry about here.
* fix save test for archive_url
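The "two successive blank pages" rule from the commit message can be sketched as follows. `iter_pages`, `fetch_page` and `max_pages` are hypothetical names for illustration; the actual loop in the commit may be structured differently.

```python
from typing import Callable, Iterator


def iter_pages(fetch_page: Callable[[int], str],
               max_pages: int = 1000) -> Iterator[str]:
    """Yield non-blank pages, stopping after two blank pages in a row.

    `fetch_page` is a hypothetical callable that returns the response
    text for a page number. A non-blank page resets the counter, so two
    blank pages separated by real data do not end the iteration.
    """
    consecutive_blank = 0
    for page in range(max_pages):
        text = fetch_page(page)
        if not text.strip():
            consecutive_blank += 1
            if consecutive_blank == 2:
                break
            continue
        consecutive_blank = 0
        yield text
```

Counting successive rather than total blank pages means an isolated empty page in the middle of the result set is skipped instead of cutting the results short.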
Is your feature request related to a problem? Please describe.
Yes, the Availability API is not reliable when compared to the CDX server API. However, the CDX server API has a steeper learning curve, and instead of telling users to implement these methods on their own using the interface, it would be cool to have these methods in the WaybackMachineCDXServerAPI class.
Describe the solution you'd like
The following three methods should be inside the WaybackMachineCDXServerAPI class.
WaybackMachineCDXServerAPI.newest()
WaybackMachineCDXServerAPI.near()
WaybackMachineCDXServerAPI.oldest()
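One way the three methods might map onto CDX queries is sketched below. This is not the waybackpy implementation: the function names are hypothetical, and the parameter choices are assumptions drawn from the comments in this thread (limit=1 for the first capture, limit=-1 for the last, and closest with sort=closest plus a statuscode:200 filter for near).

```python
from typing import Dict
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def oldest_params(url: str) -> Dict[str, str]:
    # A positive limit returns results from the start of the index.
    return {"url": url, "limit": "1"}


def newest_params(url: str) -> Dict[str, str]:
    # A negative limit returns results from the end of the index.
    return {"url": url, "limit": "-1"}


def near_params(url: str, timestamp: str) -> Dict[str, str]:
    # Mirrors the query string quoted in the comments of this issue.
    return {
        "url": url,
        "limit": "1",
        "closest": timestamp,
        "sort": "closest",
        "filter": "statuscode:200",
    }


def build_query(params: Dict[str, str]) -> str:
    # safe=":" keeps "statuscode:200" readable instead of "%3A".
    return f"{CDX_ENDPOINT}?{urlencode(params, safe=':')}"
```

For example, `build_query(near_params("google.com", "20101010101010"))` produces the same URL that is suggested for near in the comments.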
Describe alternatives you've considered
N/A
Additional context
See #154