Why not implement newest, near and oldest for the CDX Server API as we have for the Availability API #155

Closed
akamhy opened this issue Feb 16, 2022 · 5 comments · Fixed by #159

Comments

akamhy commented Feb 16, 2022

Is your feature request related to a problem? Please describe.
Yes. The Availability API is less reliable than the CDX Server API, but the CDX Server API has a steeper learning curve. Instead of telling users to implement these methods on their own using the interface, it would be better to provide them in the WaybackMachineCDXServerAPI class.

Describe the solution you'd like
The following three methods should be inside the WaybackMachineCDXServerAPI class.

  • WaybackMachineCDXServerAPI.newest()
  • WaybackMachineCDXServerAPI.near()
  • WaybackMachineCDXServerAPI.oldest()

Describe alternatives you've considered
N/A

Additional context
See #154

@akamhy akamhy added the enhancement New feature or request label Feb 16, 2022
@akamhy akamhy self-assigned this Feb 16, 2022
@akamhy akamhy changed the title from "Why not implement newest, near and oldest fror the CDX Server API as we have for the Availability API" to "Why not implement newest, near and oldest for the CDX Server API as we have for the Availability API" Feb 16, 2022

akamhy commented Feb 16, 2022

near can be implemented by using the to and from params:

waybackpy --url google.com --cdx --limit 1 --from 201010101010
waybackpy --url google.com --cdx --limit -1 --to 201010101010

Of the two results, pick the one closest to the requested timestamp, preferring the one with the better HTTP status code.
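
The selection step could look roughly like the sketch below. It is only one possible reading of "better HTTP status code" (here: 200 beats anything else, then the smallest time distance wins). Snapshot, parse_ts and pick_near are hypothetical names for illustration, not part of waybackpy, and the timestamps are assumed to be full 14-digit Wayback timestamps.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Snapshot(NamedTuple):
    """Hypothetical minimal record for one CDX result row."""
    timestamp: str   # assumed 14-digit Wayback timestamp, e.g. "20101010101000"
    statuscode: int


def parse_ts(ts: str) -> datetime:
    # Parse a 14-digit YYYYMMDDhhmmss timestamp.
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def pick_near(target: str, candidates: List[Snapshot]) -> Optional[Snapshot]:
    """Return the candidate to use for `target`, or None if there are none.

    Assumption: a 200 response is "better" than any other status, and
    among equally good statuses the one closest in time wins.
    """
    if not candidates:
        return None
    goal = parse_ts(target)

    def rank(snapshot: Snapshot):
        distance = abs(parse_ts(snapshot.timestamp) - goal)
        return (snapshot.statuscode != 200, distance)

    return min(candidates, key=rank)
```

The two candidates would be the single rows returned by the --from and --to queries above.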

akamhy commented Feb 17, 2022

Use internetarchive/wayback#237 (comment) for near.

akamhy commented Feb 17, 2022

Implement near using a query of this form: https://web.archive.org/cdx/search/cdx?url=google.com&limit=1&closest=20101010101010&sort=closest&filter=statuscode:200
oldest and newest should then invoke near with appropriate args.
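
A rough sketch of that idea, building only the query URL and making no network request. near_query, oldest_query and newest_query are hypothetical helper names, and the choice of a very early timestamp for oldest and the current UTC time for newest is an assumption about what "appropriate args" means, not something confirmed in this thread.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def near_query(url: str, closest: str, statuscode: int = 200) -> str:
    """Build a CDX query for the single capture closest to `closest`."""
    params = {
        "url": url,
        "limit": 1,
        "closest": closest,
        "sort": "closest",
        "filter": f"statuscode:{statuscode}",
    }
    # safe=":" keeps "statuscode:200" readable, as in the URL above.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe=':')}"


def oldest_query(url: str) -> str:
    # Assumption: a timestamp earlier than any capture selects the oldest one.
    return near_query(url, "19940101000000")


def newest_query(url: str, now: Optional[datetime] = None) -> str:
    # Assumption: the current UTC time selects the newest capture.
    now = now or datetime.now(timezone.utc)
    return near_query(url, now.strftime("%Y%m%d%H%M%S"))
```

With these helpers, near_query("google.com", "20101010101010") produces the URL quoted above.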

akamhy added a commit that referenced this issue Feb 17, 2022
see https://nla.github.io/outbackcdx/api.html#operation/query

sort takes a string input, which must be one of the following:
- default
- closest
- reverse

This commit should help close issue #155
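
For illustration, validating the sort value might be sketched as follows. validate_sort and ALLOWED_SORTS are hypothetical names, and raising ValueError is an assumption; the actual commit may handle invalid input differently.

```python
# The three values listed in the commit message above.
ALLOWED_SORTS = ("default", "closest", "reverse")


def validate_sort(sort: str) -> str:
    """Return `sort` unchanged if it is an allowed value, else raise ValueError."""
    if sort not in ALLOWED_SORTS:
        allowed = ", ".join(ALLOWED_SORTS)
        raise ValueError(f"sort must be one of: {allowed}; got {sort!r}")
    return sort
```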
akamhy added a commit that referenced this issue Feb 17, 2022
* add sort param support in CDX API class

see https://nla.github.io/outbackcdx/api.html#operation/query

sort takes a string input, which must be one of the following:
- default
- closest
- reverse

This commit should help close issue #155

* add BlockedSiteError for cases when archiving is blocked by site's robots.txt

* create check_for_blocked_site for handling the BlockedSiteError for sites that are blocking wayback machine by their robots.txt policy

* add attrs use_pagination and closest, which can be used to enable the pagination API and to look up an archive close to a timestamp, respectively. Also, to get out of the infinite blank pages loop when using the CDX server API, now check for two successive blank pages rather than two blank pages in total.

* added cli support for sort, use-pagination and closest

* added tests

* fix codeql warnings, nothing to worry about here.

* fix save test for archive_url
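
The "two successive blank pages" rule from the commit message could be sketched like this. iter_pages and fetch_page are hypothetical names, and fetch_page stands in for whatever function returns the text of a given CDX page; this is not the code from the commit itself.

```python
from typing import Callable, Iterator


def iter_pages(
    fetch_page: Callable[[int], str],
    max_blank_in_a_row: int = 2,
) -> Iterator[str]:
    """Yield non-blank pages until `max_blank_in_a_row` blank pages occur back to back."""
    blank_run = 0  # counts only consecutive blank pages
    page = 0
    while blank_run < max_blank_in_a_row:
        text = fetch_page(page)
        page += 1
        if text.strip():
            blank_run = 0  # a non-blank page resets the counter
            yield text
        else:
            blank_run += 1
```

Counting only consecutive blanks means a single empty page in the middle of the results does not end the iteration early.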